Session log: /save + /sync multi-user change summaries

Enhance /save and /sync slash commands to attribute commits by author so Mike and Howard can see at a glance what the other person did. - sync.sh: loads identity.json, shows incoming/outgoing commits with author + age before pull/push, groups by author in final summary - sync.md: describes the new output format + conflict attribution - save.md: pre-commit Change Summary block + post-commit Summary Motivation: repo is now shared across team, `git log` alone made it hard to see "when did Howard change that?" without hunting.
2026-04-16 19:08:25 -07:00
parent 100a491ac6
commit 6f6a77f8e4
4 changed files with 430 additions and 76 deletions
--- a/session-logs/2026-04-16-session.md
+++ b/session-logs/2026-04-16-session.md
@@ -690,3 +690,284 @@ Claude: [Already has context, proceeds immediately with work]
 **Status:** Automatic context loading system complete and deployed ✅
 **Impact:** Eliminates recurring problem of Claude not knowing previous work
 **Validation:** To be tested in next session with fresh Claude instance
+
+---
+
+## Update: 17:30 UTC — MSP tooling + incident response + remediation skill
+
+### User
+- **User:** Mike Swanson (mike)
+- **Machine:** DESKTOP-0O8A1RL
+- **Role:** admin
+
+### Session summary
+
+Separate session later the same day (different machine/Claude instance from the context-loading work above). Five interleaved threads:
+
+1. **Cascades Tucson breach investigation** — John Trozzi reported as possible credential-stuffing victim. Check found John clean; tenant-wide sweep discovered **Megan Hiatt under active credential-stuffing attack** RIGHT NOW (bursts from Belfast GB, Hamburg DE).
+2. **Built `/remediation-tool` skill + slash command** codifying the M365 investigation workflow.
+3. **Fixed SOPS `vault.sh` on Windows** — Device Guard (WDAC) blocks unsigned `yq.exe`; added Python + PyYAML fallback.
+4. **Valleywide RemoteApp-over-VPN troubleshooting** — walked through `0x3000008` -> NXDOMAIN -> RDS licensing in sequence.
+5. **Howard Enos breach check** — clean, but actively targeted on cloud-admin paths (Azure CLI/LU, AAD PowerShell/DE+JP).
+
+### Thread 1: Cascades Tucson breach investigation
+
+**John Trozzi (`john.trozzi@cascadestucson.com`, `a638f4b9-6936-4401-a9b7-015b9900e49e`)** — tenant `207fa277-e9d8-4eb7-ada1-1064d2221498`.
+
+Verdict: **NO BREACH.** All 10 breach checks clean.
+- No Graph inbox rules; one Exchange hidden rule (`Junk E-mail Rule` — default)
+- No forwarding, no delegates, no non-SELF SendAs
+- 2 OAuth grants (both BlueMail, consented 2022)
+- 5 auth methods all pre-dating attack window (MS Authenticator on Samsung SM-F731U + FIDO2 passkey, both 2026-02-12)
+- 30d sign-ins: 11, 100% from `184.191.143.62` Phoenix AZ (Cox)
+- Directory audits show the legit IR sequence by sysadmin (disable -> password reset -> enable), then John self-changed at 16:04:46 UTC
+
+**Tenant-wide sweep flagged PRIORITY 1: Megan Hiatt (`megan.hiatt@cascadestucson.com`) under active credential-stuffing:**
+- **126 failed sign-ins in 30 days** across 8 IPs / 6 countries (CH, DE, GB, LT, NL, US)
+- **Today (2026-04-16 15:58–16:01 UTC):** 23 failures from `80.94.92.102` (Belfast, GB) via Authenticated SMTP
+- Earlier: 2026-04-15 Hamburg DE (`158.94.211.16`), 2026-04-13 Belfast GB (`80.94.92.123`)
+- Password last changed 2026-02-18 (~2 months stale)
+- Only 1 MFA method (MS Authenticator iPhone 13)
+- Mailbox clean. NOT breached — MFA + IP reputation + account lockout holding.
+- **Action items:** reset Megan's password, disable SMTP AUTH on her mailbox, keep monitoring.
+
+Other notable: external guest `dunedolly21@gmail.com` invited 2026-04-14 by `lauren.hasselman@cascadestucson.com` from her mobile. Lauren's activity is clean. Mike to confirm with Lauren what the invite is for. No meaningful access granted yet.
+
+Gaps encountered and addressed during investigation:
+- Exchange Admin role was not assigned to `ComputerGuru - AI Remediation` SP in Cascades. Mike assigned it via Entra UI. ~15 min to propagate. Unlocked hidden-rule / delegate / SendAs checks.
+- IdentityRiskyUser scope still NOT consented in Cascades. Consent URL opened multiple times but `/servicePrincipals/{id}/appRoleAssignments` shows no new grants today — permission may not be in the app manifest. Mike to verify home-tenant app registration.
+
+**Report:** `clients/cascades-tucson/reports/2026-04-16-john-breach-check.md`
+
+### Thread 2: Built `/remediation-tool` skill
+
+Codified the Cascades workflow into a reusable skill. Files:
+
+```
+.claude/commands/remediation-tool.md
+.claude/skills/remediation-tool/
+├── SKILL.md                          # auto-invocation triggers
+├── scripts/
+│   ├── resolve-tenant.sh             # domain -> tenant GUID via OpenID discovery
+│   ├── get-token.sh                  # Graph + Exchange tokens, 55-min cache
+│   ├── user-breach-check.sh          # 10-point user check
+│   └── tenant-sweep.sh               # tenant-wide signals
+├── references/
+│   ├── gotchas.md                    # role prereqs, consent URLs, display name quirk
+│   ├── graph-endpoints.md            # Graph + Exchange REST cheatsheet
+│   └── checklist.md                  # breach-check rubric
+└── templates/breach-report.md        # report skeleton
+```
+
+Subcommands:
+```
+/remediation-tool check <upn>
+/remediation-tool sweep <domain>
+/remediation-tool signins <domain> [--user upn] [--failed-only] [--days N]
+/remediation-tool consent-url <domain>
+/remediation-tool remediate <upn> <action>       # gated — requires YES in chat
+```
+
+Auth flow: resolve tenant ID from domain via OpenID discovery -> pull secret from SOPS vault -> acquire client-credentials tokens -> run checks -> dump raw JSON to `/tmp/remediation-tool/{tenant}/{check}/` -> write report to `clients/{slug}/reports/YYYY-MM-DD-{action}.md`.
+
+Updated:
+- `.claude/CLAUDE.md` — added `/remediation-tool` row to Commands & Skills table
+- `.claude/memory/feedback_365_remediation_tool.md` — cross-reference to the skill
+
+Smoke-tested end-to-end against Cascades (token acquired, Graph /organization call returned correct tenant) and Howard (full 10-point check in ~5 seconds).
+
+**App identity gotchas captured in references/gotchas.md:**
+- App ID: `fabb3421-8b34-484b-bc17-e46de9703418`
+- Home-tenant name: Claude-MSP-Access
+- **Customer-tenant display name: ComputerGuru - AI Remediation** (important when searching role assignment dialogs)
+- Client secret: vault `msp-tools/claude-msp-access-graph-api.sops.yaml`, field `credentials.credential`
+
+### Thread 3: Vault.sh Device Guard fix
+
+**Root cause:** `yq.exe` on this Windows box is blocked by corporate Device Guard / WDAC policy (unsigned binary). Both the WinGet `Links` shim and the real binary at `C:/Users/guru/AppData/Local/Microsoft/WinGet/Packages/MikeFarah.yq_.../yq.exe` return "Permission denied" / "blocked by your organization's Device Guard policy".
+
+**Fix:** Replaced yq dependency with Python + PyYAML fallback.
+
+Files:
+- **New:** `D:/vault/scripts/yaml-query.py` — minimal yq replacement, two commands (`path <dot.path>`, `flatten-env <key>`)
+- **Modified:** `D:/vault/scripts/vault.sh` — added `_detect_yq_mode`, `_yaml_field`, `_yaml_flatten_env` helpers; replaced two `yq eval` call sites. Prefers yq if it works, falls back to Python.
+
+Verified:
+- `vault.sh get-field msp-tools/claude-msp-access-graph-api.sops.yaml credentials.credential` -> returns `~QJ8Q~NyQSs4OcGqHZyPrA2CVnq9KBfKiimntbMO` ✅
+- `vault.sh export-env ...` -> `CREDENTIAL=~QJ8Q~NyQSs4OcGqHZyPrA2CVnq9KBfKiimntbMO` ✅
+- `vault.sh search`, `vault.sh list` unchanged
+
+PyYAML 6.0.3 already installed at `/c/Program Files/Python314/python`.
+
+**Defender alerts fired** during the fix (rapid SOPS decryption + JWT base64-decoding + client-credentials OAuth looked like credential-dumping heuristics). All false positives. Mike left exclusions unchanged; future runs will hit the 55-min token cache and quiet down.
+
+### Thread 4: Valleywide RemoteApp over VPN (three sequential problems)
+
+**Problem 1 — `0x3000008` (RD Gateway unreachable):** Public WAN :443 port-forward to VWP-QBS was removed during 2026-04-13 brute-force IR. RDP manifest still routed through external FQDN `remote.valleywideplastering.com` -> WAN IP `4.18.160.106` (firewalled).
+
+Fix: Mike removed RD Gateway from RDS deployment on VWP-QBS (Server Manager -> RDS -> Edit Deployment Properties -> RD Gateway -> "Do not use"). New RDP files have `gatewayusagemethod:i:0` and `full address:s:VWP-QBS.VWP.US`.
+
+**Problem 2 — NXDOMAIN for `VWP-QBS.VWP.US`:** After gateway removed, client tried to resolve the session host hostname directly. UDM's static DNS had a typo: `qwp-qbs.vwp.us` (Q not V). `vwp.us` is a real registered domain (website lives publicly), so external DNS doesn't help; internal override needed.
+
+Fix: Mike edited UniFi UI (Settings -> Routing -> DNS -> Static DNS Records), changed `qwp-qbs.vwp.us` -> `vwp-qbs.vwp.us`, still pointing at `172.16.9.169`.
+
+**Problem 3 — "No Remote Desktop License Servers available" (0x3, 0x101):** Once DNS resolved and client reached session host, RDS-Licensing role was installed + activated locally on VWP-QBS but the RDSH was never configured to use it.
+
+Fix (applied remotely via WinRM over VPN from Mike's box):
+```powershell
+$ts = Get-CimInstance -Namespace root\cimv2\TerminalServices -ClassName Win32_TerminalServiceSetting
+Invoke-CimMethod -InputObject $ts -MethodName ChangeMode -Arguments @{LicensingType = 4}   # Per User
+Invoke-CimMethod -InputObject $ts -MethodName SetSpecifiedLicenseServerList -Arguments @{SpecifiedLSList = @('vwp-qbs.vwp.us')}
+```
+Both returned `ReturnValue=0`. Mike confirmed "it works".
+
+**Outstanding VWP issue:** License server has only the Windows 2000-era `Built-in TS Per Device CAL` placeholder — **no real CALs**. Grace period is consumed. Purchase needed: **Windows Server 2022 RDS Per User CAL pack** sized to active user count; install via `licmgr.msc` on VWP-QBS.
+
+### VWP UDM access
+
+- Host: `172.16.9.1` — UniFi Dream Machine Pro, firmware 5.0.16
+- Access: SSH as `root` via ed25519 key (added during this session via PuTTY after UI-based add didn't land)
+- Public key added: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINXR2BOcFAlOPuB7OYOKfOZDNd3u1tCt/IINRH9beFyB guru@DESKTOP-0O8A1RL`
+- Fingerprint: `SHA256:ZVbowRHhxPX47eKy9FyMwjvIKPzTf3Dwx3BCsBrP4ds`
+- Vault entry password `Gptf*77ttb123!@#-vwp` does NOT work — needs rotation + vault update
+
+### VWP network topology (discovered)
+
+- LAN: `172.16.9.0/24` (br0 — VWP-QBS at `.169`), `192.168.0.0/24` (br2 — legacy), `192.168.3.0/24` (br99 — iDRAC)
+- WAN: `eth8` = `4.18.160.106/30`
+- OpenVPN server on `tun1` — clients land on `192.168.4.0/24`, DNS pushed = `192.168.4.1` (UDM), routes pushed for all three LAN subnets
+- WireGuard site-to-site peers: `wgsts1001` (192.168.5.2), `wgsts1003` (192.168.5.6), `wgsts1005` (192.168.5.11) — learn OSPF routes for 192.168.1.0/24 and 192.168.10.0/24
+- VPN -> LAN firewall: `UBIOS_VPN_LAN_USER` = ACCEPT all
+- Active port forwards: NONE (DNAT hook empty after 2026-04-13 removal)
+
+`clients/valleywide/README.md` appended with `## 2026-04-16` section documenting all three fixes, topology, and CAL-purchase action item.
+
+### Thread 5: Howard Enos breach check (own tenant)
+
+Invoked `/remediation-tool check howard@azcomputerguru.com`.
+
+- **Tenant:** azcomputerguru.com, `ce61461e-81a0-4c84-bb4a-7b354a9a356d`
+- **UPN:** howard@azcomputerguru.com, object id `c99de3bd-ddc1-43f1-907f-e84b91273660`
+- **Password last changed:** 2024-09-24 (18 months ago)
+
+Verdict: **CLEAN, but actively targeted on cloud-admin paths.**
+
+- **174 of 200 sign-ins non-US in 30d — 100% FAILED, zero successful foreign sign-ins**
+- Top attackers: CN(32), IN(32), KR(28), LU(15 via **Microsoft Azure CLI**), BR(14), DE(8 via **Azure AD PowerShell**), JP(8 via **AAD PowerShell**), plus 19 other countries.
+- Attacker is specifically probing admin-grade endpoints, not just random Exchange.
+- 3 inbox rules — all legit user filters (Telnyx, Atlas_LNP whitelabel, Facebook)
+- 4 OAuth grants — standard Microsoft Graph + Teams
+- 8 app role assignments — all MSP-relevant (Syncro v1+v2, ASUS, Tailscale, Perfect Wiki, KaseyaSSO, Graph Explorer, Uizard)
+- 6 auth methods — password + SMS + OATH + 3x MS Authenticator (phone upgrades)
+
+**Gaps on our own tenant:**
+- Exchange Admin role NOT assigned to ComputerGuru-AI-Remediation SP in azcomputerguru -> blocks hidden-rule / delegate / SendAs checks
+- IdentityRiskyUser NOT consented in azcomputerguru
+
+**Report:** `clients/internal-infrastructure/reports/2026-04-16-howard-breach-check.md`
+
+### Credentials & secrets
+
+**Claude-MSP-Access Graph API app ("ComputerGuru - AI Remediation"):**
+- App ID: `fabb3421-8b34-484b-bc17-e46de9703418`
+- Client Secret: `~QJ8Q~NyQSs4OcGqHZyPrA2CVnq9KBfKiimntbMO`
+- Vault: `msp-tools/claude-msp-access-graph-api.sops.yaml`, field `credentials.credential`
+- Admin consent URL: `https://login.microsoftonline.com/{tenant-id}/adminconsent?client_id=fabb3421-8b34-484b-bc17-e46de9703418&redirect_uri=https://login.microsoftonline.com/common/oauth2/nativeclient`
+
+**VWP (Valleywide):**
+- Domain admin: `vwp\sysadmin` / `r3tr0gradE99#`
+- Vault: `clients/vwp/{adsrvr,dc1,udm,xenserver,quickbooks-server-idrac}.sops.yaml`
+- UDM root (vault says, but broken): `Gptf*77ttb123!@#-vwp` -> ROTATE + update vault
+- UDM SSH: `ssh root@172.16.9.1` via ed25519 key
+- VWP_ADSRVR SSH: `ssh vwp\guru@192.168.0.25` (key from 2026-04-13)
+
+**Cascades Tucson:** Tenant `207fa277-e9d8-4eb7-ada1-1064d2221498`, admin `sysadmin@cascadestucson.com`
+
+**AZ Computer Guru:** Tenant `ce61461e-81a0-4c84-bb4a-7b354a9a356d`
+
+### Files created / modified in this update block
+
+**New:**
+- `.claude/commands/remediation-tool.md`
+- `.claude/skills/remediation-tool/SKILL.md`
+- `.claude/skills/remediation-tool/scripts/{resolve-tenant.sh,get-token.sh,user-breach-check.sh,tenant-sweep.sh}`
+- `.claude/skills/remediation-tool/references/{gotchas.md,graph-endpoints.md,checklist.md}`
+- `.claude/skills/remediation-tool/templates/breach-report.md`
+- `D:/vault/scripts/yaml-query.py`
+- `clients/cascades-tucson/reports/2026-04-16-john-breach-check.md`
+- `clients/internal-infrastructure/reports/2026-04-16-howard-breach-check.md`
+
+**Modified:**
+- `.claude/CLAUDE.md` — added `/remediation-tool` row
+- `.claude/memory/feedback_365_remediation_tool.md` — cross-reference
+- `clients/valleywide/README.md` — 2026-04-16 section (RemoteApp + RDS licensing + CAL TODO)
+- `D:/vault/scripts/vault.sh` — Python fallback for yq
+
+### Pending / incomplete
+
+1. Cascades — **reset Megan's password + disable SMTP AUTH** on her mailbox
+2. Cascades — confirm `dunedolly21@gmail.com` invite with Lauren Hasselman
+3. Cascades — verify IdentityRiskyUser.ReadWrite.All actually in the app manifest; re-run consent URL
+4. Howard — password rotation (18 months old); consider passwordless/FIDO2 primary
+5. Own tenant (azcomputerguru) — assign Exchange Admin role + consent IdentityRiskyUser on ComputerGuru-AI-Remediation SP (oversight)
+6. Own tenant — verify CA policies block legacy auth (attacker hitting basic auth + AAD PowerShell paths)
+7. VWP — purchase Server 2022 RDS Per User CAL pack, install via licmgr.msc
+8. VWP — rotate UDM root password, update vault
+9. VWP — UPnP audit on UDM (carried from 2026-04-13)
+10. VWP — rotate `scanner` AD account password (carried from 2026-04-13)
+
+### Key references
+
+- Skill invocation: `/remediation-tool {check|sweep|signins|consent-url|remediate} <target> [flags]`
+- Raw JSON artifacts: `/tmp/remediation-tool/{tenant-id}/{check}/`
+- Report directory pattern: `clients/{slug}/reports/YYYY-MM-DD-{action}.md`
+- Gotchas: `.claude/skills/remediation-tool/references/gotchas.md`
+- Graph endpoints: `.claude/skills/remediation-tool/references/graph-endpoints.md`
+- Memory: `.claude/memory/feedback_365_remediation_tool.md`
+
+**Update end:** 2026-04-16 ~17:45 UTC
+**Outcome:** Cascades incident triaged (John clean, Megan actively attacked but holding); `/remediation-tool` skill live and tested; vault working on Windows; Valleywide RemoteApp restored; Howard clean but targeted.
+
+---
+
+## Update: 19:00 UTC — /save + /sync multi-user change summaries
+
+### Motivation
+
+The repo is now shared between Mike and Howard (per CLAUDE.md's new multi-user section). When either person pulls `main`, they want to know **what changed and who did it** without re-reading diffs. Mike asked `/save` and `/sync` to surface that automatically.
+
+### Changes
+
+**`.claude/commands/sync.md`** — rewrote to describe the new behavior: pre-pull incoming summary (sha / author / subject / age + `git diff --stat`), pre-push outgoing summary, post-sync totals by author. Conflict-resolution guidance now includes author attribution of each conflicting side.
+
+**`.claude/scripts/sync.sh`** — rewrote. Now:
+- Loads `.claude/identity.json` to pick up current user's full name
+- Commit message replaces the old "Claude Sonnet 4.5 co-author" boilerplate with user + machine attribution
+- Before pulling: prints incoming commits as `sha author subject (ago)` plus a `git diff --stat`
+- Before pushing: prints outgoing commits the same way
+- End-of-run "Sync Summary" counts commits by author on each side
+- Also added the `D:/claudetools` / `/d/claudetools` lowercase variants to the directory-search list (was hitting only TitleCase)
+
+**`.claude/commands/save.md`** — added a pre-commit **Change Summary** block (user + machine + `git status --short` + diff stats) and a post-commit summary (SHA + author + files in commit), with a "why" paragraph about multi-user attribution.
+
+### Design notes
+
+- Author attribution is `%an` from git (the person who made the commit), not the shared push account. Since each user has their own `user.email` + `user.name` set from identity.json during onboarding, `%an` carries the real person.
+- For incoming commits viewed before pull, `%an` works because fetch pulls the commit objects with their original author metadata.
+- Summaries are emitted by the bash script (sync) or by Claude following the command spec (save), not by a git hook. Keeps the behavior visible in normal terminal output when a user runs sync by hand.
+
+### Syntax-checked
+
+`bash -n .claude/scripts/sync.sh` — OK.
+
+### Files touched in this micro-update
+
+- `.claude/commands/sync.md` — rewritten
+- `.claude/scripts/sync.sh` — rewritten
+- `.claude/commands/save.md` — edited (added "After Saving" section)
+
+### Pending from this block
+
+- Actually commit + push everything accumulated in today's session (skill directory, reports, README updates, command updates, this log). Delegated to Gitea agent next.
+
+**Update end:** 2026-04-16 ~19:00 UTC