Session log: Cascades audit retention design + Pro-Tech Services email investigation

Cascades:
- Approved Howard's corrected 4-policy CA bypass design
- Caught + fixed policy 3 GDAP bug (Service provider users exclusion)
- Decided hybrid LAW + Storage Account audit retention (ACG-billed,
  reuse existing Trusted Signing Azure subscription, westus2)
- Wrote full audit retention runbook for Howard
- Reshaped break-glass to two accounts (split-storage YubiKeys)
- Documented Cascades M365 admin model (admin@/sysadmin@ Connect-excluded
  by design; local AD Administrator separate identity layer)
- Decided Howard gets Owner on ACG sub with guardrails (resource lock +
  cost alert) instead of per-RG Contributor

Pro-Tech Services:
- DNS recon of pro-techhelps.com + pro-techservices.co
- Diagnosed calendar invite delivery issue (DKIM domain mismatch +
  no DMARC = strict receivers silently drop invites)
- Drafted non-technical IT-provider migration email to Michelle Sora

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:05:41 -07:00
parent 6b63c154d2
commit 447b90e092
4 changed files with 706 additions and 0 deletions


@@ -21,6 +21,28 @@ Full contact list + Wi-Fi, KPAX, M365 admin, UniFi hardware MACs, GoDaddy are in
| `svc-audit-upload` | service account for Syncro audit upload to `AuditDrop$` share | `clients/cascades-tucson/svc-audit-upload.sops.yaml` |
| `\\CS-SERVER\homes` | file share at `D:\Homes`; per-user subfolders for folder redirection. Domain Users: Change. Domain Admins: Full. **EncryptData currently false — HIPAA workitem to flip on.** | — |
## M365 admin model
Tenant ID: `207fa277-e9d8-4eb7-ada1-1064d2221498`
Mike's design intent (confirmed 2026-04-29): **the cloud admin layer is fully separated from the on-prem AD admin layer.**
| Account | Layer | Synced via Connect? | Purpose |
|---|---|---|---|
| On-prem AD `Administrator` | On-prem only | No (separate identity layer) | DC + file server admin, GPO, on-prem services. Never authenticates to M365. |
| `admin@cascadestucson.com` | Cloud-only | **No — intentionally Connect-excluded** | Cascades day-to-day cloud GA |
| `sysadmin@cascadestucson.com` | Cloud-only | **No — intentionally Connect-excluded** | Howard's tech account / cloud admin work |
| ACG GDAP partner principals | Foreign principals | N/A | MSP delivery (Mike + Howard from `@azcomputerguru.com`) |
| `breakglass1-csc@cascadestucson.com` | Cloud-only | No (definitionally) | Emergency primary — FIDO2 YubiKey at Cascades sealed envelope |
| `breakglass2-csc@cascadestucson.com` | Cloud-only | No (definitionally) | Emergency secondary — FIDO2 YubiKey at ACG safe |
**When Entra Connect exits staging mode** (Wave 0.5 G3-G5), admin@ and sysadmin@ stay cloud-only — they must remain in the Connect filter exclusion. Verify after every Connect sync rule change.
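A minimal post-change check for the rule above. It assumes `az login` against the Cascades tenant with rights to read users through Graph. The check reads each account's `onPremisesSyncEnabled` property, which is true for synced users and null for cloud-only ones. The function name is hypothetical and not part of any existing script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm the cloud-only admin accounts are still NOT
# synced by Entra Connect. Assumes `az login` against the Cascades tenant.
CLOUD_ONLY_ADMINS=(
  "admin@cascadestucson.com"
  "sysadmin@cascadestucson.com"
)

check_cloud_only_admins() {
  local upn synced failed=0
  for upn in "${CLOUD_ONLY_ADMINS[@]}"; do
    # onPremisesSyncEnabled is true for synced users and null for cloud-only users.
    if ! synced=$(az rest --method get \
        --url "https://graph.microsoft.com/v1.0/users/${upn}?\$select=onPremisesSyncEnabled" \
        --query onPremisesSyncEnabled --output tsv); then
      echo "ERROR: could not read ${upn}" >&2
      failed=1
      continue
    fi
    case "$synced" in
      [Tt]rue)   # az tsv output may print booleans as "True"
        echo "FAIL: ${upn} is now synced from on-prem AD" >&2
        failed=1
        ;;
      *)
        echo "OK: ${upn} is cloud-only"
        ;;
    esac
  done
  return "$failed"
}
```

Run it after any Connect sync rule change. A non-zero exit means at least one account needs attention.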
CA targeting consequences:
- admin@/sysadmin@: subject to all Cascades CA; must be in `SG-External-Signin-Allowed` for off-network admin work
- `SG-Break-Glass`: excluded from all CA (must add exclusion to every new policy)
- ACG GDAP foreign principals: excluded from blocking policies via the "Service provider users" condition (Microsoft's CA UI), NOT via group membership
## GuruRMM
- Client: **Cascades of Tucson** (code `CASC`, id `42e1b0e3-f8b7-4fc5-86bd-06bdbb073b7f`)


@@ -0,0 +1,218 @@
# 2026-04-29 — Audit retention design + handoff to Howard
## User
- **User:** Mike Swanson (mike) (with Claude coordination)
- **Machine:** GURU-BEAST-ROG
- **Role:** admin
- **Session span:** 2026-04-29 ~13:30 PT (post-/sync follow-up to Howard's overnight session)
## Note for Howard
Closing several decisions that were waiting on me from your 2026-04-29 close-out log. **Read these before resuming the pilot build.**
### 1. CA bypass design — APPROVED with two corrections to policy 3
Your corrected 4-policy design (DELETE existing all-users-MFA + 4 new policies) is approved in shape. Logic checks out: caregiver on Cascades + compliant = password only, off-network = blocked, non-compliant on-site = MFA. **Two corrections to policy 3 user targeting before you build it:**
**Correction 1 — admin off-site access:** `admin@cascadestucson.com` and `sysadmin@cascadestucson.com` need to be in `SG-External-Signin-Allowed` so policy 3 doesn't block you from working from home. (You probably already have this in mind, but make it an explicit checkbox.)
**Correction 2 — GDAP partner access (the bug I caught):** Policy 3 as written ("All users" minus `SG-External-Signin-Allowed`) blocks ACG GDAP partner admins because Microsoft's "All users" *includes* service-provider foreign principals unless explicitly excluded. You and I signing into Cascades from `@azcomputerguru.com` would lose remote MSP access at cutover.
**Revised policy 3 user targeting:**
- Include: All users
- Exclude: `SG-External-Signin-Allowed` + `SG-Break-Glass` + **Service provider users** (the B2B/GDAP exclusion in CA's user picker)
Verify the exact CA UI label — it's the "External users" subselection where "Service provider users" appears as a checkbox. If Microsoft's renamed it again, the goal is "exclude foreign principals coming in via GDAP."
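For reference, the "Service provider users" checkbox appears to correspond to a specific field in the Graph `conditionalAccessPolicy` schema: `excludeGuestsOrExternalUsers`, with `guestOrExternalUserTypes` set to `serviceProvider`. This is as far as we know; confirm it by reading back a portal-built policy with a GET. The sketch below shows policy 3's `conditions.users` block under that assumption. The helper name and the group object IDs are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Graph `conditions.users` fragment for policy 3.
# $1 = SG-External-Signin-Allowed object ID, $2 = SG-Break-Glass object ID.
policy3_users_condition() {
  local external_allowed_id="$1" break_glass_id="$2"
  cat <<EOF
{
  "includeUsers": ["All"],
  "excludeGroups": ["${external_allowed_id}", "${break_glass_id}"],
  "excludeGuestsOrExternalUsers": {
    "guestOrExternalUserTypes": "serviceProvider",
    "externalTenants": {
      "@odata.type": "#microsoft.graph.conditionalAccessAllExternalTenants",
      "membershipKind": "all"
    }
  }
}
EOF
}
```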
**Other build-time notes still valid:**
- `SG-Break-Glass` group must exist (with breakglass1 + breakglass2; see §4) before you build any of the 4 policies. Excluding a non-existent group fails silently.
- Stage in Report-only first, watch logs for 24-48h, watch for any sign-in from `@azcomputerguru.com` foreign principals being blocked — that's the canary on correction 2.
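A preflight sketch for the first note above. It assumes `az login` to the Cascades tenant and stops unless `SG-Break-Glass` exists and has exactly two members. The function name is hypothetical. It checks the member count only, not the member identities.

```bash
#!/usr/bin/env bash
# Hypothetical preflight: refuse to build CA policies until SG-Break-Glass
# exists and has exactly two members. Assumes `az login` to the Cascades tenant.
preflight_break_glass_group() {
  local group="SG-Break-Glass" members
  if ! az ad group show --group "$group" --query id --output tsv >/dev/null 2>&1; then
    echo "STOP: group ${group} not found" >&2
    return 1
  fi
  members=$(az ad group member list --group "$group" --query "length(@)" --output tsv)
  if [ "$members" != "2" ]; then
    echo "STOP: ${group} has ${members:-0} members, expected 2" >&2
    return 1
  fi
  echo "OK: ${group} exists with 2 members"
}
```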
The bootstrap chicken-and-egg concern I raised earlier is retracted. The on-network filter excuses policies 1 and 3, and policy 2 prompts for MFA, which the admin can satisfy. Compliance then flips, and caregivers get the clean flow on subsequent sign-ins. Operational rule: phones don't get handed to caregivers until SDM bootstrap is done.
### 2. Audit retention — DECIDED: hybrid LAW + Storage, ACG-billed, on existing sub
Full design and runbook lives at:
**`.claude/skills/remediation-tool/references/audit-retention-runbook.md`**
Read that before you start. Summary:
- **Architecture:** hybrid. LAW for 90-day live forensics + Storage Account for 6-year cold archive (lifecycle: hot 30d → cool 60d → archive 6y → delete). Both fed by the same Diagnostic Settings export — single ingest, two retention tiers.
- **Subscription:** reuse the existing `e507e953-2ce9-4887-ba96-9b654f7d3267` (the GuruRMM Trusted Signing sub — Mike already has Owner). RG-isolated from the signing RG. Vault: `services/azure-trusted-signing.sops.yaml`.
- **Region:** `westus2`
- **Cost:** ~$0.50-1.00/mo per HIPAA-tier tenant. ACG-billed, bundled into HIPAA-tier MRR.
- **UAL handling:** poll-based harvester (Office 365 Management Activity API → Storage Account blobs). DEFERRED: design only, to be built after the pilot CA cutover. M365's native 180-day UAL retention covers us short-term.
- **Codify path:** once Cascades runs cleanly for 30 days, fold Phase 1 + Phase 2 into `onboard-tenant.sh` as `--enable-audit-archive`.
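The sketch below shows the lifecycle policy under three stated assumptions:
- Day counts are measured from last modification.
- "6y" means six years from write, i.e. 2192 days (covering leap days).
- The storage account is dedicated to audit data, so no prefix filter is needed.

Account and RG names are placeholders. One point to confirm in the dry-run: as we understand Azure's docs, Diagnostic Settings write to storage as append blobs, and lifecycle tier changes apply only to block blobs (append blobs support delete only). The sketch therefore includes a separate delete-only rule for append blobs. If that reading holds, the cool and archive steps won't apply to the Entra export as-is.

```bash
#!/usr/bin/env bash
# Sketch: build and apply the audit Storage Account lifecycle policy.
# RG/account names are placeholders; DRY_RUN=1 (default) only prints the az call.
AUDIT_RG="rg-audit-cascadestucson"
AUDIT_SA="staudcascadestucson"   # hypothetical; 3-24 lowercase letters/digits

write_lifecycle_policy() {
  # $1 = output file. Hot until day 30, cool until day 90, archive until day 2192.
  cat >"$1" <<'EOF'
{
  "rules": [
    {
      "name": "auditBlockBlobTiering",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": ["blockBlob"] },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 30 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
            "delete": { "daysAfterModificationGreaterThan": 2192 }
          }
        }
      }
    },
    {
      "name": "auditAppendBlobExpiry",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": ["appendBlob"] },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 2192 }
          }
        }
      }
    }
  ]
}
EOF
}

apply_lifecycle_policy() {
  local policy_file
  policy_file=$(mktemp)
  write_lifecycle_policy "$policy_file"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ az storage account management-policy create --account-name ${AUDIT_SA} --resource-group ${AUDIT_RG} --policy @<lifecycle.json>"
  else
    az storage account management-policy create \
      --account-name "$AUDIT_SA" --resource-group "$AUDIT_RG" \
      --policy "@${policy_file}"
  fi
  rm -f "$policy_file"
}
```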
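**RBAC needed before you can start:** Mike grants you Owner on the ACG subscription. This is one-time and replaces the earlier per-RG Contributor plan. The guardrails are a CanNotDelete lock on `gururmm-signing-rg` and a cost alert. The az commands are in Mike's outstanding items under Pending / Incomplete. Mike: that's on you to run before Howard creates the RG.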
**Sequence (per the runbook):**
1. ACG-side resource provisioning (RG, Storage Account, lifecycle policy, LAW) — Howard runs az CLI
2. Customer-tenant Diagnostic Settings (Entra → both destinations) — Tenant Admin token + ARM endpoint
3. Verification (1h after, query LAW, check SA blobs)
4. Defender / Intune Diagnostic Settings — discover during verification, add as available
5. UAL harvester — DEFERRED
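Phase 1 can be sketched with az CLI as follows, assuming the subscription and region above. The storage account and workspace names are placeholders. `Standard_LRS` redundancy is an assumption; GRS may be preferable for a six-year archive. With `DRY_RUN=1` (the default in this sketch), it only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch of Phase 1 (ACG side). Names below are hypothetical placeholders;
# DRY_RUN=1 (default) prints each command instead of running it.
SUBSCRIPTION="e507e953-2ce9-4887-ba96-9b654f7d3267"
LOCATION="westus2"
AUDIT_RG="rg-audit-cascadestucson"
AUDIT_SA="staudcascadestucson"
AUDIT_LAW="law-audit-cascadestucson"

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

provision_audit_resources() {
  run az account set --subscription "$SUBSCRIPTION"
  run az group create --name "$AUDIT_RG" --location "$LOCATION"
  run az storage account create --name "$AUDIT_SA" --resource-group "$AUDIT_RG" \
    --location "$LOCATION" --kind StorageV2 --sku Standard_LRS \
    --min-tls-version TLS1_2 --allow-blob-public-access false
  # 90-day interactive retention for live forensics.
  run az monitor log-analytics workspace create --resource-group "$AUDIT_RG" \
    --workspace-name "$AUDIT_LAW" --location "$LOCATION" --retention-time 90
}
```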
**Caveat in the runbook:** the cURL in Phase 2 is conceptual. Entra Diagnostic Settings actually go through ARM (`management.azure.com/providers/microsoft.aadiam/diagnosticSettings/...`), not Graph. Validate the working endpoint during your dry-run. Tenant Admin SP probably needs Security Administrator directory role (or a custom role with `Microsoft.AzureActiveDirectory/diagnosticSettings/write`) on top of CA Admin to create Diagnostic Settings on Entra. Worth confirming early — if true, add to the next `onboard-tenant.sh` patch.
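A sketch of the Phase 2 call against the ARM endpoint named above, run with a token for the customer tenant. Several details are assumptions to validate during the dry-run:
- the setting name, the `api-version`, and the log category list;
- whether a customer-tenant token may reference destinations in the ACG subscription at all.

Resource names are placeholders.

```bash
#!/usr/bin/env bash
# Sketch of Phase 2: Entra (microsoft.aadiam) diagnostic setting via ARM.
# Run while logged in to the CUSTOMER tenant. Names, api-version, and log
# categories are assumptions; DRY_RUN=1 (default) prints instead of calling.
ACG_SUB="e507e953-2ce9-4887-ba96-9b654f7d3267"
AUDIT_RG="rg-audit-cascadestucson"
AUDIT_SA="staudcascadestucson"        # hypothetical
AUDIT_LAW="law-audit-cascadestucson"  # hypothetical
SETTING_NAME="acg-audit-archive"      # hypothetical
API_VERSION="2017-04-01"              # assumption; confirm during dry-run

diag_setting_body() {
  local rg_id="/subscriptions/${ACG_SUB}/resourceGroups/${AUDIT_RG}/providers"
  # One export, two destinations: LAW (live) + Storage Account (archive).
  cat <<EOF
{
  "properties": {
    "workspaceId": "${rg_id}/Microsoft.OperationalInsights/workspaces/${AUDIT_LAW}",
    "storageAccountId": "${rg_id}/Microsoft.Storage/storageAccounts/${AUDIT_SA}",
    "logs": [
      { "category": "AuditLogs", "enabled": true },
      { "category": "SignInLogs", "enabled": true },
      { "category": "NonInteractiveUserSignInLogs", "enabled": true },
      { "category": "ServicePrincipalSignInLogs", "enabled": true }
    ]
  }
}
EOF
}

create_entra_diag_setting() {
  local url="https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/${SETTING_NAME}?api-version=${API_VERSION}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ az rest --method put --url ${url}"
    diag_setting_body
  else
    az rest --method put --url "$url" --body "$(diag_setting_body)"
  fi
}
```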
### 3. Backfill sweep — APPROVED, scheduling
Patched `onboard-tenant.sh` will be re-run against all 6 ACG customer tenants tonight (21:00 PT) so they all get the CA Admin role + `Policy.Read.All` backfill: bg-builders, cascades-tucson (idempotent — Howard's PIM-set role will trigger Conflict-fallback, that's fine), cw-concrete, dataforth, heieck-org, mvan. Mike to /schedule the agent or run manually depending on availability tonight.
**Known noise:** the `role_assigned` helper queries legacy `roleAssignments` and may report MISSING for PIM-managed assignments. Script handles it correctly via Conflict-fallback. Cosmetic only. TODO to teach `role_assigned` about `roleAssignmentSchedules` is tracked but not blocking.
### 4. Break-glass admin — design APPROVED (revised: TWO accounts)
Mike's question on the existing admin@ + sysadmin@ + GDAP partner access prompted a reshape. Break-glass is complementary to those, not redundant — see §6 for the four-paths rationale. Net change from my earlier "1 break-glass + 1 YubiKey" recommendation: **two break-glass accounts, two YubiKeys, split physical storage.**
**Accounts:**
| | Primary | Secondary |
|---|---|---|
| UPN | `breakglass1-csc@cascadestucson.com` | `breakglass2-csc@cascadestucson.com` |
| Type | Cloud-only Global Admin, no license | Cloud-only Global Admin, no license |
| FIDO2 | YubiKey #1 | YubiKey #2 |
| Physical storage | Cascades on-site, sealed envelope, sign-out audit | ACG office safe (Mike) |
| Vault entry | `clients/cascades-tucson/breakglass1.sops.yaml` | `clients/cascades-tucson/breakglass2.sops.yaml` |
Both accounts:
- Cloud-only (NOT synced from on-prem) — survives Entra Connect breaks
- Excluded from password expiration policy
- 32-char randomly generated password (separate per account, both vaulted)
- Member of `SG-Break-Glass` group, which is excluded from all 4 new CA policies and any future CA
- Sign-in alerts via LAW KQL alert rule:
```kql
SigninLogs
| where UserPrincipalName in~ ("breakglass1-csc@cascadestucson.com", "breakglass2-csc@cascadestucson.com")
| where ResultType == "0" // ResultType is a string column; success only. For an any-attempt alert, drop this line.
```
Action Group → email Mike + Howard immediately. Build the alert after the LAW lands as part of audit retention.
- Quarterly test: sign in with each, verify FIDO2 works, verify password still valid. Calendar both for the same day.
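A sketch of the alert wiring, assuming the LAW already exists. It creates one action group that emails both of you and one scheduled query rule that runs a single-line version of the KQL above every 5 minutes. Resource names and `MIKE_EMAIL` are placeholders; Mike's address isn't in this log. Check the `scheduled-query` flags against `az monitor scheduled-query create --help` before a real run.

```bash
#!/usr/bin/env bash
# Sketch: action group + scheduled query alert for break-glass sign-ins.
# Names and MIKE_EMAIL are placeholders; DRY_RUN=1 (default) prints commands.
ACG_SUB="e507e953-2ce9-4887-ba96-9b654f7d3267"
AUDIT_RG="rg-audit-cascadestucson"
AUDIT_LAW="law-audit-cascadestucson"              # hypothetical
ACTION_GROUP="ag-breakglass-csc"                  # hypothetical
MIKE_EMAIL="${MIKE_EMAIL:-mike@example.invalid}"  # placeholder
HOWARD_EMAIL="howard.enos@azcomputerguru.com"
BG_QUERY='SigninLogs | where UserPrincipalName in~ ("breakglass1-csc@cascadestucson.com", "breakglass2-csc@cascadestucson.com") | where ResultType == "0"'

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

create_breakglass_alert() {
  local rg_id="/subscriptions/${ACG_SUB}/resourceGroups/${AUDIT_RG}"
  run az monitor action-group create --name "$ACTION_GROUP" \
    --resource-group "$AUDIT_RG" --short-name bgalert \
    --action email mike "$MIKE_EMAIL" \
    --action email howard "$HOWARD_EMAIL"
  # Alert when the query returns any row in a 5-minute window.
  run az monitor scheduled-query create --name breakglass-signin-csc \
    --resource-group "$AUDIT_RG" \
    --scopes "${rg_id}/providers/Microsoft.OperationalInsights/workspaces/${AUDIT_LAW}" \
    --condition "count 'BG' > 0" --condition-query "BG=${BG_QUERY}" \
    --evaluation-frequency 5m --window-size 5m --severity 0 \
    --action-groups "${rg_id}/providers/microsoft.insights/actionGroups/${ACTION_GROUP}"
}
```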
**Storage split rationale:** the split isolates failure domains. An office fire, an ACG compromise, or a single rogue actor can take out at most one key. Losing all emergency access requires losing both keys, and they're physically separated. This matches Microsoft's guidance to keep at least two emergency access accounts.
**Cost:** $50 for two YubiKeys (was $25 for one). Trivial.
**Build order:**
1. Create both accounts (Cloud-only, license-free, GA role)
2. Generate + vault both 32-char passwords
3. Create `SG-Break-Glass` group, add both members
4. Register YubiKey #1 to breakglass1, YubiKey #2 to breakglass2 (both at the workstation, then physically separate)
5. Test sign-in with both before exiting the build session
6. Seal YubiKey #1 in envelope, store at Cascades; YubiKey #2 to ACG safe
7. Verify CA exclusion of `SG-Break-Glass` group is in place on every existing CA policy BEFORE building the new 4-policy design
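Step 7 can be checked with a sketch like this one. It assumes `az login` to the Cascades tenant with permission to read CA policies through Graph (e.g. `Policy.Read.All`). It prints every policy whose `excludeGroups` lacks the `SG-Break-Glass` object ID. The function names are hypothetical, and the check assumes Graph returns `excludeGroups` as an array (possibly empty) on every policy.

```bash
#!/usr/bin/env bash
# Hypothetical check for build step 7: list CA policies that do NOT exclude
# SG-Break-Glass. Run as a script: in an interactive shell, '!' inside double
# quotes triggers history expansion.
policies_missing_breakglass_exclusion() {
  local group_id
  group_id=$(az ad group show --group "SG-Break-Glass" --query id --output tsv) || return 2
  az rest --method get \
    --url "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies" \
    --query "value[?!contains(conditions.users.excludeGroups, '${group_id}')].displayName" \
    --output tsv
}

verify_breakglass_exclusions() {
  local missing
  if ! missing=$(policies_missing_breakglass_exclusion); then
    echo "STOP: could not read SG-Break-Glass or the CA policies" >&2
    return 2
  fi
  if [ -n "$missing" ]; then
    echo "FAIL: policies without the SG-Break-Glass exclusion:" >&2
    printf '%s\n' "$missing" >&2
    return 1
  fi
  echo "OK: every CA policy excludes SG-Break-Glass"
}
```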
When we have 3-4 HIPAA tenants on this pattern, codify into `onboard-tenant.sh` as `--enable-breakglass` (creates both accounts + group + alert rule per tenant).
### 5. ALIS BAA + Risk Analysis doc — your action items unchanged
These are still on your to-do (email Meredith for ALIS BAA; draft Risk Analysis doc via Ollama for Mike's review). Independent of CA + audit retention. Do whenever.
### 6. Four admin paths into Cascades — failure-domain map
Mike clarified the admin account model: **`admin@` and `sysadmin@` are intentionally excluded from Entra Connect sync — they live cloud-only.** The local AD `Administrator` is a separate on-prem identity, not part of the M365 admin model.
Final picture:
| Account | Lives | Synced? | CA posture |
|---|---|---|---|
| On-prem AD `Administrator` | Cascades AD only | No — never authenticates to M365 | N/A (out of scope here; managed via on-prem AD security separately) |
| `admin@cascadestucson.com` | Cloud-only (Connect-excluded by design) | No, intentional | Subject to CA + must be in `SG-External-Signin-Allowed` |
| `sysadmin@cascadestucson.com` | Cloud-only (Connect-excluded by design) | No, intentional | Subject to CA + must be in `SG-External-Signin-Allowed` |
| ACG GDAP partner (Mike + Howard from `@azcomputerguru.com`) | Foreign principal | N/A | Subject to Cascades CA — must be excluded from policy 3 via "Service provider users" |
| `breakglass1-csc@` / `breakglass2-csc@` | Cloud-only (definitionally) | No | `SG-Break-Glass` group excluded from all CA |
Failure modes each path covers:
| Path | Failure modes it covers |
|---|---|
| `admin@` / `sysadmin@` | Day-to-day Cascades-side admin work; immune to Entra Connect issues (excluded from sync) |
| ACG GDAP partner | Day-to-day MSP delivery; no dependency on Cascades-side identities |
| Break-glass (×2) | CA misconfig (the live risk during cutover next week), accidental disable/lockout of admin@/sysadmin@, ACG compromise, GDAP relationship revoke |
**Why break-glass is still needed even though admin@/sysadmin@ are already cloud-only:** the differentiator is **CA exclusion** and **ACG-independence**, not Connect immunity. During CA cutover, a misconfigured policy can lock out admin@ + sysadmin@ + GDAP foreign principals simultaneously (especially under the policy 3 GDAP bug). Break-glass is the only path that survives all three.
The on-prem AD `Administrator` account is "its own thing" per Mike — managed via on-prem AD security practices, not part of this design. If/when HIPAA on-prem hardening lands (privileged access workstation pattern, LAPS, audit, MFA-via-smartcard, etc.), that's a separate Wave.
### 7. Audit retention build dependencies
Your CA pilot cutover does NOT block on audit retention being live. The pilot starts in Report-only and only enforces a small set of policies scoped to `SG-Caregivers-Pilot`. Audit retention is HIPAA-tier infra and matters primarily before **Phase 3 (production rollout to all caregivers)** and before any real ePHI events flow through the system.
**Order of operations from here:**
1. (You) Finish pilot Outlook + LinkRx/Helpany apps + first phone enrollment + compliance flip
2. (You) Create pilot user + cloud group `SG-Caregivers-Pilot`
3. (You) Stand up audit retention (runbook ref above) — can interleave with #1-2 while waiting on app sync delays
4. (You) Build break-glass admin (depends on audit retention LAW for sign-in alerts)
5. (You) Stage 4 new CA policies in Report-only, assigned to `SG-Caregivers-Pilot` only
6. (You) 24-48h Report-only review, fix gaps
7. (You) Flip CA to On
8. (You) Run pilot validation tests
9. (You) Phase 3 production rollout (after AD prereq cleanup + Entra Connect staging exit)
Audit retention slots in around #3 because it doesn't block anything and the LAW has to exist before #4. If Microsoft's Entra Diagnostic Settings endpoint fights you, fall back to manual Azure portal setup for now (we can codify the API later) and don't let it block CA progress.
---
## Configuration Decisions Made
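- **CA bypass design:** APPROVED in shape, with two corrections to policy 3 user targeting (admin@/sysadmin@ in `SG-External-Signin-Allowed`; Service provider users excluded)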
- **Audit retention architecture:** APPROVED hybrid LAW + Storage, ACG-billed
- **Audit retention subscription:** reuse `e507e953-2ce9-4887-ba96-9b654f7d3267` (GuruRMM Trusted Signing sub) with isolated `rg-audit-*` resource groups
- **Audit retention region:** `westus2`
- **Audit retention RBAC:** Mike = Owner, Howard = Owner (both at sub level). Guardrails: CanNotDelete lock on `gururmm-signing-rg` + ~$50/mo cost alert. This replaces the earlier per-RG Contributor plan.
- **Backfill sweep:** APPROVED, Howard runs at his convenience (6 tenants, idempotent, ~5 min)
- **Break-glass count:** TWO accounts (Microsoft official guidance — primary at Cascades, secondary at ACG)
- **YubiKey order:** $50 (TWO YubiKeys, split storage Cascades/ACG)
- **Break-glass naming:** `breakglass1-csc@cascadestucson.com` and `breakglass2-csc@cascadestucson.com`
- **Policy 3 user targeting:** revised to also exclude Service provider users (GDAP foreign principal exclusion — bug fix)
---
## Files Created / Modified
- **NEW:** `.claude/skills/remediation-tool/references/audit-retention-runbook.md` — full design + per-tenant runbook, including conceptual cURL for Entra Diagnostic Settings (Howard validates exact endpoint during dry-run)
- This session log
---
## Pending / Incomplete (handoff)
### Mike's outstanding items
- [ ] Order TWO YubiKeys ($50 total, Amazon — one to Cascades sealed envelope via Howard, one to ACG safe)
- [ ] **One-time:** Grant Howard Owner on the ACG subscription so he can self-serve all future MSP-side Azure work (audit retention RGs and beyond):
```bash
az role assignment create \
  --assignee howard.enos@azcomputerguru.com \
  --role "Owner" \
  --scope "/subscriptions/e507e953-2ce9-4887-ba96-9b654f7d3267"
```
- [ ] **One-time guardrails to make Owner-Howard low-risk:**
- Resource lock on `gururmm-signing-rg` (CanNotDelete) so the signing infra can't be accidentally removed:
```bash
az lock create --name signing-protect --lock-type CanNotDelete \
  --resource-group gururmm-signing-rg \
  --notes "Protect GuruRMM Trusted Signing infra from accidental deletion"
```
- PAYG cost alert at ~$50/mo total via Cost Management (Azure portal, 5-min UI task)
- [x] Document in `clients/cascades-tucson/CONTEXT.md` that admin@ and sysadmin@ are intentionally Connect-excluded (cloud-only by design). Done — added "M365 admin model" section.
### Howard's items
- See "Order of operations" above. Audit retention is the new #3 — slots in around the pilot phone enrollment work without blocking it.
- **Backfill sweep against the 6 ACG tenants.** Run `bash .claude/skills/remediation-tool/scripts/onboard-tenant.sh <tenant-id>` against bg-builders, cascades-tucson, cw-concrete, dataforth, heieck-org, mvan to apply the new CA Admin role + Policy.Read.All backfill. Idempotent — safe to re-run. Cascades will be noisy (PIM-managed role triggers Conflict-fallback path); that's expected. Recommend running this BEFORE starting audit retention since (a) it warms up against the patched script, (b) it confirms the baseline directory roles + Graph perms are in place across all tenants, which is a prereq for any future audit-retention codification. ~5 minutes total. Drop a one-liner session log when done.
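The tenant IDs aren't listed here, so this hypothetical wrapper for the sweep takes one tenant ID per argument. It keeps going past a failed tenant and prints a one-line summary you can paste into the session log. With `DRY_RUN=1` (the default in this sketch), it prints the commands only.

```bash
#!/usr/bin/env bash
# Hypothetical sweep wrapper: run onboard-tenant.sh once per tenant ID given
# as an argument, continue past failures, and print a one-line summary.
ONBOARD="${ONBOARD:-.claude/skills/remediation-tool/scripts/onboard-tenant.sh}"

backfill_sweep() {
  local tenant_id ok=0
  local -a failed=()
  for tenant_id in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "+ bash ${ONBOARD} ${tenant_id}"
      ok=$((ok + 1))
    elif bash "$ONBOARD" "$tenant_id"; then
      ok=$((ok + 1))
    else
      failed+=("$tenant_id")
    fi
  done
  echo "backfill: ${ok} ok, ${#failed[@]} failed"
  if [ "${#failed[@]}" -gt 0 ]; then
    printf 'failed: %s\n' "${failed[@]}"
    return 1
  fi
}
```

Continuing past failures fits here because the script is idempotent: a failed tenant can simply be re-run on its own.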
### Tracked TODOs (not blocking)
- [ ] Teach `role_assigned` helper about `roleAssignmentSchedules` (cosmetic noise only)
- [ ] Build OMA Activity API harvester (4-6 hours dev when ready, after Cascades audit retention runs cleanly for 30d)
- [ ] Codify `onboard-tenant.sh --enable-audit-archive` flag (after pilot validated)
- [ ] Codify `onboard-tenant.sh --enable-breakglass` flag (after 3-4 HIPAA tenants on pattern)
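For the harvester TODO, here is a starting-point sketch of the first two Office 365 Management Activity API calls: starting a subscription and listing available content. It assumes a `TOKEN` environment variable holding an app token for `https://manage.office.com` with `ActivityFeed.Read`. It leaves out paging via the `NextPageUri` response header, fetching each `contentUri`, and the blob upload. Content listings accept a time window of at most 24 hours.

```bash
#!/usr/bin/env bash
# Sketch of the deferred UAL harvester's first two calls (Office 365
# Management Activity API). TOKEN is an assumed pre-acquired app token.
OMA_BASE="https://manage.office.com/api/v1.0"

oma_start_url() {   # $1 = tenant ID, $2 = content type (e.g. Audit.General)
  echo "${OMA_BASE}/$1/activity/feed/subscriptions/start?contentType=$2"
}

oma_content_url() { # $1 = tenant ID, $2 = content type, $3/$4 = UTC window (<= 24h)
  echo "${OMA_BASE}/$1/activity/feed/subscriptions/content?contentType=$2&startTime=$3&endTime=$4"
}

oma_auth_header() {
  if [ -z "${TOKEN:-}" ]; then
    echo "TOKEN is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s' "$TOKEN"
}

oma_start_subscription() {  # one-time per tenant and content type
  local auth
  auth=$(oma_auth_header) || return 1
  curl -sS -X POST -H "$auth" -H "Content-Length: 0" "$(oma_start_url "$1" "$2")"
}

oma_list_content() {        # returns JSON array of {contentUri, ...}
  local auth
  auth=$(oma_auth_header) || return 1
  curl -sS -H "$auth" "$(oma_content_url "$1" "$2" "$3" "$4")"
}
```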
---
## Reference
- Runbook: `.claude/skills/remediation-tool/references/audit-retention-runbook.md`
- Howard's prior session log: `clients/cascades-tucson/session-logs/2026-04-29-howard-cascades-bypass-pilot-phase-b-buildout.md`
- ACG Azure subscription: `e507e953-2ce9-4887-ba96-9b654f7d3267` (vault: `services/azure-trusted-signing.sops.yaml`)
- ACG Trusted Signing RG (existing, separate from audit): `gururmm-signing-rg`
- Cascades tenant ID: `207fa277-e9d8-4eb7-ada1-1064d2221498`