# 2026-04-29 - Audit retention design + handoff to Howard

## User

- **User:** Mike Swanson (mike) (with Claude coordination)
- **Machine:** GURU-BEAST-ROG
- **Role:** admin
- **Session span:** 2026-04-29 ~13:30 PT (post-/sync follow-up to Howard's overnight session)

## Note for Howard

Closing several decisions that were waiting on me from your 2026-04-29 close-out log. **Read these before resuming the pilot build.**

### 1. CA bypass design - APPROVED with two corrections to policy 3

Your corrected 4-policy design (DELETE existing all-users-MFA + 4 new policies) is approved in shape. Logic checks out: caregiver on Cascades + compliant = password only, off-network = blocked, non-compliant on-site = MFA.

**Two corrections to policy 3 user targeting before you build it:**

**Correction 1 - admin off-site access:** `admin@cascadestucson.com` and `sysadmin@cascadestucson.com` need to be in `SG-External-Signin-Allowed` so policy 3 doesn't block you from working from home. (You probably already have this in mind, but make it an explicit checkbox.)

**Correction 2 - GDAP partner access (the bug I caught):** Policy 3 as written ("All users" minus `SG-External-Signin-Allowed`) blocks ACG GDAP partner admins because Microsoft's "All users" *includes* service-provider foreign principals unless explicitly excluded. You and I signing into Cascades from `@azcomputerguru.com` would lose remote MSP access at cutover.

**Revised policy 3 user targeting:**

- Include: All users
- Exclude: `SG-External-Signin-Allowed` + `SG-Break-Glass` + **Service provider users** (the B2B/GDAP exclusion in CA's user picker)

Verify the exact CA UI label: it's the "External users" subselection where "Service provider users" appears as a checkbox. If Microsoft's renamed it again, the goal is "exclude foreign principals coming in via GDAP."

**Other build-time notes still valid:**

- `SG-Break-Glass` group must exist (with breakglass1 + breakglass2, see §4) before you build any of the 4 policies.
  Excluding a non-existent group fails silently.
- Stage in Report-only first, watch logs for 24-48h, and watch for any sign-in from `@azcomputerguru.com` foreign principals being blocked. That's the canary on correction 2.

Bootstrap chicken-and-egg concern I raised earlier is retracted (on-network filter excuses policy 1 + 3, policy 2 prompts MFA which admin can satisfy → compliance flips → caregivers clean flow on subsequent sign-ins). Operational rule: phones don't get handed to caregivers until SDM bootstrap is done.

### 2. Audit retention - DECIDED: hybrid LAW + Storage, ACG-billed, on existing sub

Full design and runbook live at: **`.claude/skills/remediation-tool/references/audit-retention-runbook.md`**

Read that before you start. Summary:

- **Architecture:** hybrid. LAW for 90-day live forensics + Storage Account for 6-year cold archive (lifecycle: hot 30d → cool 60d → archive 6y → delete). Both fed by the same Diagnostic Settings export: single ingest, two retention tiers.
- **Subscription:** reuse the existing `e507e953-2ce9-4887-ba96-9b654f7d3267` (the GuruRMM Trusted Signing sub; Mike already has Owner). RG-isolated from the signing RG. Vault: `services/azure-trusted-signing.sops.yaml`.
- **Region:** `westus2`
- **Cost:** ~$0.50–1.00/mo per HIPAA-tier tenant. ACG-billed, bundled into HIPAA-tier MRR.
- **UAL handling:** poll-based harvester (Office 365 Management Activity API → Storage Account blobs). DEFERRED: design only, build after pilot CA cutover. Punt for now; M365 native 180-day UAL retention covers us short-term.
- **Codify path:** once Cascades runs cleanly for 30 days, fold Phase 1 + Phase 2 into `onboard-tenant.sh` as `--enable-audit-archive`.

**RBAC needed before you can start:** Mike needs to grant you Contributor on `rg-audit-cascadestucson` once the RG exists. The az command for that is in the runbook prereqs. Mike: that's on you to run when Howard creates the RG.

**Sequence (per the runbook):**

1. ACG-side resource provisioning (RG, Storage Account, lifecycle policy, LAW): Howard runs az CLI
2. Customer-tenant Diagnostic Settings (Entra → both destinations): Tenant Admin token + ARM endpoint
3. Verification (1h after, query LAW, check SA blobs)
4. Defender / Intune Diagnostic Settings: discover during verification, add as available
5. UAL harvester: DEFERRED

**Caveat in the runbook:** the cURL in Phase 2 is conceptual. Entra Diagnostic Settings actually go through ARM (`management.azure.com/providers/microsoft.aadiam/diagnosticSettings/...`), not Graph. Validate the working endpoint during your dry-run.

Tenant Admin SP probably needs Security Administrator directory role (or a custom role with `Microsoft.AzureActiveDirectory/diagnosticSettings/write`) on top of CA Admin to create Diagnostic Settings on Entra. Worth confirming early. If true, add to the next `onboard-tenant.sh` patch.

### 3. Backfill sweep - APPROVED, scheduling

Patched `onboard-tenant.sh` will be re-run against all 6 ACG customer tenants tonight (21:00 PT) so they all get the CA Admin role + `Policy.Read.All` backfill: bg-builders, cascades-tucson (idempotent; Howard's PIM-set role will trigger Conflict-fallback, that's fine), cw-concrete, dataforth, heieck-org, mvan.

Mike to /schedule the agent or run manually depending on availability tonight.

**Known noise:** the `role_assigned` helper queries legacy `roleAssignments` and may report MISSING for PIM-managed assignments. Script handles it correctly via Conflict-fallback. Cosmetic only. TODO to teach `role_assigned` about `roleAssignmentSchedules` is tracked but not blocking.

### 4. Break-glass admin - design APPROVED (revised: TWO accounts)

Mike's question on the existing admin@ + sysadmin@ + GDAP partner access prompted a reshape. Break-glass is complementary to those, not redundant (see §6 for the four-paths rationale).
Net change from my earlier "1 break-glass + 1 YubiKey" recommendation: **two break-glass accounts, two YubiKeys, split physical storage.**

**Accounts:**

| | Primary | Secondary |
|---|---|---|
| UPN | `breakglass1-csc@cascadestucson.com` | `breakglass2-csc@cascadestucson.com` |
| Type | Cloud-only Global Admin, no license | Cloud-only Global Admin, no license |
| FIDO2 | YubiKey #1 | YubiKey #2 |
| Physical storage | Cascades on-site, sealed envelope, sign-out audit | ACG office safe (Mike) |
| Vault entry | `clients/cascades-tucson/breakglass1.sops.yaml` | `clients/cascades-tucson/breakglass2.sops.yaml` |

Both accounts:

- Cloud-only (NOT synced from on-prem), so they survive Entra Connect breaks
- Excluded from password expiration policy
- 32-char randomly generated password (separate per account, both vaulted)
- Member of `SG-Break-Glass` group, which is excluded from CA policies 1, 2, 3 and any future CA
- Sign-in alerts via LAW KQL alert rule:

  ```kql
  SigninLogs
  | where UserPrincipalName in ("breakglass1-csc@cascadestucson.com", "breakglass2-csc@cascadestucson.com")
  // ResultType is a string column; "0" means success.
  // Drop the next line to alert on any attempt, not just successes.
  | where ResultType == "0"
  ```

  Action Group → email Mike + Howard immediately. Builds after the LAW lands as part of audit retention.
- Quarterly test: sign in with each, verify FIDO2 works, verify password still valid. Calendar both for the same day.

**Storage split rationale:** isolates failure domains. An office fire, an ACG compromise, or a single rogue actor would have to take out both keys to leave us with zero recovery paths, and the keys are physically separated. Microsoft's official guidance is exactly this two-account pattern.

**Cost:** $50 for two YubiKeys (was $25 for one). Trivial.

**Build order:**

1. Create both accounts (Cloud-only, license-free, GA role)
2. Generate + vault both 32-char passwords
3. Create `SG-Break-Glass` group, add both members
4. Register YubiKey #1 to breakglass1, YubiKey #2 to breakglass2 (both at the workstation, then physically separate)
5. Test sign-in with both before exiting the build session
6. Seal YubiKey #1 in envelope, store at Cascades; YubiKey #2 to ACG safe
7. Verify CA exclusion of `SG-Break-Glass` group is in place on every existing CA policy BEFORE building the new 4-policy design

When we have 3-4 HIPAA tenants on this pattern, codify into `onboard-tenant.sh` as `--enable-breakglass` (creates both accounts + group + alert rule per tenant).

### 5. ALIS BAA + Risk Analysis doc - your action items unchanged

These are still on your to-do (email Meredith for ALIS BAA; draft Risk Analysis doc via Ollama for Mike's review). Independent of CA + audit retention. Do whenever.

### 6. Four admin paths into Cascades - failure-domain map

Mike clarified the admin account model: **`admin@` and `sysadmin@` are intentionally excluded from Entra Connect sync; they live cloud-only.** The local AD `Administrator` is a separate on-prem identity, not part of the M365 admin model.

Final picture:

| Account | Lives | Synced? | CA posture |
|---|---|---|---|
| On-prem AD `Administrator` | Cascades AD only | No, never authenticates to M365 | N/A (out of scope here; managed via on-prem AD security separately) |
| `admin@cascadestucson.com` | Cloud-only (Connect-excluded by design) | No, intentional | Subject to CA + must be in `SG-External-Signin-Allowed` |
| `sysadmin@cascadestucson.com` | Cloud-only (Connect-excluded by design) | No, intentional | Subject to CA + must be in `SG-External-Signin-Allowed` |
| ACG GDAP partner (Mike + Howard from `@azcomputerguru.com`) | Foreign principal | N/A | Subject to Cascades CA; must be excluded from policy 3 via "Service provider users" |
| `breakglass1-csc@` / `breakglass2-csc@` | Cloud-only (definitionally) | No | `SG-Break-Glass` group excluded from all CA |

Failure modes each path covers:

| Path | Failure modes it covers |
|---|---|
| `admin@` / `sysadmin@` | Day-to-day Cascades-side admin work; immune to Entra Connect issues (excluded from sync) |
| ACG GDAP partner | Day-to-day MSP delivery; no dependency on Cascades-side identities |
| Break-glass (×2) | CA misconfig (the live risk during cutover next week), accidental disable/lockout of admin@/sysadmin@, ACG compromise, GDAP relationship revoke |

**Why break-glass is still needed even though admin@/sysadmin@ are already cloud-only:** the differentiator is **CA exclusion** and **ACG-independence**, not Connect immunity. During CA cutover, a misconfigured policy can lock out admin@ + sysadmin@ + GDAP foreign principals simultaneously (especially under the policy 3 GDAP bug). Break-glass is the only path that survives all three.

The on-prem AD `Administrator` account is "its own thing" per Mike: managed via on-prem AD security practices, not part of this design. If/when HIPAA on-prem hardening lands (privileged access workstation pattern, LAPS, audit, MFA-via-smartcard, etc.), that's a separate Wave.

### 7. Audit retention build dependencies

Your CA pilot cutover does NOT block on audit retention being live. Pilot phase is on Report-only and only flipping a small set of policies for `SG-Caregivers-Pilot`. Audit retention is HIPAA-tier infra and primarily matters before **Phase 3 (production rollout to all caregivers)** and before any real ePHI events flow through the system.

**Order of operations from here:**

1. (You) Finish pilot Outlook + LinkRx/Helpany apps + first phone enrollment + compliance flip
2. (You) Create pilot user + cloud group `SG-Caregivers-Pilot`
3. (You) Stand up audit retention (runbook ref above); can interleave with #1-2 while waiting on app sync delays
4. (You) Build break-glass admin (depends on audit retention LAW for sign-in alerts)
5. (You) Stage 4 new CA policies in Report-only, assigned to `SG-Caregivers-Pilot` only
6. (You) 24-48h Report-only review, fix gaps
7. (You) Flip CA to On
8. (You) Run pilot validation tests
9. (You) Phase 3 production rollout (after AD prereq cleanup + Entra Connect staging exit)

Audit retention slots in around #3 because it doesn't block anything and the LAW has to exist before #4. If Microsoft's Entra Diagnostic Settings endpoint fights you, fall back to manual Azure portal setup for now (we can codify the API later) and don't let it block CA progress.

---

## Configuration Decisions Made

- **CA bypass design:** APPROVED in shape, with two corrections to policy 3 user targeting (see §1)
- **Audit retention architecture:** APPROVED hybrid LAW + Storage, ACG-billed
- **Audit retention subscription:** reuse `e507e953-2ce9-4887-ba96-9b654f7d3267` (GuruRMM Trusted Signing sub) with isolated `rg-audit-*` resource groups
- **Audit retention region:** `westus2`
- **Audit retention RBAC:** Mike = Owner (sub level), Howard = Contributor (per-RG scope on `rg-audit-*`)
- **Backfill sweep:** APPROVED, Howard runs at his convenience (6 tenants, idempotent, ~5 min)
- **Break-glass count:** TWO accounts (Microsoft official guidance; primary at Cascades, secondary at ACG)
- **YubiKey order:** $50 (TWO YubiKeys, split storage Cascades/ACG)
- **Break-glass naming:** `breakglass1-csc@cascadestucson.com` and `breakglass2-csc@cascadestucson.com`
- **Policy 3 user targeting:** revised to also exclude Service provider users (GDAP foreign principal exclusion, bug fix)

---

## Files Created / Modified

- **NEW:** `.claude/skills/remediation-tool/references/audit-retention-runbook.md`: full design + per-tenant runbook, including conceptual cURL for Entra Diagnostic Settings (Howard validates exact endpoint during dry-run)
- This session log

---

## Pending / Incomplete (handoff)

### Mike's outstanding items

- [ ] Order TWO YubiKeys ($50 total, Amazon; one to Cascades sealed envelope via Howard, one to ACG safe)
- [ ] **One-time:** Grant Howard Owner on the ACG subscription so he can self-serve all future MSP-side Azure work (audit retention RGs and beyond):

  ```bash
  az role assignment create \
    --assignee howard.enos@azcomputerguru.com \
    --role "Owner" \
    --scope "/subscriptions/e507e953-2ce9-4887-ba96-9b654f7d3267"
  ```

- [ ] **One-time guardrails to make Owner-Howard low-risk:**
  - Resource lock on `gururmm-signing-rg` (CanNotDelete) so the signing infra can't be accidentally removed:

    ```bash
    az lock create --name signing-protect --lock-type CanNotDelete \
      --resource-group gururmm-signing-rg \
      --notes "Protect GuruRMM Trusted Signing infra from accidental deletion"
    ```

  - PAYG cost alert at ~$50/mo total via Cost Management (Azure portal, 5-min UI task)
- [x] Document in `clients/cascades-tucson/CONTEXT.md` that admin@ and sysadmin@ are intentionally Connect-excluded (cloud-only by design). Done: added "M365 admin model" section.

### Howard's items

- See "Order of operations" above. Audit retention is the new #3; it slots in around the pilot phone enrollment work without blocking it.
- **Backfill sweep against the 6 ACG tenants.** Run `bash .claude/skills/remediation-tool/scripts/onboard-tenant.sh ` against bg-builders, cascades-tucson, cw-concrete, dataforth, heieck-org, mvan to apply the new CA Admin role + Policy.Read.All backfill. Idempotent, so safe to re-run. Cascades will be noisy (PIM-managed role triggers Conflict-fallback path); that's expected. Recommend running this BEFORE starting audit retention since (a) it warms up against the patched script, and (b) it confirms the baseline directory roles + Graph perms are in place across all tenants, which is a prereq for any future audit-retention codification. ~5 minutes total. Drop a one-liner session log when done.
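
The sweep above could be scripted roughly as follows. This is a sketch only: it assumes `onboard-tenant.sh` accepts the tenant slug as its first positional argument, which this log does not show, and `backfill_sweep` and `DRY_RUN` are hypothetical names introduced for illustration. It defaults to printing the commands so the argument shape can be checked before anything runs.

```shell
#!/usr/bin/env bash
# Sketch: run the backfill sweep across the six ACG tenants, one at a time.
# ASSUMPTION: onboard-tenant.sh takes the tenant slug as its first positional
# argument. The log does not show the argument, so confirm before using.

ONBOARD_SCRIPT=".claude/skills/remediation-tool/scripts/onboard-tenant.sh"
TENANTS=(bg-builders cascades-tucson cw-concrete dataforth heieck-org mvan)

# backfill_sweep (hypothetical helper):
#   DRY_RUN=1 (default) prints each command without running it.
#   DRY_RUN=0 runs each tenant and keeps going on failure, so one tenant
#   cannot abort the sweep. Returns non-zero if any tenant failed.
backfill_sweep() {
  local failed=0 tenant
  for tenant in "${TENANTS[@]}"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "bash $ONBOARD_SCRIPT $tenant"
    elif ! bash "$ONBOARD_SCRIPT" "$tenant"; then
      echo "FAILED: $tenant" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `backfill_sweep` with no environment set only prints the six commands; `DRY_RUN=0 backfill_sweep` would execute them. Continuing past a failed tenant is a deliberate choice here, since the script is described as idempotent and a failed tenant can simply be re-run afterwards.
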
### Tracked TODOs (not blocking)

- [ ] Teach `role_assigned` helper about `roleAssignmentSchedules` (cosmetic noise only)
- [ ] Build Office 365 Management Activity API harvester (4-6 hours dev when ready, after Cascades audit retention runs cleanly for 30d)
- [ ] Codify `onboard-tenant.sh --enable-audit-archive` flag (after pilot validated)
- [ ] Codify `onboard-tenant.sh --enable-breakglass` flag (after 3-4 HIPAA tenants on pattern)

---

## Reference

- Runbook: `.claude/skills/remediation-tool/references/audit-retention-runbook.md`
- Howard's prior session log: `clients/cascades-tucson/session-logs/2026-04-29-howard-cascades-bypass-pilot-phase-b-buildout.md`
- ACG Azure subscription: `e507e953-2ce9-4887-ba96-9b654f7d3267` (vault: `services/azure-trusted-signing.sops.yaml`)
- ACG Trusted Signing RG (existing, separate from audit): `gururmm-signing-rg`
- Cascades tenant ID: `207fa277-e9d8-4eb7-ada1-1064d2221498`
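
Because the ACG subscription ID and the Cascades tenant ID above are both GUIDs and easy to confuse, a small pre-flight guard may help before running any of the ACG-side `az` commands in this log. The sketch below is illustrative only; `same_subscription` and `require_acg_subscription` are hypothetical helper names, and it relies only on `az account show --query id --output tsv` to read the CLI's active subscription.

```shell
#!/usr/bin/env bash
# Sketch: refuse to run ACG-side az commands unless the CLI's active
# subscription is the one listed in the Reference section.
EXPECTED_SUB="e507e953-2ce9-4887-ba96-9b654f7d3267"

# same_subscription (hypothetical helper): pure comparison of two IDs,
# case-insensitive because GUIDs may be printed in either case.
same_subscription() {
  [ "$(printf '%s' "$1" | tr 'A-Z' 'a-z')" = "$(printf '%s' "$2" | tr 'A-Z' 'a-z')" ]
}

# require_acg_subscription (hypothetical helper): reads the active
# subscription ID from the Azure CLI and compares it to EXPECTED_SUB.
# Returns non-zero if az fails or the IDs differ.
require_acg_subscription() {
  local active
  active="$(az account show --query id --output tsv)" || return 1
  if ! same_subscription "$active" "$EXPECTED_SUB"; then
    echo "Active subscription is $active, expected $EXPECTED_SUB" >&2
    return 1
  fi
}
```

A provisioning script could start with `require_acg_subscription || exit 1`. If the check fails, `az account set --subscription "$EXPECTED_SUB"` switches the CLI to the intended subscription.
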