claudetools/clients/cascades-tucson/session-logs/2026-04-29-mike-audit-retention-design-and-handoff.md
Mike Swanson 447b90e092 Session log: Cascades audit retention design + Pro-Tech Services email investigation
Cascades:
- Approved Howard's corrected 4-policy CA bypass design
- Caught + fixed policy 3 GDAP bug (Service provider users exclusion)
- Decided hybrid LAW + Storage Account audit retention (ACG-billed,
  reuse existing Trusted Signing Azure subscription, westus2)
- Wrote full audit retention runbook for Howard
- Reshaped break-glass to two accounts (split-storage YubiKeys)
- Documented Cascades M365 admin model (admin@/sysadmin@ Connect-excluded
  by design; local AD Administrator separate identity layer)
- Decided Howard gets Owner on ACG sub with guardrails (resource lock +
  cost alert) instead of per-RG Contributor

Pro-Tech Services:
- DNS recon of pro-techhelps.com + pro-techservices.co
- Diagnosed calendar invite delivery issue (DKIM domain mismatch +
  no DMARC = strict receivers silently drop invites)
- Drafted non-technical IT-provider migration email to Michelle Sora

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:05:41 -07:00


2026-04-29 — Audit retention design + handoff to Howard

User

  • User: Mike Swanson (mike) (with Claude coordination)
  • Machine: GURU-BEAST-ROG
  • Role: admin
  • Session span: 2026-04-29 ~13:30 PT (post-/sync follow-up to Howard's overnight session)

Note for Howard

Closing several decisions that were waiting on me from your 2026-04-29 close-out log. Read these before resuming the pilot build.

1. CA bypass design — APPROVED with two corrections to policy 3

Your corrected 4-policy design (DELETE existing all-users-MFA + 4 new policies) is approved in shape. Logic checks out: a caregiver on the Cascades network with a compliant device signs in with password only, off-network sign-ins are blocked, and a non-compliant device on-site gets MFA. Two corrections to policy 3 user targeting before you build it:

Correction 1 — admin off-site access: admin@cascadestucson.com and sysadmin@cascadestucson.com need to be in SG-External-Signin-Allowed so policy 3 doesn't block you from working from home. (You probably already have this in mind, but make it an explicit checkbox.)

Correction 2 — GDAP partner access (the bug I caught): Policy 3 as written ("All users" minus SG-External-Signin-Allowed) blocks ACG GDAP partner admins because Microsoft's "All users" includes service-provider foreign principals unless explicitly excluded. You and I signing into Cascades from @azcomputerguru.com would lose remote MSP access at cutover.

Revised policy 3 user targeting:

  • Include: All users
  • Exclude: SG-External-Signin-Allowed + SG-Break-Glass + Service provider users (the B2B/GDAP exclusion in CA's user picker)

Verify the exact CA UI label — it's the "External users" subselection where "Service provider users" appears as a checkbox. If Microsoft's renamed it again, the goal is "exclude foreign principals coming in via GDAP."
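If you end up building policy 3 through Graph rather than the portal, the revised user targeting could look roughly like the fragment below. This is a sketch, not the runbook's method: it assumes the Graph conditionalAccessPolicy schema (`excludeGuestsOrExternalUsers` with `guestOrExternalUserTypes` set to `serviceProvider`), and the two group IDs are placeholders. Check the field names against current Graph docs before using it.

```shell
# Prints the "users" condition for policy 3 as Microsoft Graph JSON.
# Arguments are group object IDs (placeholders, not real IDs):
#   $1 = SG-External-Signin-Allowed, $2 = SG-Break-Glass
policy3_users_json() {
  cat <<EOF
{
  "includeUsers": ["All"],
  "excludeGroups": ["$1", "$2"],
  "excludeGuestsOrExternalUsers": {
    "guestOrExternalUserTypes": "serviceProvider",
    "externalTenants": {
      "@odata.type": "#microsoft.graph.conditionalAccessAllExternalTenants",
      "membershipKind": "all"
    }
  }
}
EOF
}

# Example: print the fragment with placeholder IDs for review.
policy3_users_json "<external-signin-group-id>" "<break-glass-group-id>"
```

The fragment covers user targeting only. Locations, grant controls and the Report-only state still come from the 4-policy design.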

Other build-time notes still valid:

  • SG-Break-Glass group must exist (with breakglass1 + breakglass2, see §4) before you build any of the 4 policies. Excluding a non-existent group fails silently.
  • Stage in Report-only first, watch logs for 24-48h, watch for any sign-in from @azcomputerguru.com foreign principals being blocked — that's the canary on correction 2.

Bootstrap chicken-and-egg concern I raised earlier is retracted. The on-network filter exempts the bootstrap sign-in from policies 1 and 3, and policy 2 prompts for MFA, which the admin can satisfy. Compliance then flips, and caregivers get the clean flow on subsequent sign-ins. Operational rule: phones don't get handed to caregivers until SDM bootstrap is done.

2. Audit retention — DECIDED: hybrid LAW + Storage, ACG-billed, on existing sub

Full design and runbook lives at:

.claude/skills/remediation-tool/references/audit-retention-runbook.md

Read that before you start. Summary:

  • Architecture: hybrid. LAW for 90-day live forensics + Storage Account for 6-year cold archive (lifecycle: hot 30d → cool 60d → archive 6y → delete). Both fed by the same Diagnostic Settings export — single ingest, two retention tiers.
  • Subscription: reuse the existing e507e953-2ce9-4887-ba96-9b654f7d3267 (the GuruRMM Trusted Signing sub — Mike already has Owner). RG-isolated from the signing RG. Vault: services/azure-trusted-signing.sops.yaml.
  • Region: westus2
  • Cost: ~$0.50-1.00/mo per HIPAA-tier tenant. ACG-billed, bundled into HIPAA-tier MRR.
  • UAL handling: poll-based harvester (Office 365 Management Activity API → Storage Account blobs). DEFERRED — design only, build after pilot CA cutover. Punt for now; M365 native 180-day UAL retention covers us short-term.
  • Codify path: once Cascades runs cleanly for 30 days, fold Phase 1 + Phase 2 into onboard-tenant.sh as --enable-audit-archive.
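A minimal sketch of the lifecycle rule from the Architecture bullet, written as a Storage Account management policy. The day counts are my reading of "hot 30d → cool 60d → archive 6y → delete" (cool at day 30, archive at day 90, delete six years after that), and the rule name is made up. One thing to confirm in the dry-run: tiering actions only apply to block blobs, so check which blob type the Diagnostic Settings export actually writes.

```shell
# Prints a lifecycle management policy for the audit Storage Account.
# Assumption: "archive 6y" means six years spent in the archive tier,
# counted after the 30d hot + 60d cool period.
lifecycle_policy_json() {
  hot_days=30
  cool_days=60
  archive_years=6
  to_cool=$hot_days
  to_archive=$((hot_days + cool_days))
  to_delete=$((to_archive + archive_years * 365))
  cat <<EOF
{
  "rules": [
    {
      "enabled": true,
      "name": "audit-archive",
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": ["blockBlob"] },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": $to_cool },
            "tierToArchive": { "daysAfterModificationGreaterThan": $to_archive },
            "delete": { "daysAfterModificationGreaterThan": $to_delete }
          }
        }
      }
    }
  ]
}
EOF
}

lifecycle_policy_json > policy.json
# Then apply it (account and RG names are whatever the runbook settles on):
#   az storage account management-policy create \
#     --account-name <storage-account> --resource-group <rg> --policy @policy.json
```

If the 6 years is meant as total retention rather than time in archive, adjust `to_delete` to `archive_years * 365`.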

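RBAC needed before you can start: the runbook prereqs describe a Contributor grant on rg-audit-cascadestucson once the RG exists, but that plan is superseded. Mike will instead grant you Owner on the ACG subscription, with guardrails (see "Mike's outstanding items" below), so you can create the RG yourself. Mike: that's on you to run before Howard starts.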

Sequence (per the runbook):

  1. ACG-side resource provisioning (RG, Storage Account, lifecycle policy, LAW) — Howard runs az CLI
  2. Customer-tenant Diagnostic Settings (Entra → both destinations) — Tenant Admin token + ARM endpoint
  3. Verification (1h after, query LAW, check SA blobs)
  4. Defender / Intune Diagnostic Settings — discover during verification, add as available
  5. UAL harvester — DEFERRED
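For step 1, a dry-run sketch of the ACG-side provisioning. The runbook is the authority here; this only shows the shape. The RG name, region and subscription ID come from this log, while the Storage Account and workspace names are hypothetical. Nothing runs against Azure unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

provision_audit_rg() {
  sub="e507e953-2ce9-4887-ba96-9b654f7d3267"  # ACG sub (from this log)
  rg="rg-audit-cascadestucson"                # RG name (from this log)
  loc="westus2"
  sa="stauditcascadestucson"                  # hypothetical name
  law="law-audit-cascadestucson"              # hypothetical name

  run az group create --subscription "$sub" --name "$rg" --location "$loc"
  run az storage account create --subscription "$sub" --name "$sa" \
    --resource-group "$rg" --location "$loc" \
    --sku Standard_LRS --kind StorageV2 --access-tier Hot
  run az monitor log-analytics workspace create --subscription "$sub" \
    --resource-group "$rg" --workspace-name "$law" --location "$loc" \
    --retention-time 90
}

# Prints the three az commands for review; rerun with DRY_RUN=0 to execute.
provision_audit_rg
```

The lifecycle policy and the Diagnostic Settings (steps 1 and 2) are left out on purpose; those follow the runbook.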

Caveat in the runbook: the cURL in Phase 2 is conceptual. Entra Diagnostic Settings actually go through ARM (management.azure.com/providers/microsoft.aadiam/diagnosticSettings/...), not Graph. Validate the working endpoint during your dry-run. Tenant Admin SP probably needs Security Administrator directory role (or a custom role with Microsoft.AzureActiveDirectory/diagnosticSettings/write) on top of CA Admin to create Diagnostic Settings on Entra. Worth confirming early — if true, add to the next onboard-tenant.sh patch.

3. Backfill sweep — APPROVED, scheduling

Patched onboard-tenant.sh will be re-run against all 6 ACG customer tenants tonight (21:00 PT) so they all get the CA Admin role + Policy.Read.All backfill: bg-builders, cascades-tucson (idempotent — Howard's PIM-set role will trigger Conflict-fallback, that's fine), cw-concrete, dataforth, heieck-org, mvan. Mike to /schedule the agent or run manually depending on availability tonight.

Known noise: the role_assigned helper queries legacy roleAssignments and may report MISSING for PIM-managed assignments. Script handles it correctly via Conflict-fallback. Cosmetic only. TODO to teach role_assigned about roleAssignmentSchedules is tracked but not blocking.

4. Break-glass admin — design APPROVED (revised: TWO accounts)

Mike's question on the existing admin@ + sysadmin@ + GDAP partner access prompted a reshape. Break-glass is complementary to those, not redundant — see §6 for the four-paths rationale. Net change from my earlier "1 break-glass + 1 YubiKey" recommendation: two break-glass accounts, two YubiKeys, split physical storage.

Accounts:

| | Primary | Secondary |
|---|---|---|
| UPN | breakglass1-csc@cascadestucson.com | breakglass2-csc@cascadestucson.com |
| Type | Cloud-only Global Admin, no license | Cloud-only Global Admin, no license |
| FIDO2 | YubiKey #1 | YubiKey #2 |
| Physical storage | Cascades on-site, sealed envelope, sign-out audit | ACG office safe (Mike) |
| Vault entry | clients/cascades-tucson/breakglass1.sops.yaml | clients/cascades-tucson/breakglass2.sops.yaml |

Both accounts:

  • Cloud-only (NOT synced from on-prem) — survives Entra Connect breaks
  • Excluded from password expiration policy
  • 32-char randomly generated password (separate per account, both vaulted)
  • Member of SG-Break-Glass group, which is excluded from CA policies 1, 2, 3 and any future CA
  • Sign-in alerts via LAW KQL alert rule:
    SigninLogs
    | where UserPrincipalName in~ ("breakglass1-csc@cascadestucson.com", "breakglass2-csc@cascadestucson.com")  // in~ matches case-insensitively
    | where ResultType == 0  // success only; drop this line to alert on any attempt
    
    Action Group → email Mike + Howard immediately. Builds after the LAW lands as part of audit retention.
  • Quarterly test: sign in with each, verify FIDO2 works, verify password still valid. Calendar both for the same day.

Storage split rationale: isolates failure domains. An office fire, an ACG compromise, or an individual rogue actor would have to reach both physically separated keys to leave us with zero recovery. Microsoft's official guidance recommends this two-account pattern.

Cost: $50 for two YubiKeys (was $25 for one). Trivial.

Build order:

  1. Create both accounts (Cloud-only, license-free, GA role)
  2. Generate + vault both 32-char passwords
  3. Create SG-Break-Glass group, add both members
  4. Register YubiKey #1 to breakglass1, YubiKey #2 to breakglass2 (both at the workstation, then physically separate)
  5. Test sign-in with both before exiting the build session
  6. Seal YubiKey #1 in envelope, store at Cascades; YubiKey #2 to ACG safe
  7. Verify CA exclusion of SG-Break-Glass group is in place on every existing CA policy BEFORE building the new 4-policy design
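For step 2 of the build order, one way to generate the two 32-char passwords. This is a sketch: the alphanumeric character set is my choice, not a requirement from the design, and the vaulting step into the two .sops.yaml entries is left out.

```shell
# Generates a random password of the given length (default 32)
# from /dev/urandom, restricted to A-Z, a-z, 0-9.
gen_password() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-32}"
}

# One password per account; never reuse across breakglass1/breakglass2.
pw1=$(gen_password 32)
pw2=$(gen_password 32)

# Print lengths only, so the secrets stay out of terminal scrollback.
echo "breakglass1 password length: ${#pw1}"
echo "breakglass2 password length: ${#pw2}"
```

Add symbols to the `tr` set if you want them; check that each result still meets the tenant's password complexity rules before vaulting it.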

When we have 3-4 HIPAA tenants on this pattern, codify into onboard-tenant.sh as --enable-breakglass (creates both accounts + group + alert rule per tenant).

5. ALIS BAA + Risk Analysis doc — your action items unchanged

These are still on your to-do (email Meredith for ALIS BAA; draft Risk Analysis doc via Ollama for Mike's review). Independent of CA + audit retention. Do whenever.

6. Four admin paths into Cascades — failure-domain map

Mike clarified the admin account model: admin@ and sysadmin@ are intentionally excluded from Entra Connect sync — they live cloud-only. The local AD Administrator is a separate on-prem identity, not part of the M365 admin model.

Final picture:

| Account | Lives | Synced? | CA posture |
|---|---|---|---|
| On-prem AD Administrator | Cascades AD only | No (never authenticates to M365) | N/A (out of scope here; managed via on-prem AD security separately) |
| admin@cascadestucson.com | Cloud-only (Connect-excluded by design) | No, intentional | Subject to CA + must be in SG-External-Signin-Allowed |
| sysadmin@cascadestucson.com | Cloud-only (Connect-excluded by design) | No, intentional | Subject to CA + must be in SG-External-Signin-Allowed |
| ACG GDAP partner (Mike + Howard from @azcomputerguru.com) | Foreign principal | N/A | Subject to Cascades CA; must be excluded from policy 3 via "Service provider users" |
| breakglass1-csc@ / breakglass2-csc@ | Cloud-only (definitionally) | No | SG-Break-Glass group excluded from all CA |

Failure modes each path covers:

| Path | Failure modes it covers |
|---|---|
| admin@ / sysadmin@ | Day-to-day Cascades-side admin work; immune to Entra Connect issues (excluded from sync) |
| ACG GDAP partner | Day-to-day MSP delivery; no dependency on Cascades-side identities |
| Break-glass (×2) | CA misconfig (the live risk during cutover next week), accidental disable/lockout of admin@/sysadmin@, ACG compromise, GDAP relationship revoke |

Why break-glass is still needed even though admin@/sysadmin@ are already cloud-only: the differentiator is CA exclusion and ACG-independence, not Connect immunity. During CA cutover, a misconfigured policy can lock out admin@ + sysadmin@ + GDAP foreign principals simultaneously (especially under the policy 3 GDAP bug). Break-glass is the only path that survives all three.

The on-prem AD Administrator account is "its own thing" per Mike — managed via on-prem AD security practices, not part of this design. If/when HIPAA on-prem hardening lands (privileged access workstation pattern, LAPS, audit, MFA-via-smartcard, etc.), that's a separate Wave.

7. Audit retention build dependencies

Your CA pilot cutover does NOT block on audit retention being live. Pilot phase is on Report-only and only flipping a small set of policies for SG-Caregivers-Pilot. Audit retention is HIPAA-tier infra and primarily matters before Phase 3 (production rollout to all caregivers) and before any real ePHI events flow through the system.

Order of operations from here:

  1. (You) Finish pilot Outlook + LinkRx/Helpany apps + first phone enrollment + compliance flip
  2. (You) Create pilot user + cloud group SG-Caregivers-Pilot
  3. (You) Stand up audit retention (runbook ref above) — can interleave with #1-2 while waiting on app sync delays
  4. (You) Build break-glass admin (depends on audit retention LAW for sign-in alerts)
  5. (You) Stage 4 new CA policies in Report-only, assigned to SG-Caregivers-Pilot only
  6. (You) 24-48h Report-only review, fix gaps
  7. (You) Flip CA to On
  8. (You) Run pilot validation tests
  9. (You) Phase 3 production rollout (after AD prereq cleanup + Entra Connect staging exit)

Audit retention slots in around #3 because it doesn't block anything and the LAW has to exist before #4. If Microsoft's Entra Diagnostic Settings endpoint fights you, fall back to manual Azure portal setup for now (we can codify the API later) and don't let it block CA progress.


Configuration Decisions Made

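  • CA bypass design: APPROVED in shape, with two corrections to policy 3 user targeting (admin@/sysadmin@ added to SG-External-Signin-Allowed; Service provider users excluded)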
  • Audit retention architecture: APPROVED hybrid LAW + Storage, ACG-billed
  • Audit retention subscription: reuse e507e953-2ce9-4887-ba96-9b654f7d3267 (GuruRMM Trusted Signing sub) with isolated rg-audit-* resource groups
  • Audit retention region: westus2
  • Audit retention RBAC: Mike = Owner (sub level), Howard = Owner (sub level) with guardrails (CanNotDelete lock on gururmm-signing-rg + cost alert), replacing the earlier per-RG Contributor plan
  • Backfill sweep: APPROVED, Howard runs at his convenience (6 tenants, idempotent, ~5 min)
  • Break-glass count: TWO accounts (Microsoft official guidance — primary at Cascades, secondary at ACG)
  • YubiKey order: $50 (TWO YubiKeys, split storage Cascades/ACG)
  • Break-glass naming: breakglass1-csc@cascadestucson.com and breakglass2-csc@cascadestucson.com
  • Policy 3 user targeting: revised to also exclude Service provider users (GDAP foreign principal exclusion — bug fix)

Files Created / Modified

  • NEW: .claude/skills/remediation-tool/references/audit-retention-runbook.md — full design + per-tenant runbook, including conceptual cURL for Entra Diagnostic Settings (Howard validates exact endpoint during dry-run)
  • This session log

Pending / Incomplete (handoff)

Mike's outstanding items

  • Order TWO YubiKeys ($50 total, Amazon — one to Cascades sealed envelope via Howard, one to ACG safe)
  • One-time: Grant Howard Owner on the ACG subscription so he can self-serve all future MSP-side Azure work (audit retention RGs and beyond):
    az role assignment create \
      --assignee howard.enos@azcomputerguru.com \
      --role "Owner" \
      --scope "/subscriptions/e507e953-2ce9-4887-ba96-9b654f7d3267"
    
  • One-time guardrails to make Owner-Howard low-risk:
    • Resource lock on gururmm-signing-rg (CanNotDelete) so the signing infra can't be accidentally removed:
      az lock create --name signing-protect --lock-type CanNotDelete \
        --resource-group gururmm-signing-rg \
        --notes "Protect GuruRMM Trusted Signing infra from accidental deletion"
      
    • PAYG cost alert at ~$50/mo total via Cost Management (Azure portal, 5-min UI task)
  • Document in clients/cascades-tucson/CONTEXT.md that admin@ and sysadmin@ are intentionally Connect-excluded (cloud-only by design). Done — added "M365 admin model" section.

Howard's items

  • See "Order of operations" above. Audit retention is the new #3 — slots in around the pilot phone enrollment work without blocking it.
  • Backfill sweep against the 6 ACG tenants. Run bash .claude/skills/remediation-tool/scripts/onboard-tenant.sh <tenant-id> against bg-builders, cascades-tucson, cw-concrete, dataforth, heieck-org, mvan to apply the new CA Admin role + Policy.Read.All backfill. Idempotent — safe to re-run. Cascades will be noisy (PIM-managed role triggers Conflict-fallback path); that's expected. Recommend running this BEFORE starting audit retention since (a) it warms up against the patched script, (b) it confirms the baseline directory roles + Graph perms are in place across all tenants, which is a prereq for any future audit-retention codification. ~5 minutes total. Drop a one-liner session log when done.
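A sketch of how the sweep could be looped instead of run six times by hand. It assumes a list of "slug tenant-id" pairs on stdin; only the Cascades tenant ID appears in this log, so the other five have to come from wherever you keep them. It defaults to dry-run and prints the commands it would run.

```shell
# Reads "slug tenant-id" pairs from stdin and runs the onboarding
# script once per tenant. Set DRY_RUN=0 to actually execute.
backfill_sweep() {
  script=".claude/skills/remediation-tool/scripts/onboard-tenant.sh"
  while read -r slug tenant_id; do
    [ -n "$slug" ] || continue   # skip blank lines
    echo "== $slug"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "+ bash $script $tenant_id"
    else
      # </dev/null keeps the script from eating the tenant list on stdin.
      bash "$script" "$tenant_id" </dev/null || echo "FAILED: $slug" >&2
    fi
  done
}

# Example with the one tenant ID recorded in this log:
backfill_sweep <<'EOF'
cascades-tucson 207fa277-e9d8-4eb7-ada1-1064d2221498
EOF
```

A failure on one tenant is reported on stderr and the loop continues, which suits an idempotent script that is safe to re-run.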

Tracked TODOs (not blocking)

  • Teach role_assigned helper about roleAssignmentSchedules (cosmetic noise only)
  • Build the UAL harvester on the Office 365 Management Activity API (4-6 hours dev when ready, after Cascades audit retention runs cleanly for 30d)
  • Codify onboard-tenant.sh --enable-audit-archive flag (after pilot validated)
  • Codify onboard-tenant.sh --enable-breakglass flag (after 3-4 HIPAA tenants on pattern)

Reference

  • Runbook: .claude/skills/remediation-tool/references/audit-retention-runbook.md
  • Howard's prior session log: clients/cascades-tucson/session-logs/2026-04-29-howard-cascades-bypass-pilot-phase-b-buildout.md
  • ACG Azure subscription: e507e953-2ce9-4887-ba96-9b654f7d3267 (vault: services/azure-trusted-signing.sops.yaml)
  • ACG Trusted Signing RG (existing, separate from audit): gururmm-signing-rg
  • Cascades tenant ID: 207fa277-e9d8-4eb7-ada1-1064d2221498