Audit Retention Runbook (HIPAA-tier)
ACG-side architecture for capturing and retaining 6-year audit logs from customer M365 tenants. First implementation: Cascades Tucson.
Why this exists
HIPAA §164.312(b) requires audit controls; §164.316(b)(2)(i) requires 6-year retention.
M365 native retention falls short of 6 years on every relevant log source:
| Source | Native | Gap to 6yr |
|---|---|---|
| Entra sign-in / audit / provisioning logs | 30d | 5y 11m |
| Purview Unified Audit Log (Exchange/SP/OD/Teams) | 180d | 5.5y |
| Intune audit | 1y | 5y |
| Defender alerts | 30d | 5y 11m |
We close the gap by exporting via Diagnostic Settings to ACG-owned destinations and supplementing UAL with a poll-based harvester.
Architecture
Hybrid: Log Analytics for live forensics + Storage Account for cold archive.
Customer Tenant (Cascades, etc.)
  Diagnostic Settings ──┬──> [LAW] law-<short>-audit (90d interactive)
                        └──> [SA]  stor<short>audit (lifecycle: hot 30d -> cool 60d -> archive until 6y -> delete)

Customer Tenant
  Office 365 Management Activity API (UAL) ──> ACG Function (poll q4h, per tenant) ──> SA blob path /ual/{yyyy}/{MM}/{dd}/...
Both LAW and SA receive the same stream from Diagnostic Settings — one ingest path, two retention tiers. The LAW is for human queries; the SA is for compliance archive.
UAL lacks a Diagnostic Settings hook, so we poll the Office 365 Management Activity API on a schedule and write JSON to the same Storage Account.
Cost model
Per HIPAA-tier tenant per month: ~$0.50–1.00
- LAW ingest: ~$2.30/GB × ~0.1 GB/mo = ~$0.23/mo
- LAW retention (90d): ~$0.10/GB × peak ~0.3 GB = ~$0.03/mo
- Storage Account (cool/archive blended over 6y): ~$0.15/mo
- Function compute (shared across tenants): rounded to zero
- Egress (only on forensics retrieval): pay-per-use, typically zero/mo
ACG cumulative at 5 HIPAA tenants: ~$2.50–5/mo (5 × the per-tenant figure above). Budget headroom for forensics rehydration: ~$50–100 per incident retrieval (one-time).
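The per-tenant estimate can be re-derived from its line items. This is a minimal sketch that assumes the unit prices quoted in this section (runbook estimates, not current Azure list prices); the function name `tenant_monthly_cost` is hypothetical. With the defaults the line items sum to roughly $0.41, so the quoted ~$0.50–1.00 range appears to include some headroom.

```shell
#!/usr/bin/env bash
# Re-derive the per-tenant monthly estimate from the line items above.
# All rates are this runbook's own estimates, not authoritative Azure pricing.
tenant_monthly_cost() {
  local ingest_gb="${1:-0.1}"     # GB ingested into LAW per month
  local peak_gb="${2:-0.3}"       # peak GB retained in LAW over the 90d window
  local storage_usd="${3:-0.15}"  # blended Storage Account estimate, USD/mo
  awk -v i="$ingest_gb" -v p="$peak_gb" -v s="$storage_usd" \
    'BEGIN { printf "%.2f\n", i * 2.30 + p * 0.10 + s }'
}

tenant_monthly_cost            # defaults taken from this section
tenant_monthly_cost 1.0 3.0    # a hypothetical tenant with 10x the LAW volume
```

Function compute and egress are left out because the section rounds both to zero in a normal month.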
Prerequisites
ACG-side (one-time)
- Azure subscription: reuse existing e507e953-2ce9-4887-ba96-9b654f7d3267, the ACG-owned subscription set up for GuruRMM Trusted Signing (cert profile gururmm-public-trust under gururmm-signing-rg). Vault entry: services/azure-trusted-signing.sops.yaml.
  - Rationale: Mike already has Owner on this sub; no new billing relationship needed; single tenant boundary; Azure RBAC + RG-level tagging keeps audit data isolated from signing data.
  - Existing usage in this sub: gururmm-signing-rg (Trusted Signing for GuruRMM agent binaries). Audit RGs (rg-audit-*) will be RG-isolated from signing.
  - Future split: when we have 3+ HIPAA tenants or a compliance audit requires a hard boundary, move audit RGs to a dedicated acg-msp-compliance subscription via az resource move.
- RBAC for Howard: Owner at the subscription level. This matches the existing operational trust model (Howard has "Full trust - same access as admin" per CLAUDE.md). One-time grant unblocks all future MSP-side Azure self-service. Mike runs:
      az role assignment create \
        --assignee howard.enos@azcomputerguru.com \
        --role "Owner" \
        --scope "/subscriptions/e507e953-2ce9-4887-ba96-9b654f7d3267"
  Guardrails to keep Owner-Howard low-risk:
  - Resource lock on gururmm-signing-rg:
        az lock create --name signing-protect --lock-type CanNotDelete --resource-group gururmm-signing-rg
  - PAYG cost alert at ~$50/mo via Cost Management (UI task)
- Region: westus2 for all audit resources. Latency-friendly to Tucson, mature service availability, no HIPAA-relevant cost difference vs other US regions.
Customer-tenant side (per onboarded HIPAA tenant)
- Tenant Admin SP must have Policy.Read.All (already in updated onboard-tenant.sh)
- Tenant Admin SP must have the directory role Security Administrator OR a custom role with microsoft.aadiam/diagnosticSettings/write to create Diagnostic Settings on Entra. (Conditional Access Administrator alone does NOT cover Monitor scope.)
- Tenant Admin app manifest must include AuditLog.Read.All and either Graph's IdentityRiskyUser.Read.All (already present per SEC_INV_GRAPH_ROLES) or follow-on for Defender export
Tag schema (apply to every resource)
client = cascadestucson
tier = hipaa
service = audit
cost-center = msp-audit
created-by = howard | mike | onboard-tenant.sh
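Because the schema has to be applied to every resource, generating the tag list in one place avoids drift between commands. A minimal sketch; the helper name `audit_tags` is hypothetical and nothing here calls az.

```shell
#!/usr/bin/env bash
# Emit the runbook's tag schema as space-separated key=value words.
audit_tags() {
  local client="$1" created_by="$2"
  printf 'client=%s tier=hipaa service=audit cost-center=msp-audit created-by=%s\n' \
    "$client" "$created_by"
}

# Split into an array so each key=value is passed as its own argument.
read -r -a TAGS <<< "$(audit_tags cascadestucson howard)"
printf '%s\n' "${TAGS[@]}"
# Usage (not executed here): az group create ... --tags "${TAGS[@]}"
```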
Per-tenant onboarding — Cascades example
Substitute <short> = cascades (lowercase, no punctuation, ≤8 chars). Substitute <full> = cascadestucson.
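The substitution rules can be checked before any resource is created. The sketch below validates <short> against the constraint stated above and derives the resource names used in Phase 1; the function name `derive_names` is hypothetical.

```shell
#!/usr/bin/env bash
# Derive resource names from <short>/<full> and check them against the
# constraints stated in this runbook before any az call is made.
derive_names() {
  local short="$1" full="$2"
  # <short>: lowercase alphanumeric, no punctuation, at most 8 characters
  if [[ ! "$short" =~ ^[a-z0-9]{1,8}$ ]]; then
    echo "invalid short name: $short" >&2
    return 1
  fi
  local sa="stor${short}audit"
  # Storage account names: 3-24 chars, lowercase letters and digits only
  if [[ ! "$sa" =~ ^[a-z0-9]{3,24}$ ]]; then
    echo "invalid storage account name: $sa" >&2
    return 1
  fi
  echo "RG=rg-audit-${full}"
  echo "SA_NAME=${sa}"
  echo "LAW_NAME=law-${short}-audit"
}

derive_names cascades cascadestucson
```

An 8-character <short> yields a 17-character storage account name, comfortably inside the 24-character limit. Global uniqueness of the name still has to be confirmed against Azure.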
Phase 1: ACG-side resource provisioning
Howard runs from his workstation with az CLI logged into ACG home tenant:
SUB="e507e953-2ce9-4887-ba96-9b654f7d3267"
SHORT="cascades"
FULL="cascadestucson"
REGION="westus2"
RG="rg-audit-${FULL}"
az account set --subscription "$SUB"
# Resource group
az group create --name "$RG" --location "$REGION" \
--tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard
# Storage Account (must be globally unique, lowercase alphanumeric, 3-24 chars)
SA_NAME="stor${SHORT}audit"
az storage account create \
--name "$SA_NAME" \
--resource-group "$RG" \
--location "$REGION" \
--sku Standard_LRS \
--kind StorageV2 \
--access-tier Hot \
--min-tls-version TLS1_2 \
--allow-blob-public-access false \
--tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard
# Containers
SA_KEY=$(az storage account keys list -g "$RG" -n "$SA_NAME" --query '[0].value' -o tsv)
for c in entra-signin entra-audit entra-provisioning intune-audit defender-alerts ual; do
az storage container create --name "$c" --account-name "$SA_NAME" --account-key "$SA_KEY"
done
# Lifecycle policy: hot 30d -> cool 60d -> archive 6y -> delete
cat > /tmp/lifecycle.json <<'EOF'
{
"rules": [{
"name": "hipaa-6y-tier-down",
"enabled": true,
"type": "Lifecycle",
"definition": {
"filters": { "blobTypes": ["blockBlob"] },
"actions": {
"baseBlob": {
"tierToCool": { "daysAfterModificationGreaterThan": 30 },
"tierToArchive": { "daysAfterModificationGreaterThan": 90 },
"delete": { "daysAfterModificationGreaterThan": 2190 }
}
}
}
}]
}
EOF
az storage account management-policy create \
--account-name "$SA_NAME" \
--resource-group "$RG" \
--policy @/tmp/lifecycle.json
# Immutability (legal hold) — defer until pilot validated.
# When ready: az storage container immutability-policy create ...
# Log Analytics Workspace
LAW_NAME="law-${SHORT}-audit"
az monitor log-analytics workspace create \
--resource-group "$RG" \
--workspace-name "$LAW_NAME" \
--location "$REGION" \
--retention-time 90 \
--tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard
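Before applying or changing the lifecycle policy, the thresholds can be sanity-checked: each tier must come later than the previous one, and deletion must not happen before the retention floor. A minimal sketch; `check_lifecycle` is a hypothetical helper, and the 2190-day floor assumes 6 × 365 days with leap days ignored.

```shell
#!/usr/bin/env bash
# Check that lifecycle thresholds (days since last modification) are strictly
# ordered and that deletion happens no earlier than the retention floor.
check_lifecycle() {
  local cool="$1" archive="$2" delete="$3"
  local min_retention_days=2190   # assumption: 6 x 365, leap days ignored
  if (( cool >= archive || archive >= delete )); then
    echo "thresholds out of order: cool=$cool archive=$archive delete=$delete" >&2
    return 1
  fi
  if (( delete < min_retention_days )); then
    echo "delete at ${delete}d is shorter than ${min_retention_days}d" >&2
    return 1
  fi
  # Report how long a blob spends in each tier.
  echo "ok: hot ${cool}d, cool $(( archive - cool ))d, archive $(( delete - archive ))d"
}

check_lifecycle 30 90 2190
```

The thresholds are cumulative: a blob spends 2100 days in archive, which brings its total age to 2190 days at deletion.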
Phase 2: Customer-tenant Diagnostic Settings
Performed against Cascades tenant using Tenant Admin token:
CASCADES_TENANT="207fa277-e9d8-4eb7-ada1-1064d2221498"
TOKEN=$(bash .claude/skills/remediation-tool/scripts/get-token.sh "$CASCADES_TENANT" tenant-admin)
LAW_RESOURCE="/subscriptions/${SUB}/resourceGroups/${RG}/providers/Microsoft.OperationalInsights/workspaces/${LAW_NAME}"
SA_RESOURCE="/subscriptions/${SUB}/resourceGroups/${RG}/providers/Microsoft.Storage/storageAccounts/${SA_NAME}"
# Entra Diagnostic Settings (covers sign-in + audit + provisioning + non-interactive).
# These are set through Azure Resource Manager, not Graph, so $TOKEN must be
# issued for the https://management.azure.com audience.
curl -X PUT \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/acg-audit-export?api-version=2017-04-01-preview" \
  -d @- <<EOF
{
  "properties": {
    "logs": [
      {"category": "AuditLogs", "enabled": true},
      {"category": "SignInLogs", "enabled": true},
      {"category": "NonInteractiveUserSignInLogs", "enabled": true},
      {"category": "ServicePrincipalSignInLogs", "enabled": true},
      {"category": "ManagedIdentitySignInLogs", "enabled": true},
      {"category": "ProvisioningLogs", "enabled": true},
      {"category": "ADFSSignInLogs", "enabled": true},
      {"category": "RiskyUsers", "enabled": true},
      {"category": "UserRiskEvents", "enabled": true}
    ],
    "workspaceId": "${LAW_RESOURCE}",
    "storageAccountId": "${SA_RESOURCE}"
  }
}
EOF
Note: Entra Diagnostic Settings go through Azure Resource Manager (not Graph). The endpoint is:
PUT https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/{name}?api-version=2017-04-01-preview
Authenticate against ARM (https://management.azure.com), not Graph. The Tenant Admin SP needs the microsoft.aadiam/diagnosticSettings/write permission, granted via the Security Administrator directory role. Howard: validate the endpoint, api-version, and token audience during dry-run; treat the cURL above as the expected shape until then.
Phase 3: Verification (1h after setup)
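# Query LAW for recent sign-ins (--workspace takes the workspace GUID, not its name)
LAW_ID=$(az monitor log-analytics workspace show \
--resource-group "$RG" --workspace-name "$LAW_NAME" \
--query customerId -o tsv)
az monitor log-analytics query \
--workspace "$LAW_ID" \
--analytics-query "SigninLogs | take 5 | project TimeGenerated, UserPrincipalName, ResultType"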
# Confirm Storage Account is receiving blobs
az storage blob list --container-name insights-logs-signinlogs \
--account-name "$SA_NAME" --account-key "$SA_KEY" --num-results 5
If LAW returns rows and SA has blobs, the export is live.
Phase 4: UAL harvester (deferred — separate buildout)
UAL has no Diagnostic Settings export. Approach when we get to it:
- Azure Function (Python or PowerShell), timer trigger every 4h
- Per onboarded tenant: managed identity granted ActivityFeed.Read against Office 365 Management API
- Polls /api/v1.0/{tenantId}/activity/feed/subscriptions/content?contentType=Audit.AzureActiveDirectory|Audit.Exchange|Audit.SharePoint|Audit.General|DLP.All (one request per content type)
- Writes raw JSON to <sa>/ual/{yyyy}/{MM}/{dd}/{tenantId}/{contentType}-{timestamp}.json
- Deduplicates via contentId
Codify the design once we've run it manually for a few weeks against Cascades. Estimated build: 4-6 hours dev + test.
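Two pieces of the harvester design can be pinned down ahead of the build: the blob path layout and the contentId deduplication. The sketch below is illustrative only; the function names are hypothetical, and the local seen-file stands in for whatever state store the Function ends up using.

```shell
#!/usr/bin/env bash
# Build the archive blob path for one UAL content blob, following the
# /ual/{yyyy}/{MM}/{dd}/{tenantId}/{contentType}-{timestamp}.json layout.
ual_blob_path() {
  local tenant_id="$1" content_type="$2" ts="$3"   # ts: UTC, e.g. 2025-01-15T08:00:00Z
  local day="${ts%%T*}"                             # date part, e.g. 2025-01-15
  local yyyy="${day%%-*}" rest="${day#*-}"
  local mm="${rest%%-*}" dd="${rest#*-}"
  echo "ual/${yyyy}/${mm}/${dd}/${tenant_id}/${content_type}-${ts}.json"
}

# Return 0 if this contentId was already written (whole-line, fixed-string match).
already_seen() {
  local seen_file="$1" content_id="$2"
  grep -qxF -- "$content_id" "$seen_file" 2>/dev/null
}

# Record a contentId after its blob has been written.
mark_seen() {
  local seen_file="$1" content_id="$2"
  echo "$content_id" >> "$seen_file"
}
```

A polling loop would call `already_seen` for each content entry returned by the API, write the blob at `ual_blob_path` when it returns non-zero, and then call `mark_seen`.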
Operational
Quarterly verification (per tenant, ~10 min)
- Run this query in LAW and expect daily volume:
      SigninLogs | summarize count() by bin(TimeGenerated, 1d) | order by TimeGenerated desc
- Spot-check Storage Account container blob counts and timestamps.
- Confirm lifecycle policy hasn't drifted:
      az storage account management-policy show --resource-group "$RG" --account-name "$SA_NAME"
- Cost should be ~$1/mo per tenant:
      az consumption usage list --start-date $(date -d '30 days ago' +%Y-%m-%d) --end-date $(date +%Y-%m-%d) --query "[?contains(instanceId,'$SA_NAME')||contains(instanceId,'$LAW_NAME')]"
Forensics retrieval
- 0–90 days: KQL on LAW directly. Sub-second queries.
- 90 days – 6 years: rehydrate blob from archive tier. Standard rehydrate: up to ~15 hours. High-priority: ~1 hour, costs ~10x more.
      az storage blob set-tier --tier Hot --rehydrate-priority Standard \
        --account-name $SA_NAME --container-name <c> --name <blob>
When to upgrade subscription split
- Triggers: 3+ HIPAA tenants, an external compliance audit asking about subscription scope, or one tenant generating >10 GB/month
- Path: provision new subscription acg-msp-compliance, move RGs via az resource move, update Diagnostic Settings destination ARM IDs
Onboarding integration (codify after pilot validated)
Once Cascades is running cleanly for 30 days, fold the per-tenant Phase 1 + Phase 2 into onboard-tenant.sh as a flag:
bash onboard-tenant.sh <tenant-id> --enable-audit-archive --client-shortname <short>
Implementation outline:
- Read --enable-audit-archive flag
- Provision RG + SA + LAW under ACG sub (idempotent: skip if exists)
- Issue PUT for Diagnostic Settings against the customer tenant
- Append "Audit archive: [OK]" row to the final status table
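The first step of the outline, reading the flags, might look like the sketch below. It assumes the invocation shown above and nothing about the current internals of onboard-tenant.sh; the function and variable names are hypothetical, and provisioning is deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch of argument handling for the proposed flags. Only parsing is shown;
# provisioning and the Diagnostic Settings PUT would follow when
# ENABLE_AUDIT_ARCHIVE is 1.
parse_onboard_args() {
  TENANT_ID="" ENABLE_AUDIT_ARCHIVE=0 CLIENT_SHORT=""
  while (( $# > 0 )); do
    case "$1" in
      --enable-audit-archive) ENABLE_AUDIT_ARCHIVE=1 ;;
      --client-shortname)
        [[ $# -ge 2 ]] || { echo "--client-shortname needs a value" >&2; return 1; }
        CLIENT_SHORT="$2"; shift ;;
      -*) echo "unknown flag: $1" >&2; return 1 ;;
      *)  TENANT_ID="$1" ;;   # positional argument: the tenant ID
    esac
    shift
  done
  # The archive needs a short name to derive resource names from.
  if (( ENABLE_AUDIT_ARCHIVE )) && [[ -z "$CLIENT_SHORT" ]]; then
    echo "--enable-audit-archive requires --client-shortname" >&2
    return 1
  fi
}

parse_onboard_args 207fa277-e9d8-4eb7-ada1-1064d2221498 \
  --enable-audit-archive --client-shortname cascades
echo "tenant=$TENANT_ID archive=$ENABLE_AUDIT_ARCHIVE short=$CLIENT_SHORT"
```

Rejecting the archive flag without a short name keeps the later provisioning step from deriving empty resource names.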
Until codified, Howard runs the runbook manually per tenant. Cascades is the only HIPAA-tier tenant currently — this is fine.
Open questions / future work
- UAL harvester: designed but not built. Punt until pilot CA cutover is done.
- Defender for Office 365 export: does it expose Diagnostic Settings? If not, it may need a poll in the style of the Office 365 Management Activity API harvester. Check during Cascades verification.
- MDE alerts: ditto.
- Sentinel: the natural upgrade path if alerting becomes important. Cost crosses ~$200/mo at first tenant — defer until justified by an actual operational need.
- Break-glass sign-in alert: when break-glass admin lands, KQL alert rule on LAW: SigninLogs | where UserPrincipalName == "breakglass-csc@cascadestucson.com" → Action Group → email Mike + Howard. Lives in this same LAW.