# Audit Retention Runbook (HIPAA-tier)

ACG-side architecture for capturing and retaining 6-year audit logs from customer M365 tenants. First implementation: Cascades Tucson.

## Why this exists

HIPAA §164.312(b) requires audit controls; §164.316(b)(2)(i) requires 6-year retention. M365 native retention falls short of 6 years on every relevant log source:

| Source | Native | Gap to 6yr |
|---|---|---|
| Entra sign-in / audit / provisioning logs | 30d | 5y 11m |
| Purview Unified Audit Log (Exchange/SP/OD/Teams) | 180d | 5.5y |
| Intune audit | 1y | 5y |
| Defender alerts | 30d | 5y 11m |

We close the gap by exporting via Diagnostic Settings to ACG-owned destinations and supplementing UAL with a poll-based harvester.

## Architecture

Hybrid: Log Analytics for live forensics + Storage Account for cold archive.

```
Customer Tenant (Cascades, etc.)
  Diagnostic Settings ──┬──> [LAW] law-${SHORT}-audit  (90d interactive)
                        └──> [SA]  stor${SHORT}audit   (lifecycle: hot 30d -> cool 60d -> archive 6y -> delete)

Customer Tenant /v1.0/auditLogs (UAL)
  ──> ACG Function (poll q4h, per tenant)
  ──> SA blob path /ual/{yyyy}/{MM}/{dd}/...
```

Both LAW and SA receive the same stream from Diagnostic Settings: one ingest path, two retention tiers. The LAW is for human queries; the SA is for compliance archive.

UAL lacks a Diagnostic Settings hook, so we poll the Office 365 Management Activity API on a schedule and write JSON to the same Storage Account.

## Cost model

Per HIPAA-tier tenant per month: **~$0.50–1.00**

- LAW ingest: ~$2.30/GB × ~0.1 GB/mo = ~$0.23/mo
- LAW retention (90d): ~$0.10/GB × peak ~0.3 GB = ~$0.03/mo
- Storage Account (cool/archive blended over 6y): ~$0.15/mo
- Function compute (shared across tenants): rounded to zero
- Egress (only on forensics retrieval): pay-per-use, typically $0/mo

ACG cumulative at 5 HIPAA tenants: ~$2.50–5/mo (5 × the per-tenant range above). Budget headroom for forensics rehydration: ~$50–100 per incident retrieval (one-time).
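The per-tenant arithmetic in the cost model can be re-run whenever volumes or unit prices change. The following is a minimal sketch, assuming the unit prices and volumes quoted in the list above (these are the runbook's estimates, not verified Azure list prices); the function names `tenant_monthly_cost` and `fleet_monthly_cost` are hypothetical. With the quoted inputs the itemised lines sum to about $0.41, slightly under the ~$0.50–1.00 range, which leaves some headroom.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-derive the per-tenant monthly estimate.
# Args: LAW ingest GB/mo, peak retained GB in LAW, blended storage $/mo
tenant_monthly_cost() {
  local ingest_gb="$1" retained_gb="$2" storage_usd="$3"
  awk -v i="$ingest_gb" -v r="$retained_gb" -v s="$storage_usd" 'BEGIN {
    ingest_rate = 2.30   # $/GB LAW ingest (figure quoted in the cost model)
    retain_rate = 0.10   # $/GB LAW retention (figure quoted in the cost model)
    printf "%.2f\n", i * ingest_rate + r * retain_rate + s
  }'
}

# Hypothetical helper: scale a per-tenant figure to N tenants.
fleet_monthly_cost() {
  local tenants="$1" per_tenant="$2"
  awk -v n="$tenants" -v c="$per_tenant" 'BEGIN { printf "%.2f\n", n * c }'
}

per_tenant=$(tenant_monthly_cost 0.1 0.3 0.15)
echo "per tenant: \$${per_tenant}/mo"
echo "5 tenants:  \$$(fleet_monthly_cost 5 "$per_tenant")/mo"
```

Function compute and egress are left out because the cost model treats them as roughly zero in a normal month.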
## Prerequisites

### ACG-side (one-time)

- **Azure subscription:** reuse existing `e507e953-2ce9-4887-ba96-9b654f7d3267`, the ACG-owned subscription set up for GuruRMM Trusted Signing (cert profile `gururmm-public-trust` under `gururmm-signing-rg`). Vault entry: `services/azure-trusted-signing.sops.yaml`.
  - Rationale: Mike already has Owner on this sub; no new billing relationship needed; single tenant boundary; Azure RBAC + RG-level tagging keeps audit data isolated from signing data.
  - **Existing usage in this sub:** `gururmm-signing-rg` (Trusted Signing for GuruRMM agent binaries). Audit RGs (`rg-audit-*`) will be RG-isolated from signing.
  - Future split: when we have 3+ HIPAA tenants or a compliance audit requires hard boundary, move audit RGs to a dedicated `acg-msp-compliance` subscription via `az resource move`.
- **RBAC for Howard:** Owner at the subscription level, matching the existing operational trust model (Howard has "Full trust — same access as admin" per `CLAUDE.md`). One-time grant unblocks all future MSP-side Azure self-service. Mike runs:

  ```bash
  az role assignment create \
    --assignee howard.enos@azcomputerguru.com \
    --role "Owner" \
    --scope "/subscriptions/e507e953-2ce9-4887-ba96-9b654f7d3267"
  ```

  Guardrails to keep Owner-Howard low-risk:
  - Resource lock on `gururmm-signing-rg`: `az lock create --name signing-protect --lock-type CanNotDelete --resource-group gururmm-signing-rg`
  - PAYG cost alert at ~$50/mo via Cost Management (UI task)
- **Region:** `westus2` for all audit resources. Latency-friendly to Tucson, mature service availability, no HIPAA-relevant cost difference vs other US regions.

### Customer-tenant side (per onboarded HIPAA tenant)

- **Tenant Admin SP must have** `Policy.Read.All` (already in updated `onboard-tenant.sh`)
- **Tenant Admin SP must have** the directory role **Security Administrator** OR a custom role with `Microsoft.Insights/diagnosticSettings/write` to create Diagnostic Settings on Entra.
  (Conditional Access Administrator alone does NOT cover Monitor scope.)
- **Tenant Admin app manifest must include** `AuditLog.Read.All` and either Graph's `IdentityRiskyUser.Read.All` (already present per `SEC_INV_GRAPH_ROLES`) or a follow-on addition for Defender export

### Tag schema (apply to every resource)

```
client       = cascadestucson
tier         = hipaa
service      = audit
cost-center  = msp-audit
created-by   = howard | mike | onboard-tenant.sh
```

## Per-tenant onboarding: Cascades example

Substitute `SHORT` = `cascades` (lowercase, no punctuation, ≤8 chars). Substitute `FULL` = `cascadestucson`.

### Phase 1: ACG-side resource provisioning

Howard runs from his workstation with az CLI logged into ACG home tenant:

```bash
SUB="e507e953-2ce9-4887-ba96-9b654f7d3267"
SHORT="cascades"
FULL="cascadestucson"
REGION="westus2"
RG="rg-audit-${FULL}"

az account set --subscription "$SUB"

# Resource group
az group create --name "$RG" --location "$REGION" \
  --tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard

# Storage Account (must be globally unique, lowercase alphanumeric, 3-24 chars)
SA_NAME="stor${SHORT}audit"
az storage account create \
  --name "$SA_NAME" \
  --resource-group "$RG" \
  --location "$REGION" \
  --sku Standard_LRS \
  --kind StorageV2 \
  --access-tier Cool \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false \
  --tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard

# Containers
SA_KEY=$(az storage account keys list -g "$RG" -n "$SA_NAME" --query '[0].value' -o tsv)
for c in entra-signin entra-audit entra-provisioning intune-audit defender-alerts ual; do
  az storage container create --name "$c" --account-name "$SA_NAME" --account-key "$SA_KEY"
done

# Lifecycle policy: hot 30d -> cool 60d -> archive -> delete after 6y.
# 2192 days rather than 6 x 365 = 2190, so a span with two leap days is still a full 6 years.
cat > /tmp/lifecycle.json <<'EOF'
{
  "rules": [{
    "name": "hipaa-6y-tier-down",
    "enabled": true,
    "type": "Lifecycle",
    "definition": {
      "filters": { "blobTypes": ["blockBlob"] },
      "actions": {
        "baseBlob": {
          "tierToCool":    { "daysAfterModificationGreaterThan": 30 },
          "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
          "delete":        { "daysAfterModificationGreaterThan": 2192 }
        }
      }
    }
  }]
}
EOF
az storage account management-policy create \
  --account-name "$SA_NAME" \
  --resource-group "$RG" \
  --policy @/tmp/lifecycle.json

# Immutability (legal hold): defer until pilot validated.
# When ready: az storage container immutability-policy create ...

# Log Analytics Workspace
LAW_NAME="law-${SHORT}-audit"
az monitor log-analytics workspace create \
  --resource-group "$RG" \
  --workspace-name "$LAW_NAME" \
  --location "$REGION" \
  --retention-time 90 \
  --tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard
```

### Phase 2: Customer-tenant Diagnostic Settings

Performed against Cascades tenant using Tenant Admin token:

```bash
CASCADES_TENANT="207fa277-e9d8-4eb7-ada1-1064d2221498"
TOKEN=$(bash .claude/skills/remediation-tool/scripts/get-token.sh "$CASCADES_TENANT" tenant-admin)

LAW_RESOURCE="/subscriptions/${SUB}/resourceGroups/${RG}/providers/Microsoft.OperationalInsights/workspaces/${LAW_NAME}"
SA_RESOURCE="/subscriptions/${SUB}/resourceGroups/${RG}/providers/Microsoft.Storage/storageAccounts/${SA_NAME}"

# Entra Diagnostic Settings (covers sign-in + audit + provisioning + non-interactive)
# NOTE: verify this endpoint before running. Entra Diagnostic Settings are normally created
# through ARM (https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/{name}?api-version=2017-04-01),
# not through a Graph auditLogs URL.
curl -X PUT \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "https://graph.microsoft.com/beta/auditLogs/directoryAudits" \
  -d @- <
```

[…]

- […] `/ual/{yyyy}/{MM}/{dd}/{tenantId}/{contentType}-{timestamp}.json`
- Deduplicates via `contentId`

Codify the design once we've run it manually for a few weeks against Cascades. Estimated build: 4-6 hours dev + test.

## Operational

### Quarterly verification (per tenant, ~10 min)

1. Run a `SigninLogs | summarize count() by bin(TimeGenerated, 1d) | order by TimeGenerated desc` query in LAW. Expect daily volume.
2. Spot-check Storage Account container blob counts and timestamps.
3. Confirm lifecycle policy hasn't drifted: `az storage account management-policy show -g "$RG" --account-name "$SA_NAME"`.
4. Cost: `az consumption usage list --start-date $(date -d '30 days ago' +%Y-%m-%d) --end-date $(date +%Y-%m-%d) --query "[?contains(instanceId,'$SA_NAME')||contains(instanceId,'$LAW_NAME')]"`; should be ~$1/mo per tenant.

### Forensics retrieval

- **0–90 days:** KQL on LAW directly. Sub-second queries.
- **90 days – 6 years:** rehydrate blob from archive tier.

  ```bash
  az storage blob set-tier --tier Hot --rehydrate-priority Standard \
    --account-name "$SA_NAME" --container-name <container> --name <blob>
  ```

  Standard-priority rehydration can take up to ~15 hours. High priority typically completes in about 1 hour for smaller blobs and costs ~10x more.

### When to split the subscription

- Triggers: 3+ HIPAA tenants, an external compliance audit asking about subscription scope, or one tenant generating >10 GB/month
- Path: provision new subscription `acg-msp-compliance`, move RGs via `az resource move`, update Diagnostic Settings destination ARM IDs

## Onboarding integration (codify after pilot validated)

Once Cascades is running cleanly for 30 days, fold the per-tenant Phase 1 + Phase 2 into `onboard-tenant.sh` as a flag:

```bash
bash onboard-tenant.sh --enable-audit-archive --client-shortname <short>
```

Implementation outline:

- Read `--enable-audit-archive` flag
- Provision RG + SA + LAW under ACG sub (idempotent: skip if exists)
- Issue PUT for Diagnostic Settings against the customer tenant
- Append "Audit archive: [OK]" row to the final status table

Until codified, Howard runs the runbook manually per tenant. Cascades is currently the only HIPAA-tier tenant, so this is fine.

## Open questions / future work

- **UAL harvester:** designed but not built. Punt until pilot CA cutover is done.
- **Defender for Office 365 export:** does it expose Diagnostic Settings? If not, may need OMA-style poll. Check during Cascades verification.
- **MDE alerts:** ditto.
- **Sentinel:** the natural upgrade path if alerting becomes important.
  Cost crosses ~$200/mo at first tenant; defer until justified by an actual operational need.
- **Break-glass sign-in alert:** when break-glass admin lands, KQL alert rule on LAW: `SigninLogs | where UserPrincipalName == "breakglass-csc@cascadestucson.com"` → Action Group → email Mike + Howard. Lives in this same LAW.
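
The break-glass alert could be created from the CLI roughly as follows. This is a sketch under stated assumptions, not a tested deployment: it assumes the `az monitor scheduled-query` command group (shipped as a CLI extension) is available, that an Action Group emailing Mike + Howard already exists, and that `RG` and `LAW_RESOURCE` are set as in the onboarding phases. The names `breakglass_query`, `create_breakglass_alert`, `ACTION_GROUP_ID`, and the rule name `breakglass-signin` are hypothetical, as are the 5-minute evaluation window and severity. The query uses `=~` rather than `==` because `==` is case-sensitive in KQL.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the KQL for a given break-glass UPN.
breakglass_query() {
  local upn="$1"
  printf 'SigninLogs | where UserPrincipalName =~ "%s"' "$upn"
}

# Hypothetical helper: create a log alert rule on the audit LAW.
# Defined only; nothing calls it automatically.
# Assumes RG, LAW_RESOURCE, and ACTION_GROUP_ID are already set.
create_breakglass_alert() {
  local upn="$1"
  az monitor scheduled-query create \
    --name "breakglass-signin" \
    --resource-group "$RG" \
    --scopes "$LAW_RESOURCE" \
    --condition "count 'BreakGlass' > 0" \
    --condition-query BreakGlass="$(breakglass_query "$upn")" \
    --evaluation-frequency 5m \
    --window-size 5m \
    --severity 1 \
    --action-groups "$ACTION_GROUP_ID"
}

# Print the query for review; call create_breakglass_alert by hand once verified.
breakglass_query "breakglass-csc@cascadestucson.com"; echo
```

Running the printed query in the LAW first confirms it returns the expected rows before any alert rule is created.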