Session log: Cascades audit retention design + Pro-Tech Services email investigation
Cascades:
- Approved Howard's corrected 4-policy CA bypass design
- Caught + fixed policy 3 GDAP bug (Service provider users exclusion)
- Decided hybrid LAW + Storage Account audit retention (ACG-billed, reuse existing Trusted Signing Azure subscription, westus2)
- Wrote full audit retention runbook for Howard
- Reshaped break-glass to two accounts (split-storage YubiKeys)
- Documented Cascades M365 admin model (admin@/sysadmin@ Connect-excluded by design; local AD Administrator separate identity layer)
- Decided Howard gets Owner on ACG sub with guardrails (resource lock + cost alert) instead of per-RG Contributor

Pro-Tech Services:
- DNS recon of pro-techhelps.com + pro-techservices.co
- Diagnosed calendar invite delivery issue (DKIM domain mismatch + no DMARC = strict receivers silently drop invites)
- Drafted non-technical IT-provider migration email to Michelle Sora

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,280 @@
# Audit Retention Runbook (HIPAA-tier)

ACG-side architecture for capturing and retaining 6-year audit logs from customer M365 tenants. First implementation: Cascades Tucson.

## Why this exists

HIPAA §164.312(b) requires audit controls; §164.316(b)(2)(i) requires 6-year retention.

M365 native retention falls short of 6 years on every relevant log source:

| Source | Native | Gap to 6yr |
|---|---|---|
| Entra sign-in / audit / provisioning logs | 30d | 5y 11m |
| Purview Unified Audit Log (Exchange/SP/OD/Teams) | 180d | 5y 6m |
| Intune audit | 1y | 5y |
| Defender alerts | 30d | 5y 11m |

We close the gap by exporting via Diagnostic Settings to ACG-owned destinations and supplementing UAL with a poll-based harvester.

## Architecture

Hybrid: Log Analytics for live forensics + Storage Account for cold archive.

```
Customer Tenant (Cascades, etc.)
  Diagnostic Settings ──┬──> [LAW] law-<short>-audit (90d interactive)
                        └──> [SA]  stor<short>audit (lifecycle: hot 30d -> cool 60d -> archive until 6y -> delete)

Customer Tenant
  Office 365 Management Activity API (UAL) ──> ACG Function (poll q4h, per tenant) ──> SA blob path /ual/{yyyy}/{MM}/{dd}/...
```

Both LAW and SA receive the same stream from Diagnostic Settings: one ingest path, two retention tiers. The LAW is for human queries; the SA is for compliance archive.

UAL lacks a Diagnostic Settings hook, so we poll the Office 365 Management Activity API on a schedule and write JSON to the same Storage Account.

## Cost model

Per HIPAA-tier tenant per month: **~$0.50–1.00**

- LAW ingest: ~$2.30/GB × ~0.1 GB/mo = ~$0.23/mo
- LAW retention (90d): ~$0.10/GB × peak ~0.3 GB = ~$0.03/mo
- Storage Account (cool/archive blended over 6y): ~$0.15/mo
- Function compute (shared across tenants): rounded to zero
- Egress (only on forensics retrieval): pay-per-use, typically zero/mo

The itemized lines sum to ~$0.41/mo, so the quoted range is a rounded-up estimate.

ACG cumulative at 5 HIPAA tenants: ~$2.50–5/mo (5 × the per-tenant range). Budget headroom for forensics rehydration: ~$50–100 per incident retrieval (one-time).
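
The per-tenant arithmetic above can be re-run for other log volumes with a small helper. This is a minimal sketch: `estimate_tenant_cost` is a hypothetical name, and the rates are the rough figures quoted in this section, not live Azure pricing.

```bash
# Sketch: recompute the per-tenant monthly estimate from the rates quoted above.
# Assumption: rates are this runbook's rough figures, not current Azure pricing.
estimate_tenant_cost() {
  ingest_gb="${1:-0.1}"      # GB ingested into LAW per month
  retained_gb="${2:-0.3}"    # peak GB held in LAW over the 90-day window
  storage_usd="${3:-0.15}"   # blended Storage Account cost per month
  awk -v i="$ingest_gb" -v r="$retained_gb" -v s="$storage_usd" \
    'BEGIN { printf "%.2f\n", i * 2.30 + r * 0.10 + s }'
}

estimate_tenant_cost            # runbook defaults
estimate_tenant_cost 1.0 3.0    # a tenant with 10x the log volume
```

With the defaults this reproduces the sum of the itemized lines; passing larger volumes shows how quickly LAW ingest comes to dominate the total.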

## Prerequisites

### ACG-side (one-time)

- **Azure subscription:** reuse existing `e507e953-2ce9-4887-ba96-9b654f7d3267`, the ACG-owned subscription set up for GuruRMM Trusted Signing (cert profile `gururmm-public-trust` under `gururmm-signing-rg`). Vault entry: `services/azure-trusted-signing.sops.yaml`.
  - Rationale: Mike already has Owner on this sub; no new billing relationship needed; single tenant boundary; Azure RBAC + RG-level tagging keeps audit data isolated from signing data.
  - **Existing usage in this sub:** `gururmm-signing-rg` (Trusted Signing for GuruRMM agent binaries). Audit RGs (`rg-audit-*`) will be RG-isolated from signing.
  - Future split: when we have 3+ HIPAA tenants or a compliance audit requires hard boundary, move audit RGs to a dedicated `acg-msp-compliance` subscription via `az resource move`.
- **RBAC for Howard:** Owner at the subscription level, matching the existing operational trust model (Howard has "Full trust — same access as admin" per `CLAUDE.md`). One-time grant unblocks all future MSP-side Azure self-service. Mike runs:
  ```bash
  az role assignment create \
    --assignee howard.enos@azcomputerguru.com \
    --role "Owner" \
    --scope "/subscriptions/e507e953-2ce9-4887-ba96-9b654f7d3267"
  ```
  Guardrails to keep Owner-Howard low-risk:
  - Resource lock on `gururmm-signing-rg`: `az lock create --name signing-protect --lock-type CanNotDelete --resource-group gururmm-signing-rg`
  - PAYG cost alert at ~$50/mo via Cost Management (UI task)
- **Region:** `westus2` for all audit resources. Latency-friendly to Tucson, mature service availability, no HIPAA-relevant cost difference vs other US regions.

### Customer-tenant side (per onboarded HIPAA tenant)

- **Tenant Admin SP must have** `Policy.Read.All` (already in updated `onboard-tenant.sh`)
- **Tenant Admin SP must have** the directory role **Security Administrator** OR a custom role with `Microsoft.Insights/diagnosticSettings/write` to create Diagnostic Settings on Entra. (Conditional Access Administrator alone does NOT cover Monitor scope.)
- **Tenant Admin app manifest must include** `AuditLog.Read.All`, plus either Graph's `IdentityRiskyUser.Read.All` (already present per `SEC_INV_GRAPH_ROLES`) or a follow-on permission grant for the Defender export

### Tag schema (apply to every resource)

```
client      = cascadestucson
tier        = hipaa
service     = audit
cost-center = msp-audit
created-by  = howard | mike | onboard-tenant.sh
```
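
To keep the tag set identical across resources, the schema can be emitted from one place. A minimal sketch, assuming a hypothetical helper named `audit_tags` (it is not part of `onboard-tenant.sh`); tag values contain no spaces, so plain word splitting is safe here.

```bash
# Sketch: emit the tag schema above as space-separated key=value pairs.
# audit_tags is a hypothetical helper, not an existing script function.
audit_tags() {
  # $1 = client full name, $2 = creator (howard | mike | onboard-tenant.sh)
  printf 'client=%s tier=hipaa service=audit cost-center=msp-audit created-by=%s\n' "$1" "$2"
}

audit_tags cascadestucson howard

# Usage (not executed here; left unquoted on purpose so each tag is its own argument):
#   az group create --name "$RG" --location "$REGION" --tags $(audit_tags "$FULL" howard)
```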

## Per-tenant onboarding: Cascades example

Substitute `<short>` = `cascades` (lowercase, no punctuation, ≤8 chars). Substitute `<full>` = `cascadestucson`.
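
The naming rule can be checked before anything is provisioned. A minimal sketch, assuming a hypothetical helper `check_short`; the limits are the ones stated in this runbook (short name: lowercase alphanumeric, at most 8 characters; Storage Account name: lowercase alphanumeric, 3-24 characters).

```bash
# Sketch: validate a <short> value and print the Storage Account name derived from it.
# check_short is a hypothetical helper; the limits come from this runbook.
check_short() {
  short="$1"
  case "$short" in
    ""|*[!a-z0-9]*)
      echo "invalid: use lowercase letters and digits only" >&2
      return 1 ;;
  esac
  if [ "${#short}" -gt 8 ]; then
    echo "invalid: longer than 8 characters" >&2
    return 1
  fi
  # "stor" + "audit" = 9 fixed chars, so the result is at most 17 chars (limit is 24).
  echo "stor${short}audit"
}

check_short cascades                          # prints storcascadesaudit
check_short "cascades-tucson" || echo "rejected"
```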

### Phase 1: ACG-side resource provisioning

Howard runs from his workstation with az CLI logged into ACG home tenant:

```bash
SUB="e507e953-2ce9-4887-ba96-9b654f7d3267"
SHORT="cascades"
FULL="cascadestucson"
REGION="westus2"
RG="rg-audit-${FULL}"

az account set --subscription "$SUB"

# Resource group
az group create --name "$RG" --location "$REGION" \
  --tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard

# Storage Account (must be globally unique, lowercase alphanumeric, 3-24 chars)
# NOTE: with --access-tier Cool, new blobs land in Cool and the tierToCool rule below
# is a no-op. Switch to Hot if the "hot 30d" stage in the architecture is intended.
SA_NAME="stor${SHORT}audit"
az storage account create \
  --name "$SA_NAME" \
  --resource-group "$RG" \
  --location "$REGION" \
  --sku Standard_LRS \
  --kind StorageV2 \
  --access-tier Cool \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false \
  --tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard

# Containers
# NOTE: Phase 3 checks insights-logs-signinlogs, i.e. Diagnostic Settings writes to its
# own insights-logs-<category> containers. Of the containers below, only `ual` has a
# planned writer (the harvester); confirm during the pilot whether the rest are needed.
SA_KEY=$(az storage account keys list -g "$RG" -n "$SA_NAME" --query '[0].value' -o tsv)
for c in entra-signin entra-audit entra-provisioning intune-audit defender-alerts ual; do
  az storage container create --name "$c" --account-name "$SA_NAME" --account-key "$SA_KEY"
done

# Lifecycle policy: hot 30d -> cool 60d -> archive until 6y -> delete
# Delete threshold is 2192 days: a 6-year window is 2191 or 2192 days depending on
# leap years, so 2190 (6 x 365) would delete a day or two early.
# NOTE: this rule matches block blobs only. Check which blob type Diagnostic Settings
# writes during the pilot; append blobs would not be tiered or deleted by it.
cat > /tmp/lifecycle.json <<'EOF'
{
  "rules": [{
    "name": "hipaa-6y-tier-down",
    "enabled": true,
    "type": "Lifecycle",
    "definition": {
      "filters": { "blobTypes": ["blockBlob"] },
      "actions": {
        "baseBlob": {
          "tierToCool": { "daysAfterModificationGreaterThan": 30 },
          "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
          "delete": { "daysAfterModificationGreaterThan": 2192 }
        }
      }
    }
  }]
}
EOF
az storage account management-policy create \
  --account-name "$SA_NAME" \
  --resource-group "$RG" \
  --policy @/tmp/lifecycle.json

# Immutability (legal hold): defer until pilot validated.
# When ready: az storage container immutability-policy create ...

# Log Analytics Workspace
LAW_NAME="law-${SHORT}-audit"
az monitor log-analytics workspace create \
  --resource-group "$RG" \
  --workspace-name "$LAW_NAME" \
  --location "$REGION" \
  --retention-time 90 \
  --tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard
```
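
The lifecycle rule's delete threshold is expressed in days, so it is worth comparing it against the real length of a 6-year window. A minimal sketch, assuming GNU `date` (the `-d` syntax is GNU-specific) and a hypothetical helper `days_in_six_years`:

```bash
# Sketch: count the days in a 6-year window starting at a given date.
# Assumes GNU date; days_in_six_years is a hypothetical helper.
days_in_six_years() {
  start="$1"   # YYYY-MM-DD
  s=$(date -u -d "$start" +%s) || return 1
  e=$(date -u -d "$start +6 years" +%s) || return 1
  echo $(( (e - s) / 86400 ))
}

days_in_six_years 2024-01-01   # window spans two leap days (2024 and 2028)
days_in_six_years 2025-03-01   # window spans one leap day (2028)
```

Any threshold below the larger of the two results can delete a blob slightly before its sixth anniversary.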

### Phase 2: Customer-tenant Diagnostic Settings

Performed against Cascades tenant using Tenant Admin token:

```bash
CASCADES_TENANT="207fa277-e9d8-4eb7-ada1-1064d2221498"
# The token must be issued for the ARM audience (https://management.azure.com), not Graph.
# Confirm get-token.sh can mint one; the invocation below is unchanged.
TOKEN=$(bash .claude/skills/remediation-tool/scripts/get-token.sh "$CASCADES_TENANT" tenant-admin)

LAW_RESOURCE="/subscriptions/${SUB}/resourceGroups/${RG}/providers/Microsoft.OperationalInsights/workspaces/${LAW_NAME}"
SA_RESOURCE="/subscriptions/${SUB}/resourceGroups/${RG}/providers/Microsoft.Storage/storageAccounts/${SA_NAME}"

# Entra Diagnostic Settings (covers sign-in + audit + provisioning + non-interactive)
# The setting name goes in the URL; the body uses the usual ARM diagnosticSettings
# shape (settings wrapped in "properties"). Treat as unverified until the dry-run passes.
curl -sS -X PUT \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/acg-audit-export?api-version=2017-04-01-preview" \
  -d @- <<EOF
{
  "properties": {
    "logs": [
      {"category": "AuditLogs", "enabled": true},
      {"category": "SignInLogs", "enabled": true},
      {"category": "NonInteractiveUserSignInLogs", "enabled": true},
      {"category": "ServicePrincipalSignInLogs", "enabled": true},
      {"category": "ManagedIdentitySignInLogs", "enabled": true},
      {"category": "ProvisioningLogs", "enabled": true},
      {"category": "ADFSSignInLogs", "enabled": true},
      {"category": "RiskyUsers", "enabled": true},
      {"category": "UserRiskEvents", "enabled": true}
    ],
    "workspaceId": "${LAW_RESOURCE}",
    "storageAccountId": "${SA_RESOURCE}"
  }
}
EOF
```

Note: Entra Diagnostic Settings go through Azure Resource Manager (not Graph), which is why the call above targets this endpoint:

```
PUT https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/{name}?api-version=2017-04-01-preview
```

Authenticate against ARM (`https://management.azure.com`), not Graph. The Tenant Admin SP needs `Microsoft.AzureActiveDirectory/diagnosticSettings/write` permission, granted via the Security Administrator directory role. (Prerequisites names this action `Microsoft.Insights/diagnosticSettings/write`; confirm which string applies during dry-run.) Howard: validate the working endpoint during dry-run; the cURL above is the intended shape, not a verified call.

### Phase 3: Verification (1h after setup)

```bash
# Query LAW for recent sign-ins.
# `az monitor log-analytics query` takes the workspace GUID (customerId), not the
# workspace name, and has no --resource-group parameter.
LAW_ID=$(az monitor log-analytics workspace show \
  --resource-group "$RG" --workspace-name "$LAW_NAME" --query customerId -o tsv)
az monitor log-analytics query \
  --workspace "$LAW_ID" \
  --analytics-query "SigninLogs | take 5 | project TimeGenerated, UserPrincipalName, ResultType"

# Confirm Storage Account is receiving blobs
az storage blob list --container-name insights-logs-signinlogs \
  --account-name "$SA_NAME" --account-key "$SA_KEY" --num-results 5
```

If LAW returns rows and SA has blobs, the export is live.

### Phase 4: UAL harvester (deferred; separate buildout)

UAL has no Diagnostic Settings export. Approach when we get to it:

- Azure Function (Python or PowerShell), timer trigger every 4h
- Per onboarded tenant: managed identity granted `ActivityFeed.Read` against Office 365 Management API
- Polls `/api/v1.0/{tenantId}/activity/feed/subscriptions/content?contentType=<type>`, one request per content type: `Audit.AzureActiveDirectory`, `Audit.Exchange`, `Audit.SharePoint`, `Audit.General`, `DLP.All` (each type needs its subscription started first via `/subscriptions/start`)
- Writes raw JSON to `<sa>/ual/{yyyy}/{MM}/{dd}/{tenantId}/{contentType}-{timestamp}.json`
- Deduplicates via `contentId`

Codify the design once we've run it manually for a few weeks against Cascades. Estimated build: 4-6 hours dev + test.
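
The path layout and the `contentId` deduplication from the list above can be prototyped in shell before the Function exists. A minimal sketch: `ual_blob_path` and `is_new_content` are hypothetical helpers, and the seen-list is a plain local file here, whereas a real Function would need durable state.

```bash
# Sketch: build the blob path for one content blob and skip contentIds already seen.
# Both helpers are hypothetical; the local seen-file stands in for durable state.
ual_blob_path() {
  tenant="$1"; content_type="$2"; ts="$3"   # ts like 2025-01-31T08:00:00Z
  day="${ts%%T*}"                            # 2025-01-31
  yyyy="${day%%-*}"; rest="${day#*-}"
  mm="${rest%%-*}"; dd="${rest#*-}"
  echo "ual/${yyyy}/${mm}/${dd}/${tenant}/${content_type}-${ts}.json"
}

is_new_content() {
  content_id="$1"; seen_file="$2"
  touch "$seen_file"
  if grep -qxF -- "$content_id" "$seen_file"; then
    return 1                                 # already harvested
  fi
  echo "$content_id" >> "$seen_file"
}

SEEN=$(mktemp)
ual_blob_path 00000000-0000-0000-0000-000000000000 Audit.Exchange 2025-01-31T08:00:00Z
is_new_content abc123 "$SEEN" && echo "write blob"
is_new_content abc123 "$SEEN" || echo "duplicate, skip"
```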

## Operational

### Quarterly verification (per tenant, ~10 min)

1. Run a `SigninLogs | summarize count() by bin(TimeGenerated, 1d) | order by TimeGenerated desc` query in LAW. Expect daily volume.
2. Spot-check Storage Account container blob counts and timestamps.
3. Confirm lifecycle policy hasn't drifted: `az storage account management-policy show --resource-group "$RG" --account-name "$SA_NAME"`.
4. Cost: `az consumption usage list --start-date $(date -d '30 days ago' +%Y-%m-%d) --end-date $(date +%Y-%m-%d) --query "[?contains(instanceId,'$SA_NAME')||contains(instanceId,'$LAW_NAME')]"`; should be ~$1/mo per tenant.
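
For step 1, "expect daily volume" can be checked mechanically by looking for missing days in the query result. A minimal sketch, assuming GNU `date` and that the days returned by the LAW query have been saved one `YYYY-MM-DD` per line; `find_gaps` is a hypothetical helper.

```bash
# Sketch: read one YYYY-MM-DD per line on stdin and print any day missing between
# the first and last date. Assumes GNU date; find_gaps is a hypothetical helper.
find_gaps() {
  days=$(sort -u)
  first=$(printf '%s\n' "$days" | head -n 1)
  last=$(printf '%s\n' "$days" | tail -n 1)
  [ -n "$first" ] || return 0
  d="$first"
  while [ "$d" != "$last" ]; do
    printf '%s\n' "$days" | grep -qxF -- "$d" || echo "$d"
    d=$(date -u -d "$d +1 day" +%Y-%m-%d) || return 1
  done
}

printf '%s\n' 2025-01-01 2025-01-02 2025-01-04 | find_gaps   # prints 2025-01-03
```

An empty result means every day in the range produced at least one sign-in row.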

### Forensics retrieval

- **0–90 days:** KQL on LAW directly. Sub-second queries.
- **90 days – 6 years:** rehydrate blob from archive tier.
  ```bash
  az storage blob set-tier --tier Hot --rehydrate-priority Standard \
    --account-name "$SA_NAME" --container-name <c> --name <blob>
  ```
  Standard rehydrate can take up to ~15 hours. High-priority: ~1 hour, costs ~10x more.
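
Which of the two retrieval paths applies depends only on the event's age. A minimal sketch, assuming GNU `date` and the 90-day LAW window described above; `retrieval_path` is a hypothetical helper, and the second argument exists only to make the example reproducible.

```bash
# Sketch: choose the retrieval path for an event date using the 90-day LAW window.
# Assumes GNU date; retrieval_path is a hypothetical helper.
retrieval_path() {
  event_date="$1"
  today="${2:-$(date -u +%Y-%m-%d)}"
  age=$(( ($(date -u -d "$today" +%s) - $(date -u -d "$event_date" +%s)) / 86400 ))
  if [ "$age" -le 90 ]; then
    echo "law"                 # query with KQL directly
  else
    echo "archive-rehydrate"   # rehydrate the blob first
  fi
}

retrieval_path 2025-05-01 2025-06-01   # 31 days old
retrieval_path 2024-01-15 2025-06-01   # over a year old
```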

### When to upgrade subscription split

- Triggers: 3+ HIPAA tenants, an external compliance audit asking about subscription scope, or one tenant generating >10 GB/month
- Path: provision new subscription `acg-msp-compliance`, move RGs via `az resource move`, update Diagnostic Settings destination ARM IDs

## Onboarding integration (codify after pilot validated)

Once Cascades is running cleanly for 30 days, fold the per-tenant Phase 1 + Phase 2 into `onboard-tenant.sh` as a flag:

```bash
bash onboard-tenant.sh <tenant-id> --enable-audit-archive --client-shortname <short>
```

Implementation outline:

- Read `--enable-audit-archive` flag
- Provision RG + SA + LAW under ACG sub (idempotent: skip if exists)
- Issue PUT for Diagnostic Settings against the customer tenant
- Append "Audit archive: [OK]" row to the final status table

Until codified, Howard runs the runbook manually per tenant. Cascades is the only HIPAA-tier tenant currently, so this is fine.
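
The first step of the outline, reading the flags, might look like the following. This is a minimal sketch: `parse_audit_flags` and the variable names are hypothetical, and nothing here reflects how `onboard-tenant.sh` currently parses its arguments.

```bash
# Sketch: parse the two proposed flags plus the positional tenant ID.
# parse_audit_flags and its variables are hypothetical; it only sets variables,
# and the Phase 1 / Phase 2 steps would run afterwards.
parse_audit_flags() {
  ENABLE_AUDIT_ARCHIVE=0
  CLIENT_SHORTNAME=""
  TENANT_ID=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --enable-audit-archive) ENABLE_AUDIT_ARCHIVE=1 ;;
      --client-shortname)
        [ "$#" -ge 2 ] || { echo "--client-shortname needs a value" >&2; return 1; }
        CLIENT_SHORTNAME="$2"
        shift ;;
      -*) echo "unknown flag: $1" >&2; return 1 ;;
      *) TENANT_ID="$1" ;;
    esac
    shift
  done
  if [ "$ENABLE_AUDIT_ARCHIVE" -eq 1 ] && [ -z "$CLIENT_SHORTNAME" ]; then
    echo "--enable-audit-archive requires --client-shortname" >&2
    return 1
  fi
}

parse_audit_flags 207fa277-e9d8-4eb7-ada1-1064d2221498 --enable-audit-archive --client-shortname cascades
echo "tenant=$TENANT_ID archive=$ENABLE_AUDIT_ARCHIVE short=$CLIENT_SHORTNAME"
```

Requiring the short name whenever the flag is set keeps the resource names derivable without prompting.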

## Open questions / future work

- **UAL harvester:** designed but not built. Punt until pilot CA cutover is done.
- **Defender for Office 365 export:** does it expose Diagnostic Settings? If not, may need a poll like the one planned for the Office 365 Management Activity API. Check during Cascades verification.
- **MDE alerts:** ditto.
- **Sentinel:** the natural upgrade path if alerting becomes important. Cost crosses ~$200/mo at first tenant; defer until justified by an actual operational need.
- **Break-glass sign-in alert:** when break-glass admin lands, KQL alert rule on LAW: `SigninLogs | where UserPrincipalName =~ "breakglass-csc@cascadestucson.com"` (case-insensitive match) → Action Group → email Mike + Howard. Lives in this same LAW.