Session log: Cascades audit retention design + Pro-Tech Services email investigation

Cascades:
- Approved Howard's corrected 4-policy CA bypass design
- Caught + fixed policy 3 GDAP bug (Service provider users exclusion)
- Decided hybrid LAW + Storage Account audit retention (ACG-billed,
  reuse existing Trusted Signing Azure subscription, westus2)
- Wrote full audit retention runbook for Howard
- Reshaped break-glass to two accounts (split-storage YubiKeys)
- Documented Cascades M365 admin model (admin@/sysadmin@ Connect-excluded
  by design; local AD Administrator separate identity layer)
- Decided Howard gets Owner on ACG sub with guardrails (resource lock +
  cost alert) instead of per-RG Contributor

Pro-Tech Services:
- DNS recon of pro-techhelps.com + pro-techservices.co
- Diagnosed calendar invite delivery issue (DKIM domain mismatch +
  no DMARC = strict receivers silently drop invites)
- Drafted non-technical IT-provider migration email to Michelle Sora

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:05:41 -07:00
parent 6b63c154d2
commit 447b90e092
4 changed files with 706 additions and 0 deletions


@@ -0,0 +1,280 @@
# Audit Retention Runbook (HIPAA-tier)
ACG-side architecture for capturing and retaining 6-year audit logs from customer M365 tenants. First implementation: Cascades Tucson.
## Why this exists
HIPAA §164.312(b) requires audit controls; §164.316(b)(2)(i) requires 6-year retention.
M365 native retention falls short of 6 years on every relevant log source:
| Source | Native | Gap to 6yr |
|---|---|---|
| Entra sign-in / audit / provisioning logs | 30d | 5y 11m |
| Purview Unified Audit Log (Exchange/SP/OD/Teams) | 180d | 5.5y |
| Intune audit | 1y | 5y |
| Defender alerts | 30d | 5y 11m |
We close the gap by exporting via Diagnostic Settings to ACG-owned destinations and supplementing UAL with a poll-based harvester.
## Architecture
Hybrid: Log Analytics for live forensics + Storage Account for cold archive.
```
Customer Tenant (Cascades, etc.)
  Diagnostic Settings ──┬──> [LAW] law-<short>-audit  (90d interactive)
                        └──> [SA]  stor<short>audit   (lifecycle: hot 30d -> cool 60d -> archive 6y -> delete)

Customer Tenant
  Office 365 Management Activity API (UAL) ──> ACG Function (poll q4h, per tenant) ──> SA blob path /ual/{yyyy}/{MM}/{dd}/...
```
Both LAW and SA receive the same stream from Diagnostic Settings — one ingest path, two retention tiers. The LAW is for human queries; the SA is for compliance archive.
UAL lacks a Diagnostic Settings hook, so we poll the Office 365 Management Activity API on a schedule and write JSON to the same Storage Account.
## Cost model
Per HIPAA-tier tenant per month: **~$0.50-1.00**
- LAW ingest: ~$2.30/GB × ~0.1 GB/mo = ~$0.23/mo
- LAW retention (90d): ~$0.10/GB × peak ~0.3 GB = ~$0.03/mo
- Storage Account (cool/archive blended over 6y): ~$0.15/mo
- Function compute (shared across tenants): rounded to zero
- Egress (only on forensics retrieval): pay-per-use, typically zero/mo
ACG cumulative at 5 HIPAA tenants: ~$5-10/mo. Budget headroom for forensics rehydration: ~$50-100 per incident retrieval (one-time).
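As a rough cross-check of the line items above, the arithmetic can be sketched in shell. `monthly_cost` is a hypothetical helper, and the inputs are the approximate unit prices and volumes quoted in the list. The itemized figures sum to about $0.41 per tenant, a little under the quoted range, which presumably leaves room for rounding and volume variation.

```bash
#!/usr/bin/env bash
# monthly_cost: sum the runbook's per-tenant line items (all figures approximate).
# $1 = LAW ingest $/GB, $2 = GB ingested per month,
# $3 = LAW retention $/GB, $4 = peak GB retained, $5 = storage $/mo
monthly_cost() {
  awk -v ip="$1" -v ig="$2" -v rp="$3" -v rg="$4" -v sa="$5" \
    'BEGIN { printf "%.2f\n", ip * ig + rp * rg + sa }'
}

per_tenant=$(monthly_cost 2.30 0.1 0.10 0.3 0.15)
echo "per tenant: \$${per_tenant}/mo"

# Fleet estimate for the 5-tenant scenario mentioned above.
awk -v c="$per_tenant" -v n=5 'BEGIN { printf "%d tenants: $%.2f/mo\n", n, c * n }'
```

Function compute and egress are left out because the list rounds both to zero in normal months.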
## Prerequisites
### ACG-side (one-time)
- **Azure subscription:** reuse existing `e507e953-2ce9-4887-ba96-9b654f7d3267` — the ACG-owned subscription set up for GuruRMM Trusted Signing (cert profile `gururmm-public-trust` under `gururmm-signing-rg`). Vault entry: `services/azure-trusted-signing.sops.yaml`.
- Rationale: Mike already has Owner on this sub; no new billing relationship needed; single tenant boundary; Azure RBAC + RG-level tagging keeps audit data isolated from signing data.
- **Existing usage in this sub:** `gururmm-signing-rg` (Trusted Signing for GuruRMM agent binaries). Audit RGs (`rg-audit-*`) will be RG-isolated from signing.
- Future split: when we have 3+ HIPAA tenants or a compliance audit requires hard boundary, move audit RGs to a dedicated `acg-msp-compliance` subscription via `az resource move`.
- **RBAC for Howard:** Owner at the subscription level — matches the existing operational trust model (Howard has "Full trust — same access as admin" per `CLAUDE.md`). One-time grant unblocks all future MSP-side Azure self-service. Mike runs:
```bash
az role assignment create \
--assignee howard.enos@azcomputerguru.com \
--role "Owner" \
--scope "/subscriptions/e507e953-2ce9-4887-ba96-9b654f7d3267"
```
Guardrails to keep Owner-Howard low-risk:
- Resource lock on `gururmm-signing-rg`: `az lock create --name signing-protect --lock-type CanNotDelete --resource-group gururmm-signing-rg`
- PAYG cost alert at ~$50/mo via Cost Management (UI task)
- **Region:** `westus2` for all audit resources. Latency-friendly to Tucson, mature service availability, no HIPAA-relevant cost difference vs other US regions.
### Customer-tenant side (per onboarded HIPAA tenant)
- **Tenant Admin SP must have** `Policy.Read.All` (already in updated `onboard-tenant.sh`)
- **Tenant Admin SP must have** the directory role **Security Administrator** OR a custom role with `microsoft.aadiam/diagnosticSettings/write` to create Diagnostic Settings on Entra. (Conditional Access Administrator alone does NOT cover Monitor scope.)
- **Tenant Admin app manifest must include** `AuditLog.Read.All`, plus either Graph's `IdentityRiskyUser.Read.All` (already present per `SEC_INV_GRAPH_ROLES`) or a follow-on permission for Defender export
### Tag schema (apply to every resource)
```
client = cascadestucson
tier = hipaa
service = audit
cost-center = msp-audit
created-by = howard | mike | onboard-tenant.sh
```
## Per-tenant onboarding — Cascades example
Substitute `<short>` = `cascades` (lowercase, no punctuation, ≤8 chars). Substitute `<full>` = `cascadestucson`.
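The naming rules above can be checked before anything is provisioned. This is a minimal sketch: `validate_short` and `sa_name_for` are hypothetical helpers, and treating digits as allowed in `<short>` is an assumption, since the rule only says "no punctuation".

```bash
#!/usr/bin/env bash
# validate_short: succeed only for lowercase alphanumeric names of 1-8 chars.
validate_short() {
  [[ "$1" =~ ^[a-z0-9]{1,8}$ ]]
}

# sa_name_for: derive the storage account name used in Phase 1 and confirm it
# fits the 3-24 char lowercase alphanumeric rule for storage accounts.
sa_name_for() {
  local name="stor${1}audit"
  [[ "$name" =~ ^[a-z0-9]{3,24}$ ]] || return 1
  printf '%s\n' "$name"
}

SHORT="cascades"
if validate_short "$SHORT"; then
  sa_name_for "$SHORT"
else
  echo "invalid short name: $SHORT" >&2
fi
```

With an 8-character cap on `<short>`, the derived name tops out at 17 characters, so the length check only fails if the cap is bypassed.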
### Phase 1: ACG-side resource provisioning
Howard runs from his workstation with az CLI logged into ACG home tenant:
```bash
SUB="e507e953-2ce9-4887-ba96-9b654f7d3267"
SHORT="cascades"
FULL="cascadestucson"
REGION="westus2"
RG="rg-audit-${FULL}"
az account set --subscription "$SUB"
# Resource group
az group create --name "$RG" --location "$REGION" \
--tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard
# Storage Account (must be globally unique, lowercase alphanumeric, 3-24 chars)
SA_NAME="stor${SHORT}audit"
az storage account create \
--name "$SA_NAME" \
--resource-group "$RG" \
--location "$REGION" \
--sku Standard_LRS \
--kind StorageV2 \
--access-tier Hot \
--min-tls-version TLS1_2 \
--allow-blob-public-access false \
--tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit
# Containers
SA_KEY=$(az storage account keys list -g "$RG" -n "$SA_NAME" --query '[0].value' -o tsv)
for c in entra-signin entra-audit entra-provisioning intune-audit defender-alerts ual; do
az storage container create --name "$c" --account-name "$SA_NAME" --account-key "$SA_KEY"
done
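# Lifecycle policy: hot 30d -> cool 60d -> archive 6y -> delete
# NOTE (verify in pilot): Diagnostic Settings may write append blobs, and lifecycle
# tiering applies to block blobs only. Confirm the blob type before relying on this policy.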
cat > /tmp/lifecycle.json <<'EOF'
{
"rules": [{
"name": "hipaa-6y-tier-down",
"enabled": true,
"type": "Lifecycle",
"definition": {
"filters": { "blobTypes": ["blockBlob"] },
"actions": {
"baseBlob": {
"tierToCool": { "daysAfterModificationGreaterThan": 30 },
"tierToArchive": { "daysAfterModificationGreaterThan": 90 },
"delete": { "daysAfterModificationGreaterThan": 2190 }
}
}
}
}]
}
EOF
az storage account management-policy create \
--account-name "$SA_NAME" \
--resource-group "$RG" \
--policy @/tmp/lifecycle.json
# Immutability (legal hold) — defer until pilot validated.
# When ready: az storage container immutability-policy create ...
# Log Analytics Workspace
LAW_NAME="law-${SHORT}-audit"
az monitor log-analytics workspace create \
--resource-group "$RG" \
--workspace-name "$LAW_NAME" \
--location "$REGION" \
--retention-time 90 \
--tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit
```
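The lifecycle policy deletes at 2190 days, which is 6 × 365. Six calendar years span 2191 or 2192 days depending on leap days, so the threshold is worth checking against the retention requirement. A minimal sketch, assuming GNU `date`; both function names are hypothetical, and a February 29 start date is not handled.

```bash
#!/usr/bin/env bash
# days_until_six_years: days between a start date (YYYY-MM-DD) and the same
# calendar date six years later. -u avoids daylight-saving offsets.
days_until_six_years() {
  local start="$1"
  local end="$(( ${start%%-*} + 6 ))-${start#*-}"
  local s e
  s=$(date -u -d "$start" +%s)
  e=$(date -u -d "$end" +%s)
  echo $(( (e - s) / 86400 ))
}

# threshold_covers: succeed if a delete threshold (in days) is at least six
# calendar years for a blob last modified on the given date.
threshold_covers() {
  local threshold="$1" start="$2"
  [ "$threshold" -ge "$(days_until_six_years "$start")" ]
}

if threshold_covers 2190 2026-04-29; then
  echo "2190 days covers six years from 2026-04-29"
else
  echo "2190 days falls short; need $(days_until_six_years 2026-04-29)"
fi
```

Whether a one- or two-day shortfall matters is a compliance call. Raising the threshold slightly costs almost nothing at archive-tier prices.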
### Phase 2: Customer-tenant Diagnostic Settings
Performed against Cascades tenant using Tenant Admin token:
```bash
CASCADES_TENANT="207fa277-e9d8-4eb7-ada1-1064d2221498"
TOKEN=$(bash .claude/skills/remediation-tool/scripts/get-token.sh "$CASCADES_TENANT" tenant-admin)
LAW_RESOURCE="/subscriptions/${SUB}/resourceGroups/${RG}/providers/Microsoft.OperationalInsights/workspaces/${LAW_NAME}"
SA_RESOURCE="/subscriptions/${SUB}/resourceGroups/${RG}/providers/Microsoft.Storage/storageAccounts/${SA_NAME}"
# Entra Diagnostic Settings (covers sign-in + audit + provisioning + non-interactive)
# Sent to ARM, so TOKEN must be issued for https://management.azure.com
# (confirm which audience get-token.sh returns).
curl -X PUT \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
"https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/acg-audit-export?api-version=2017-04-01-preview" \
-d @- <<EOF
{
  "properties": {
    "logs": [
      {"category": "AuditLogs", "enabled": true},
      {"category": "SignInLogs", "enabled": true},
      {"category": "NonInteractiveUserSignInLogs", "enabled": true},
      {"category": "ServicePrincipalSignInLogs", "enabled": true},
      {"category": "ManagedIdentitySignInLogs", "enabled": true},
      {"category": "ProvisioningLogs", "enabled": true},
      {"category": "ADFSSignInLogs", "enabled": true},
      {"category": "RiskyUsers", "enabled": true},
      {"category": "UserRiskEvents", "enabled": true}
    ],
    "workspaceId": "${LAW_RESOURCE}",
    "storageAccountId": "${SA_RESOURCE}"
  }
}
EOF
```
Note: Entra Diagnostic Settings are managed through Azure Resource Manager, not Microsoft Graph. The endpoint is:
```
PUT https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/{name}?api-version=2017-04-01-preview
```
Authenticate against ARM (`https://management.azure.com`), not Graph. The Tenant Admin SP needs the `microsoft.aadiam/diagnosticSettings/write` permission, granted via the Security Administrator directory role. Howard: validate the working endpoint during the dry run, and treat the cURL above as the conceptual shape until then, not a verified call.
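The request body lists each log category by hand, which is easy to get wrong when categories are added or dropped per tenant. A minimal sketch of generating it from a category list follows. `build_diag_body` is a hypothetical helper, and wrapping the settings under a `properties` key is an assumption about the ARM request shape that should be confirmed in the same dry run.

```bash
#!/usr/bin/env bash
# build_diag_body: emit a Diagnostic Settings request body.
# $1 = LAW resource ID, $2 = Storage Account resource ID, rest = log categories.
# ASSUMPTION: the ARM call expects the settings under "properties".
build_diag_body() {
  local workspace_id="$1" storage_id="$2"
  shift 2
  local logs="" category
  for category in "$@"; do
    # ${logs:+,} adds a comma only once the list is non-empty.
    logs+="${logs:+,}{\"category\":\"${category}\",\"enabled\":true}"
  done
  printf '{"properties":{"workspaceId":"%s","storageAccountId":"%s","logs":[%s]}}\n' \
    "$workspace_id" "$storage_id" "$logs"
}

# Example with placeholder IDs (not real resources):
build_diag_body "LAW_RESOURCE_ID" "SA_RESOURCE_ID" AuditLogs SignInLogs ProvisioningLogs
```

The output can be piped to `curl -d @-` in place of the heredoc.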
### Phase 3: Verification (1h after setup)
```bash
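# Query LAW for recent sign-ins (the query command takes the workspace GUID, not its name)
LAW_ID=$(az monitor log-analytics workspace show \
--resource-group "$RG" --workspace-name "$LAW_NAME" \
--query customerId -o tsv)
az monitor log-analytics query \
--workspace "$LAW_ID" \
--analytics-query "SigninLogs | take 5 | project TimeGenerated, UserPrincipalName, ResultType"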
# Confirm Storage Account is receiving blobs
az storage blob list --container-name insights-logs-signinlogs \
--account-name "$SA_NAME" --account-key "$SA_KEY" --num-results 5
```
If LAW returns rows and SA has blobs, the export is live.
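The blob check above looks in `insights-logs-signinlogs`, not in the `entra-signin` container created in Phase 1, because Diagnostic Settings create their own containers. A small sketch for deriving the names to check for the other categories, assuming they all follow the same `insights-logs-<lowercased category>` pattern; `container_for_category` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# container_for_category: map a Diagnostic Settings category to the container
# name it is expected to land in (assumed pattern: insights-logs-<lowercase>).
container_for_category() {
  printf 'insights-logs-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

for category in SignInLogs AuditLogs ProvisioningLogs; do
  container_for_category "$category"
done
```

If the pattern holds, the manually created `entra-*` containers may stay empty, which is worth noting during the pilot.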
### Phase 4: UAL harvester (deferred — separate buildout)
UAL has no Diagnostic Settings export. Approach when we get to it:
- Azure Function (Python or PowerShell), timer trigger every 4h
- Per onboarded tenant: managed identity granted `ActivityFeed.Read` against Office 365 Management API
- Polls `/api/v1.0/{tenantId}/activity/feed/subscriptions/content?contentType=Audit.AzureActiveDirectory|Audit.Exchange|Audit.SharePoint|Audit.General|DLP.All`
- Writes raw JSON to `<sa>/ual/{yyyy}/{MM}/{dd}/{tenantId}/{contentType}-{timestamp}.json`
- Deduplicates via `contentId`
Codify the design once we've run it manually for a few weeks against Cascades. Estimated build: 4-6 hours dev + test.
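Two pieces of the harvester design above are simple enough to sketch now: building the blob path and skipping content already stored. Both functions are hypothetical. The flat-file ledger of seen `contentId` values and the compact timestamp format are assumptions for illustration, not decisions the design has made.

```bash
#!/usr/bin/env bash
# ual_blob_path: build the blob path for one fetched content item.
# $1 = tenant ID, $2 = content type, $3 = UTC timestamp like 20260429T160000Z
ual_blob_path() {
  local tenant="$1" ctype="$2" ts="$3"
  local y="${ts:0:4}" m="${ts:4:2}" d="${ts:6:2}"
  printf 'ual/%s/%s/%s/%s/%s-%s.json\n' "$y" "$m" "$d" "$tenant" "$ctype" "$ts"
}

# seen_before: succeed if a contentId is already in the ledger file; otherwise
# record it and fail, so the caller knows to fetch and store the content.
seen_before() {
  local ledger="$1" content_id="$2"
  if grep -Fxq -- "$content_id" "$ledger" 2>/dev/null; then
    return 0
  fi
  printf '%s\n' "$content_id" >> "$ledger"
  return 1
}

ledger=$(mktemp)
for id in abc abc def; do
  seen_before "$ledger" "$id" || \
    echo "store $id at $(ual_blob_path TENANT_ID Audit.Exchange 20260429T160000Z)"
done
rm -f "$ledger"
```

A real Function would keep the ledger somewhere durable, such as a blob or table, because a temp file does not survive between timer invocations.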
## Operational
### Quarterly verification (per tenant, ~10 min)
1. Run a `SigninLogs | summarize count() by bin(TimeGenerated, 1d) | order by TimeGenerated desc` query in LAW. Expect daily volume.
2. Spot-check Storage Account container blob counts and timestamps.
3. Confirm lifecycle policy hasn't drifted: `az storage account management-policy show -g $RG -n $SA_NAME`.
4. Cost: `az consumption usage list --start-date $(date -d '30 days ago' +%Y-%m-%d) --end-date $(date +%Y-%m-%d) --query "[?contains(instanceId,'$SA_NAME')||contains(instanceId,'$LAW_NAME')]"` — should be ~$1/mo per tenant.
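Step 1 expects daily volume, so a day with no rows is the signal to look for. A minimal sketch for spotting gaps, assuming the day buckets from the query have been exported one `YYYY-MM-DD` per line and that GNU `date` is available; `missing_days` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# missing_days: print every date in [start, end] that is absent from stdin.
# stdin: one YYYY-MM-DD per line. $1 = start date, $2 = end date (inclusive).
missing_days() {
  local start="$1" end="$2" seen day s e
  seen=$(cat)
  s=$(date -u -d "$start" +%s)
  e=$(date -u -d "$end" +%s)
  while [ "$s" -le "$e" ]; do
    day=$(date -u -d "@$s" +%Y-%m-%d)
    # Exact whole-line match; print the day only when it is not in the input.
    grep -Fxq -- "$day" <<<"$seen" || echo "$day"
    s=$(( s + 86400 ))
  done
}

printf '%s\n' 2026-04-01 2026-04-02 2026-04-04 | missing_days 2026-04-01 2026-04-04
```

Any date it prints is worth a closer look before assuming the export is healthy. A small tenant may legitimately have no sign-ins on a weekend or holiday.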
### Forensics retrieval
- **0-90 days:** KQL on LAW directly. Sub-second queries.
- **90 days to 6 years:** rehydrate blob from archive tier.
```bash
az storage blob set-tier --tier Hot --rehydrate-priority Standard \
--account-name $SA_NAME --container-name <c> --name <blob>
```
Standard rehydrate: can take up to ~15 hours. High priority: ~1 hour, at roughly 10x the cost.
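The choice between the two retrieval routes depends only on how old the event is. A small sketch of that decision, with the LAW retention and lifecycle delete threshold passed in so they track whatever the tenant is configured with; `retrieval_path` and its return labels are hypothetical.

```bash
#!/usr/bin/env bash
# retrieval_path: pick a retrieval route from the event's age.
# $1 = event age in days, $2 = LAW retention in days, $3 = delete threshold in days
retrieval_path() {
  local age_days="$1" law_days="$2" delete_days="$3"
  if [ "$age_days" -le "$law_days" ]; then
    echo "law-query"
  elif [ "$age_days" -le "$delete_days" ]; then
    echo "archive-rehydrate"
  else
    echo "expired"
  fi
}

retrieval_path 45 90 2190    # recent event: query LAW
retrieval_path 400 90 2190   # older event: rehydrate from the Storage Account
```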
### When to upgrade subscription split
- Triggers: 3+ HIPAA tenants, an external compliance audit asking about subscription scope, or one tenant generating >10 GB/month
- Path: provision new subscription `acg-msp-compliance`, move RGs via `az resource move`, update Diagnostic Settings destination ARM IDs
## Onboarding integration (codify after pilot validated)
Once Cascades is running cleanly for 30 days, fold the per-tenant Phase 1 + Phase 2 into `onboard-tenant.sh` as a flag:
```bash
bash onboard-tenant.sh <tenant-id> --enable-audit-archive --client-shortname <short>
```
Implementation outline:
- Read `--enable-audit-archive` flag
- Provision RG + SA + LAW under ACG sub (idempotent: skip if exists)
- Issue PUT for Diagnostic Settings against the customer tenant
- Append "Audit archive: [OK]" row to the final status table
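The flag handling in the outline above could look roughly like this. It is a sketch under stated assumptions: `parse_audit_flags` and the two variables it sets are hypothetical names, and how `onboard-tenant.sh` parses its existing arguments is not known here, so unrecognized arguments are simply skipped.

```bash
#!/usr/bin/env bash
# parse_audit_flags: scan arguments for the audit-archive options.
# Sets ENABLE_AUDIT_ARCHIVE (0/1) and CLIENT_SHORTNAME; fails if the flag is
# given without a shortname.
parse_audit_flags() {
  ENABLE_AUDIT_ARCHIVE=0
  CLIENT_SHORTNAME=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --enable-audit-archive)
        ENABLE_AUDIT_ARCHIVE=1 ;;
      --client-shortname)
        [ "$#" -ge 2 ] || return 1
        CLIENT_SHORTNAME="$2"
        shift ;;
      *) ;;  # tenant ID and other arguments are left to the existing script
    esac
    shift
  done
  if [ "$ENABLE_AUDIT_ARCHIVE" -eq 1 ] && [ -z "$CLIENT_SHORTNAME" ]; then
    echo "error: --enable-audit-archive requires --client-shortname" >&2
    return 1
  fi
}

if parse_audit_flags TENANT_ID --enable-audit-archive --client-shortname cascades; then
  echo "audit archive: ${ENABLE_AUDIT_ARCHIVE}, short name: ${CLIENT_SHORTNAME}"
fi
```

Failing early on a missing shortname keeps the script from provisioning a resource group with an empty name.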
Until codified, Howard runs the runbook manually per tenant. Cascades is the only HIPAA-tier tenant currently — this is fine.
## Open questions / future work
- **UAL harvester:** designed but not built. Punt until pilot CA cutover is done.
- **Defender for Office 365 export:** does it expose Diagnostic Settings? If not, it may need a poll-based harvester in the style of the Office 365 Management Activity API one. Check during Cascades verification.
- **MDE alerts:** ditto.
- **Sentinel:** the natural upgrade path if alerting becomes important. Cost crosses ~$200/mo at first tenant — defer until justified by an actual operational need.
- **Break-glass sign-in alert:** when break-glass admin lands, KQL alert rule on LAW: `SigninLogs | where UserPrincipalName == "breakglass-csc@cascadestucson.com"` → Action Group → email Mike + Howard. Lives in this same LAW.