Files
claudetools/.claude/skills/remediation-tool/references/audit-retention-runbook.md
Mike Swanson 447b90e092 Session log: Cascades audit retention design + Pro-Tech Services email investigation
Cascades:
- Approved Howard's corrected 4-policy CA bypass design
- Caught + fixed policy 3 GDAP bug (Service provider users exclusion)
- Decided hybrid LAW + Storage Account audit retention (ACG-billed,
  reuse existing Trusted Signing Azure subscription, westus2)
- Wrote full audit retention runbook for Howard
- Reshaped break-glass to two accounts (split-storage YubiKeys)
- Documented Cascades M365 admin model (admin@/sysadmin@ Connect-excluded
  by design; local AD Administrator separate identity layer)
- Decided Howard gets Owner on ACG sub with guardrails (resource lock +
  cost alert) instead of per-RG Contributor

Pro-Tech Services:
- DNS recon of pro-techhelps.com + pro-techservices.co
- Diagnosed calendar invite delivery issue (DKIM domain mismatch +
  no DMARC = strict receivers silently drop invites)
- Drafted non-technical IT-provider migration email to Michelle Sora

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:05:41 -07:00

13 KiB
Raw Blame History

Audit Retention Runbook (HIPAA-tier)

ACG-side architecture for capturing and retaining 6-year audit logs from customer M365 tenants. First implementation: Cascades Tucson.

Why this exists

HIPAA §164.312(b) requires audit controls; §164.316(b)(2)(i) requires 6-year retention.

M365 native retention falls short of 6 years on every relevant log source:

Source Native Gap to 6yr
Entra sign-in / audit / provisioning logs 30d 5y 11m
Purview Unified Audit Log (Exchange/SP/OD/Teams) 180d 5.5y
Intune audit 1y 5y
Defender alerts 30d 5y 11m

We close the gap by exporting via Diagnostic Settings to ACG-owned destinations and supplementing UAL with a poll-based harvester.

Architecture

Hybrid: Log Analytics for live forensics + Storage Account for cold archive.

Customer Tenant (Cascades, etc.)
  Diagnostic Settings ──┬──> [LAW]   law-<short>-audit       (90d interactive)
                        └──> [SA]    stor<short>audit        (lifecycle: hot 30d -> cool 60d -> archive 6y -> delete)

Customer Tenant
  /v1.0/auditLogs (UAL) ──> ACG Function (poll q4h, per tenant) ──> SA blob path /ual/{yyyy}/{MM}/{dd}/...

Both LAW and SA receive the same stream from Diagnostic Settings — one ingest path, two retention tiers. The LAW is for human queries; the SA is for compliance archive.

UAL lacks a Diagnostic Settings hook, so we poll the Office 365 Management Activity API on a schedule and write JSON to the same Storage Account.

Cost model

Per HIPAA-tier tenant per month: ~$0.501.00

  • LAW ingest: ~$2.30/GB × ~0.1 GB/mo = ~$0.23/mo
  • LAW retention (90d): ~$0.10/GB × peak ~0.3 GB = ~$0.03/mo
  • Storage Account (cool/archive blended over 6y): ~$0.15/mo
  • Function compute (shared across tenants): rounded to zero
  • Egress (only on forensics retrieval): pay-per-use, typically zero/mo

ACG cumulative at 5 HIPAA tenants: ~$510/mo. Budget headroom for forensics rehydration: ~$50100 per incident retrieval (one-time).

Prerequisites

ACG-side (one-time)

  • Azure subscription: reuse existing e507e953-2ce9-4887-ba96-9b654f7d3267 — the ACG-owned subscription set up for GuruRMM Trusted Signing (cert profile gururmm-public-trust under gururmm-signing-rg). Vault entry: services/azure-trusted-signing.sops.yaml.
    • Rationale: Mike already has Owner on this sub; no new billing relationship needed; single tenant boundary; Azure RBAC + RG-level tagging keeps audit data isolated from signing data.
    • Existing usage in this sub: gururmm-signing-rg (Trusted Signing for GuruRMM agent binaries). Audit RGs (rg-audit-*) will be RG-isolated from signing.
    • Future split: when we have 3+ HIPAA tenants or a compliance audit requires hard boundary, move audit RGs to a dedicated acg-msp-compliance subscription via az resource move.
  • RBAC for Howard: Owner at the subscription level — matches the existing operational trust model (Howard has "Full trust — same access as admin" per CLAUDE.md). One-time grant unblocks all future MSP-side Azure self-service. Mike runs:
    az role assignment create \
      --assignee howard.enos@azcomputerguru.com \
      --role "Owner" \
      --scope "/subscriptions/e507e953-2ce9-4887-ba96-9b654f7d3267"
    
    Guardrails to keep Owner-Howard low-risk:
    • Resource lock on gururmm-signing-rg: az lock create --name signing-protect --lock-type CanNotDelete --resource-group gururmm-signing-rg
    • PAYG cost alert at ~$50/mo via Cost Management (UI task)
  • Region: westus2 for all audit resources. Latency-friendly to Tucson, mature service availability, no HIPAA-relevant cost difference vs other US regions.

Customer-tenant side (per onboarded HIPAA tenant)

  • Tenant Admin SP must have Policy.Read.All (already in updated onboard-tenant.sh)
  • Tenant Admin SP must have the directory role Security Administrator OR a custom role with Microsoft.Insights/diagnosticSettings/write to create Diagnostic Settings on Entra. (Conditional Access Administrator alone does NOT cover Monitor scope.)
  • Tenant Admin app manifest must include AuditLog.Read.All and either Graph's IdentityRiskyUser.Read.All (already present per SEC_INV_GRAPH_ROLES) or follow-on for Defender export

Tag schema (apply to every resource)

client          = cascadestucson
tier            = hipaa
service         = audit
cost-center     = msp-audit
created-by      = howard | mike | onboard-tenant.sh

Per-tenant onboarding — Cascades example

Substitute <short> = cascades (lowercase, no punctuation, ≤8 chars). Substitute <full> = cascadestucson.

Phase 1: ACG-side resource provisioning

Howard runs from his workstation with az CLI logged into ACG home tenant:

SUB="e507e953-2ce9-4887-ba96-9b654f7d3267"
SHORT="cascades"
FULL="cascadestucson"
REGION="westus2"
RG="rg-audit-${FULL}"

az account set --subscription "$SUB"

# Resource group
az group create --name "$RG" --location "$REGION" \
  --tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit created-by=howard

# Storage Account (must be globally unique, lowercase alphanumeric, 3-24 chars)
SA_NAME="stor${SHORT}audit"
az storage account create \
  --name "$SA_NAME" \
  --resource-group "$RG" \
  --location "$REGION" \
  --sku Standard_LRS \
  --kind StorageV2 \
  --access-tier Cool \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false \
  --tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit

# Containers
SA_KEY=$(az storage account keys list -g "$RG" -n "$SA_NAME" --query '[0].value' -o tsv)
for c in entra-signin entra-audit entra-provisioning intune-audit defender-alerts ual; do
  az storage container create --name "$c" --account-name "$SA_NAME" --account-key "$SA_KEY"
done

# Lifecycle policy: hot 30d -> cool 60d -> archive 6y -> delete
cat > /tmp/lifecycle.json <<'EOF'
{
  "rules": [{
    "name": "hipaa-6y-tier-down",
    "enabled": true,
    "type": "Lifecycle",
    "definition": {
      "filters": { "blobTypes": ["blockBlob"] },
      "actions": {
        "baseBlob": {
          "tierToCool":     { "daysAfterModificationGreaterThan": 30 },
          "tierToArchive":  { "daysAfterModificationGreaterThan": 90 },
          "delete":         { "daysAfterModificationGreaterThan": 2190 }
        }
      }
    }
  }]
}
EOF
az storage account management-policy create \
  --account-name "$SA_NAME" \
  --resource-group "$RG" \
  --policy @/tmp/lifecycle.json

# Immutability (legal hold) — defer until pilot validated.
# When ready: az storage container immutability-policy create ...

# Log Analytics Workspace
LAW_NAME="law-${SHORT}-audit"
az monitor log-analytics workspace create \
  --resource-group "$RG" \
  --workspace-name "$LAW_NAME" \
  --location "$REGION" \
  --retention-time 90 \
  --tags client="$FULL" tier=hipaa service=audit cost-center=msp-audit

Phase 2: Customer-tenant Diagnostic Settings

Performed against Cascades tenant using Tenant Admin token:

CASCADES_TENANT="207fa277-e9d8-4eb7-ada1-1064d2221498"
TOKEN=$(bash .claude/skills/remediation-tool/scripts/get-token.sh "$CASCADES_TENANT" tenant-admin)

LAW_RESOURCE="/subscriptions/${SUB}/resourceGroups/${RG}/providers/Microsoft.OperationalInsights/workspaces/${LAW_NAME}"
SA_RESOURCE="/subscriptions/${SUB}/resourceGroups/${RG}/providers/Microsoft.Storage/storageAccounts/${SA_NAME}"

# Entra Diagnostic Settings (covers sign-in + audit + provisioning + non-interactive)
curl -X PUT \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "https://graph.microsoft.com/beta/auditLogs/directoryAudits" \
  -d @- <<EOF
{
  "name": "acg-audit-export",
  "logs": [
    {"category": "AuditLogs",                       "enabled": true},
    {"category": "SignInLogs",                      "enabled": true},
    {"category": "NonInteractiveUserSignInLogs",    "enabled": true},
    {"category": "ServicePrincipalSignInLogs",      "enabled": true},
    {"category": "ManagedIdentitySignInLogs",       "enabled": true},
    {"category": "ProvisioningLogs",                "enabled": true},
    {"category": "ADFSSignInLogs",                  "enabled": true},
    {"category": "RiskyUsers",                      "enabled": true},
    {"category": "UserRiskEvents",                  "enabled": true}
  ],
  "workspaceId": "${LAW_RESOURCE}",
  "storageAccountId": "${SA_RESOURCE}"
}
EOF

Note: Entra Diagnostic Settings actually go through Azure Resource Manager (not Graph), and the proper endpoint is:

PUT https://management.azure.com/providers/microsoft.aadiam/diagnosticSettings/{name}?api-version=2017-04-01-preview

Authenticate against ARM (https://management.azure.com), not Graph. The Tenant Admin SP needs Microsoft.AzureActiveDirectory/diagnosticSettings/write permission, granted via the Security Administrator directory role. Howard: validate the working endpoint during dry-run; the cURL above is the conceptual shape, not the exact call.

Phase 3: Verification (1h after setup)

# Query LAW for recent sign-ins
az monitor log-analytics query \
  --workspace "$LAW_NAME" \
  --resource-group "$RG" \
  --analytics-query "SigninLogs | take 5 | project TimeGenerated, UserPrincipalName, ResultType"

# Confirm Storage Account is receiving blobs
az storage blob list --container-name insights-logs-signinlogs \
  --account-name "$SA_NAME" --account-key "$SA_KEY" --num-results 5

If LAW returns rows and SA has blobs, the export is live.

Phase 4: UAL harvester (deferred — separate buildout)

UAL has no Diagnostic Settings export. Approach when we get to it:

  • Azure Function (Python or PowerShell), timer trigger every 4h
  • Per onboarded tenant: managed identity granted ActivityFeed.Read against Office 365 Management API
  • Polls /api/v1.0/{tenantId}/activity/feed/subscriptions/content?contentType=Audit.AzureActiveDirectory|Audit.Exchange|Audit.SharePoint|Audit.General|DLP.All
  • Writes raw JSON to <sa>/ual/{yyyy}/{MM}/{dd}/{tenantId}/{contentType}-{timestamp}.json
  • Deduplicates via contentId

Codify the design once we've run it manually for a few weeks against Cascades. Estimated build: 4-6 hours dev + test.

Operational

Quarterly verification (per tenant, ~10 min)

  1. Run a SigninLogs | summarize count() by bin(TimeGenerated, 1d) | order by TimeGenerated desc query in LAW. Expect daily volume.
  2. Spot-check Storage Account container blob counts and timestamps.
  3. Confirm lifecycle policy hasn't drifted: az storage account management-policy show -g $RG -n $SA_NAME.
  4. Cost: az consumption usage list --start-date $(date -d '30 days ago' +%Y-%m-%d) --end-date $(date +%Y-%m-%d) --query "[?contains(instanceId,'$SA_NAME')||contains(instanceId,'$LAW_NAME')]" — should be ~$1/mo per tenant.

Forensics retrieval

  • 090 days: KQL on LAW directly. Sub-second queries.
  • 90 days 6 years: rehydrate blob from archive tier.
    az storage blob set-tier --tier Hot --rehydrate-priority Standard \
      --account-name $SA_NAME --container-name <c> --name <blob>
    
    Standard rehydrate SLA: ~15 hours. High-priority: ~1 hour, costs ~10x more.

When to upgrade subscription split

  • Triggers: 3+ HIPAA tenants, an external compliance audit asking about subscription scope, or one tenant generating >10 GB/month
  • Path: provision new subscription acg-msp-compliance, move RGs via az resource move, update Diagnostic Settings destination ARM IDs

Onboarding integration (codify after pilot validated)

Once Cascades is running cleanly for 30 days, fold the per-tenant Phase 1 + Phase 2 into onboard-tenant.sh as a flag:

bash onboard-tenant.sh <tenant-id> --enable-audit-archive --client-shortname <short>

Implementation outline:

  • Read --enable-audit-archive flag
  • Provision RG + SA + LAW under ACG sub (idempotent: skip if exists)
  • Issue PUT for Diagnostic Settings against the customer tenant
  • Append "Audit archive: [OK]" row to the final status table

Until codified, Howard runs the runbook manually per tenant. Cascades is the only HIPAA-tier tenant currently — this is fine.

Open questions / future work

  • UAL harvester: designed but not built. Punt until pilot CA cutover is done.
  • Defender for Office 365 export: does it expose Diagnostic Settings? If not, may need OMA-style poll. Check during Cascades verification.
  • MDE alerts: ditto.
  • Sentinel: the natural upgrade path if alerting becomes important. Cost crosses ~$200/mo at first tenant — defer until justified by an actual operational need.
  • Break-glass sign-in alert: when break-glass admin lands, KQL alert rule on LAW: SigninLogs | where UserPrincipalName == "breakglass-csc@cascadestucson.com" → Action Group → email Mike + Howard. Lives in this same LAW.