EXO email-cleanup tasks (Search-UnifiedAuditLog, Get-MessageTrace, inbox rules) kept 401/403-ing per tenant because the Exchange Operator SP was missing the Exchange Admin directory role: admin consent grants Exchange.ManageAsApp but never the directory role. onboard-tenant.sh assigns it, but tenants consented before that step / by hand never got it, and nothing audited for it. Hence the recurring 'next onboarding will fix it' (false for already-onboarded tenants).

- NEW assign-exchange-role.sh: idempotent role assignment via the authoritative roleManagement/directory/roleAssignments API (the legacy directoryRoles/members list reads back unreliably). <domain|--all> + --verify/--dry-run.
- Backfilled the whole fleet (--all): 13 stragglers ASSIGNED, 12 already OK, 20 skipped (tenant-admin not consented), 0 errors. Safe Site included.
- Standing audit documented (assign-exchange-role.sh --all --verify) + memory so no future session repeats the empty promise.
- Adds wiki/clients/safesite.md (tenant + 4-source endpoint inventory + investigation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| name | description | metadata |
|---|---|---|
| feedback_exchange_role_recurring_gap | Exchange email-cleanup tasks fail with 401/403 because the EXO app SP is missing the Exchange Admin directory role; fix via the backfill script, never promise "next onboarding will fix it" | |

Email-cleanup / mailbox-forensic tasks (Search-UnifiedAuditLog, Get-MessageTrace, Get/Remove-InboxRule, Set-Mailbox) kept failing per-tenant with EXO 401/403, and each session hand-waved "it'll be auto-added next onboarding." Mike (2026-06-08) called this out as recurring disappointment. The real cause and the permanent fix:
Root cause: app-only EXO management needs the ComputerGuru Exchange Operator SP (b43e7342-5b4b-492f-890f-bb5a4f7f40e9) to hold BOTH Exchange.ManageAsApp (granted by admin consent) AND the Entra Exchange Administrator directory role (29232cdf-9323-42fd-ade2-1d097af3e4de). Admin consent grants the API permission but NEVER the directory role. onboard-tenant.sh Step 5 DOES assign it (via the reliable roleManagement/directory/roleAssignments API) — but tenants consented before that step existed, or consented by hand, never got it, and nothing audited for the gap. So the recurrence was old/manual stragglers, not an onboarding bug.
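The directory-role half of that requirement can be illustrated with a single Microsoft Graph call. This is a minimal sketch, not the contents of onboard-tenant.sh or assign-exchange-role.sh: the function names and the `GRAPH_TOKEN` / `SP_OBJECT_ID` environment variables are hypothetical, and it assumes `principalId` must be the service principal's object ID in the target tenant, which may not be the same value as the ID quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and environment variables are hypothetical.
set -euo pipefail

GRAPH="https://graph.microsoft.com/v1.0"
# Exchange Administrator role ID, as quoted in the note above.
EXCHANGE_ADMIN_ROLE_ID="29232cdf-9323-42fd-ade2-1d097af3e4de"

# Build the JSON body for a tenant-wide ("/") role assignment.
role_assignment_body() {
  local principal_id="$1"
  printf '{"@odata.type":"#microsoft.graph.unifiedRoleAssignment","principalId":"%s","roleDefinitionId":"%s","directoryScopeId":"/"}' \
    "$principal_id" "$EXCHANGE_ADMIN_ROLE_ID"
}

# POST the assignment to the roleAssignments endpoint named in the note.
assign_exchange_role() {
  local token="$1" principal_id="$2"
  curl -sS -X POST "$GRAPH/roleManagement/directory/roleAssignments" \
    -H "Authorization: Bearer $token" \
    -H "Content-Type: application/json" \
    -d "$(role_assignment_body "$principal_id")"
}

# Only talk to Graph when both (hypothetical) variables are provided.
if [[ -n "${GRAPH_TOKEN:-}" && -n "${SP_OBJECT_ID:-}" ]]; then
  assign_exchange_role "$GRAPH_TOKEN" "$SP_OBJECT_ID"
fi
```

A bare POST like this is not idempotent by itself; an idempotent wrapper would presumably check for an existing assignment first and skip the write when one is found.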
The fix (do this, don't promise):
- `bash .claude/skills/remediation-tool/scripts/assign-exchange-role.sh <domain|--all> [--verify|--dry-run]`: assigns the role to the Exchange Operator SP. Idempotent. `--all` backfills every tenant in `references/tenants.md`; tenants where tenant-admin isn't consented are SKIPped. Backfilled fleet-wide 2026-06-08 (~10 stragglers fixed).
- Standing audit: run `assign-exchange-role.sh --all --verify` periodically. Any `WOULD assign` is a tenant that will fail the next email-cleanup task; fix it proactively, not mid-incident.
- Gotcha: the legacy `directoryRoles/{id}/members` LIST endpoint reads back unreliably (replication lag). It falsely showed Safe Site unassigned right after a successful write. Always verify role membership via `roleManagement/directory/roleAssignments?$filter=principalId eq '<sp>'`, not the members list.
- Propagation: after assigning, EXO app-only access takes 15–60 min to start working (EXO-side replication). A 403 immediately after the grant is normal, not a failure.
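The verification and propagation points above can be sketched roughly as follows. This is an illustration under stated assumptions rather than what `--verify` actually does: the function names and the `GRAPH_TOKEN` / `SP_OBJECT_ID` variables are hypothetical, the `grep` match is a crude stand-in for a real JSON parser such as `jq`, and the 60-minute cutoff is simply the upper bound quoted in the propagation bullet.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and environment variables are hypothetical.
set -euo pipefail

GRAPH="https://graph.microsoft.com/v1.0"
EXCHANGE_ADMIN_ROLE_ID="29232cdf-9323-42fd-ade2-1d097af3e4de"

# Succeeds if the JSON on stdin contains an assignment for the Exchange
# Administrator role. grep keeps the sketch dependency-free.
has_exchange_role() {
  grep -qi "\"roleDefinitionId\"[[:space:]]*:[[:space:]]*\"$EXCHANGE_ADMIN_ROLE_ID\""
}

# Interpret an EXO 403 given seconds elapsed since the role was assigned.
classify_403() {
  local elapsed="$1"
  if (( elapsed < 3600 )); then
    echo "propagating"   # still inside the 15-60 min window
  else
    echo "investigate"   # past the window; likely a real problem
  fi
}

# Read assignments from roleAssignments filtered by principal,
# not from the legacy members list.
if [[ -n "${GRAPH_TOKEN:-}" && -n "${SP_OBJECT_ID:-}" ]]; then
  # Capture the response first so grep exiting early cannot break the pipe.
  resp="$(curl -sS -G "$GRAPH/roleManagement/directory/roleAssignments" \
    -H "Authorization: Bearer $GRAPH_TOKEN" \
    --data-urlencode "\$filter=principalId eq '$SP_OBJECT_ID'")"
  if has_exchange_role <<<"$resp"; then
    echo "OK: role assigned"
  else
    echo "MISSING: role not assigned"
  fi
fi
```

For example, `classify_403 600` prints `propagating`, so a 403 ten minutes after the grant would be treated as expected lag rather than a failed assignment.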
Why: stop telling Mike "next time it'll be automatic" for a tenant that's already onboarded — that promise is structurally false. The durable answer is the backfill + the standing --verify audit. See reference_acg_msp_stack and the remediation-tool tenants reference.