--- name: feedback_exchange_role_recurring_gap description: Exchange email-cleanup tasks fail with 401/403 because the EXO app SP is missing the Exchange Admin directory role — fix via the backfill script, never promise "next onboarding will fix it" metadata: type: feedback --- Email-cleanup / mailbox-forensic tasks (Search-UnifiedAuditLog, Get-MessageTrace, Get/Remove-InboxRule, Set-Mailbox) kept failing per-tenant with EXO 401/403, and each session hand-waved "it'll be auto-added next onboarding." Mike (2026-06-08) called this out as recurring disappointment. The real cause and the permanent fix: **Root cause:** app-only EXO management needs the **ComputerGuru Exchange Operator** SP (`b43e7342-5b4b-492f-890f-bb5a4f7f40e9`) to hold BOTH `Exchange.ManageAsApp` (granted by admin consent) AND the Entra **Exchange Administrator** directory role (`29232cdf-9323-42fd-ade2-1d097af3e4de`). Admin consent grants the API permission but NEVER the directory role. `onboard-tenant.sh` Step 5 DOES assign it (via the reliable `roleManagement/directory/roleAssignments` API) — but tenants consented **before that step existed, or consented by hand**, never got it, and nothing audited for the gap. So the recurrence was old/manual stragglers, not an onboarding bug. **The fix (do this, don't promise):** - `bash .claude/skills/remediation-tool/scripts/assign-exchange-role.sh [--verify|--dry-run]` — assigns the role to the Exchange Operator SP. Idempotent. `--all` backfills every tenant in `references/tenants.md`; tenants where tenant-admin isn't consented are SKIPped. **Backfilled fleet-wide 2026-06-08** (~10 stragglers fixed). - **Standing audit:** run `assign-exchange-role.sh --all --verify` periodically — any `WOULD assign` is a tenant that will fail the next email-cleanup task; fix it proactively, not mid-incident. - **Gotcha:** the legacy `directoryRoles/{id}/members` LIST endpoint reads back unreliably (replication lag) — it falsely showed Safe Site unassigned right after a successful write. Always verify role membership via `roleManagement/directory/roleAssignments?$filter=principalId eq ''`, not the members list. - **Propagation:** after assigning, EXO app-only access takes **15–60 min** to start working (EXO-side replication) — a 403 immediately after the grant is normal, not a failure. **Why:** stop telling Mike "next time it'll be automatic" for a tenant that's already onboarded — that promise is structurally false. The durable answer is the backfill + the standing `--verify` audit. See [[reference_acg_msp_stack]] and the remediation-tool tenants reference.