From 8152476ee41aebbea629a16a9d7472c849a48bcd Mon Sep 17 00:00:00 2001
From: Mike Swanson
Date: Thu, 2 Jul 2026 15:13:21 -0700
Subject: [PATCH] remediation-tool: document the 365 app suite + build consent-audit

Root-caused the recurring '365 suite isn't documented' pain: the apps are fine
(tiered by privilege) but per-tenant consent is NOT uniform and there was no way
to see a tenant's actual grant state. VWP had the Tenant Admin app but no
SharePoint app-only role -> silent 401s until this session.

- references/app-suite.md: authoritative, live-verified map of every app, App ID,
  and actually-granted permission per tier; the consent-drift problem + both fix
  methods (adminconsent URL, direct appRoleAssignment grant).
- scripts/consent-audit.sh: audits a tenant (or --all) vs the baseline, grades
  GREEN/AMBER/RED, prints the exact fix per gap. Extends the
  assign-exchange-role --verify pattern to Graph scopes + SharePoint role + EXO
  role. Verified: BirthBio GREEN, VWP/Cascades AMBER (caught real drift - both
  missing grants).
- SKILL.md: run consent-audit FIRST on any tenant task. Memory + errorlog correction.

Co-Authored-By: Claude Fable 5
---
 .claude/memory/MEMORY.md                           |   1 +
 .claude/memory/reference_365_app_suite.md          |  32 +++
 .claude/skills/remediation-tool/SKILL.md           |   8 +
 .../remediation-tool/references/app-suite.md       | 216 ++++++++++++++++++
 .../remediation-tool/scripts/consent-audit.sh      | 184 +++++++++++++++
 errorlog.md                                        |   6 +
 6 files changed, 447 insertions(+)
 create mode 100644 .claude/memory/reference_365_app_suite.md
 create mode 100644 .claude/skills/remediation-tool/references/app-suite.md
 create mode 100755 .claude/skills/remediation-tool/scripts/consent-audit.sh

diff --git a/.claude/memory/MEMORY.md b/.claude/memory/MEMORY.md
index 593a6f60..e2631253 100644
--- a/.claude/memory/MEMORY.md
+++ b/.claude/memory/MEMORY.md
@@ -204,4 +204,5 @@
 - [GuruScan verification IN TEST / paused](project_guruscan_in_test_paused.md) — multi-engine scanner verify on DESKTOP-MS42HNC paused 2026-06-22 (VM rebooted mid-Emsisoft run); HitmanPro done (36 removed), Emsisoft full-scan unverified; resume `guruscan-agent-test.sh DESKTOP-MS42HNC scan-one Emsisoft`; Defender RTP/Tamper still off on VM
 - [GuruRMM fleet dispatch-hang fix](project_gururmm_dispatch_hang_fix.md) — blocking send_to on a full bounded channel to one black-holed agent wedged ALL command dispatch; fixed with try_send (9dae20c, deployed); proper black-hole eviction still missing (was reverted in 80df458) — finish it if it recurs
 - [Windows won't-boot / offline DISM repair playbook](windows-offline-dism-repair-gotchas.md) — Automatic Repair loop = boot-critical fault (disk/registry/wedged update), NOT shell/appx store corruption (that's a symptom); `FaultyPackageInProgress` + 100s of Install/Uninstall-Pending packages = wedged CU -> RevertPendingActions or clean install. Offline DISM rejects `wim:` source (0x800f082e) -> MOUNT the wim, source `\Windows`. Ventoy breaks WIM mount (0xc1420134) -> use Rufus. 25H2(26200)=24H2(26100)+enablement, so match 26100 media. First hit: Four Paws AvImark #32447.
+- [365 app suite — authoritative map + consent-drift fix](reference_365_app_suite.md) — full map in `.claude/skills/remediation-tool/references/app-suite.md`; per-tenant consent is NOT uniform (VWP had the app but no SharePoint role). Run `consent-audit.sh ` to detect gaps; fix via adminconsent URL or direct appRoleAssignment grant.
 - [Remediation-tool has full M365 access (incl. SharePoint)](reference_remediation_tool_365_access.md) — the app suite covers Graph/EXO/Defender/SharePoint; don't declare "no access" on an accessDenied. SharePoint app-only needs a CERT (secret = "Unsupported app only token"); use get-token.sh `sharepoint`/`sharepoint-admin` tiers + CSOM admin API (Graph /admin/sharepoint/settings scope not held). Full map: skill references/app-permissions-and-sharepoint.md.
diff --git a/.claude/memory/reference_365_app_suite.md b/.claude/memory/reference_365_app_suite.md
new file mode 100644
index 00000000..464e0e7a
--- /dev/null
+++ b/.claude/memory/reference_365_app_suite.md
@@ -0,0 +1,32 @@
+---
+name: reference_365_app_suite
+description: Authoritative map of the ComputerGuru M365 app suite (apps, App IDs, live-verified permissions per tier) and — the recurring failure — per-tenant consent is NOT uniform; how to audit + fix partial consent.
+metadata:
+  type: reference
+---
+
+The ComputerGuru M365 app suite is fully documented in the remediation-tool skill:
+`.claude/skills/remediation-tool/references/app-suite.md` (authoritative; live-verified
+2026-07-02). Read it before concluding "the tool can't do X on tenant Y".
+
+**The recurring failure it fixes:** per-tenant consent is NOT uniform. A tenant can have an
+app's service principal but only a PARTIAL/OLD permission grant. Example: VWP
+(valleywideplastering.com, 5c53ae9f-…) had the Tenant Admin app but NO SharePoint
+`Sites.FullControl.All` — SharePoint calls 401'd with a valid-looking token whose `roles`
+claim was empty. The suite "having" a capability (baseline design) ≠ a given tenant having it
+(actual consent).
+
+**Always AUDIT before giving up:** decode each tier's token `roles` on the target tenant and
+compare to the baseline in app-suite.md. Empty roles on a correct `aud` = present-but-not-granted.
+
+**Fix partial consent — two methods:**
+- A: re-consent the whole manifest — `https://login.microsoftonline.com//adminconsent?client_id=` (reliably grants Graph; the SharePoint app-only role often does NOT attach from consent — verify + use B for the leftover).
+- B: grant the specific missing app role directly via `POST /servicePrincipals/{recipientSP}/appRoleAssignments` using a `tenant-admin` token (holds AppRoleAssignment.ReadWrite.All). This is how VWP's SharePoint role was granted 2026-07-02; propagates to a fresh token in seconds. Only to complete an intent the customer already consented to.
+- EXO role gap: `assign-exchange-role.sh ` (audit fleet: `--all --verify`).
+
+Apps: Security Investigator bfbc12a4 (Graph read + EXO read), Exchange Operator b43e7342
+(EXO all-access + `exchange-op-graph` Graph Mail.ReadWrite), User Manager 64fac46b (Graph
+user/group write), Tenant Admin 709e6eed (Graph high-priv + SharePoint Sites.FullControl.All
+via CERT), Defender dbf8ad1a (MDE), Intune 46986910, Mailbox 1873b1b0 (ACG-internal only).
+SharePoint app-only REQUIRES cert (not secret). See [[reference_remediation_tool_365_access]],
+[[feedback_exchange_role_recurring_gap]], [[feedback_exchange_op_all_access]].
diff --git a/.claude/skills/remediation-tool/SKILL.md b/.claude/skills/remediation-tool/SKILL.md
index 408c188c..fb4e3fb8 100644
--- a/.claude/skills/remediation-tool/SKILL.md
+++ b/.claude/skills/remediation-tool/SKILL.md
@@ -45,6 +45,14 @@ When triggered automatically (vs. via `/remediation-tool`), follow the same work
 - For Identity Protection checks: `IdentityRiskyUser.Read.All` is in the Security Investigator manifest AND the tenant has consented to that app. If 403, emit the per-app consent URL from `references/gotchas.md`.
 - For Defender checks: confirm tenant has Microsoft Defender for Endpoint (MDE) license before using `defender` tier — it returns AADSTS650052 otherwise.
 
+## Consent audit (run FIRST on any tenant task)
+
+Per-tenant consent is NOT uniform — a tenant can have an app but only a partial/old grant
+(the VWP "had the app but no SharePoint" failure). Before concluding "can't do X on tenant Y",
+run `bash scripts/consent-audit.sh ` (or `--all`): it decodes every app's actual
+grants, diffs vs the baseline, grades GREEN/AMBER/RED, and prints the exact fix per gap.
+Full app + permission map: `references/app-suite.md`.
+
 ## Conventions
 
 - **Target identifiers**: accept UPN, domain, or tenant GUID. Normalize to tenant GUID internally.
diff --git a/.claude/skills/remediation-tool/references/app-suite.md b/.claude/skills/remediation-tool/references/app-suite.md
new file mode 100644
index 00000000..0ca4ce82
--- /dev/null
+++ b/.claude/skills/remediation-tool/references/app-suite.md
@@ -0,0 +1,216 @@
+# ComputerGuru M365 App Suite — Authoritative Reference
+
+**This is the single source of truth for the ComputerGuru multi-tenant app suite: every app,
+its App ID, its ACTUAL granted permissions, the get-token.sh tiers, and — the part that keeps
+biting us — how per-tenant consent differs and how to audit + fix a tenant that only has
+PARTIAL consent.**
+
+Live-verified 2026-07-02 against Birth Biologic (fully-consented reference tenant). Anything
+here is re-checkable with the commands below; when a call fails, verify against reality before
+concluding "the tool can't do it."
+
+---
+
+## The apps
+
+All are **multi-tenant** app registrations owned in the ACG home tenant (azcomputerguru.com).
+A customer tenant gets a **service principal** for each app when it consents. The App ID is the
+same everywhere; the SP object id differs per tenant.
+
+| App (display name) | App ID | Auth | Vault file (in `msp-tools/`) |
+|---|---|---|---|
+| ComputerGuru - Security Investigator | `bfbc12a4-f0dd-4e12-b06d-997e7271e10c` | secret | `computerguru-security-investigator.sops.yaml` |
+| ComputerGuru - Exchange Operator | `b43e7342-5b4b-492f-890f-bb5a4f7f40e9` | secret | `computerguru-exchange-operator.sops.yaml` |
+| ComputerGuru - User Manager | `64fac46b-8b44-41ad-93ee-7da03927576c` | secret | `computerguru-user-manager.sops.yaml` |
+| ComputerGuru - Tenant Admin | `709e6eed-0711-4875-9c44-2d3518c47063` | secret (Graph/EXO) **+ CERT (SharePoint)** | `computerguru-tenant-admin.sops.yaml` |
+| ComputerGuru - Defender Add-on | `dbf8ad1a-54f4-4bb8-8a9e-ea5b9634635b` | secret | `computerguru-defender-addon.sops.yaml` |
+| ComputerGuru - Intune Manager | `46986910-aa47-4e5e-b596-f65c6b485abb` | secret | `computerguru-intune-manager.sops.yaml` |
+| ComputerGuru - Mailbox (ACG-INTERNAL ONLY) | `1873b1b0-3377-485c-a848-bae9b2f8f1f5` | CERT | `computerguru-mailbox.sops.yaml` |
+
+The **Mailbox** app is single-tenant (azcomputerguru.com only) — it is NOT part of the customer
+suite; it 401s on any other tenant.
+
+---
+
+## get-token.sh tiers (tier -> app -> resource)
+
+`bash scripts/get-token.sh ` -> a bearer token on stdout (cached 55 min in
+`/tmp/remediation-tool//.jwt`; delete that file to force a fresh mint after a
+consent change).
+
+| Tier | App | Audience / resource | Auth |
+|---|---|---|---|
+| `investigator` | Security Investigator | Graph (`https://graph.microsoft.com`) | secret |
+| `investigator-exo` | Security Investigator | Exchange Online (`https://outlook.office365.com`) | secret |
+| `exchange-op` | Exchange Operator | Exchange Online | secret |
+| `exchange-op-graph` | Exchange Operator | Graph (added 2026-07-01) | secret |
+| `user-manager` | User Manager | Graph | secret |
+| `tenant-admin` | Tenant Admin | Graph | secret |
+| `tenant-admin-onboard` | Tenant Admin | Graph (onboarding alias) | secret |
+| `sharepoint` | Tenant Admin | SharePoint content (`.sharepoint.com`) | **CERT** |
+| `sharepoint-admin` | Tenant Admin | SharePoint admin (`-admin.sharepoint.com`) | **CERT** |
+| `defender` | Defender Add-on | Defender ATP (`https://api.securitycenter.microsoft.com`) | secret |
+| `intune-manager` | Intune Manager | Graph | secret |
+| `mailbox` | Mailbox (internal) | Graph | CERT |
+
+SharePoint host is auto-resolved via Graph `/sites/root`; override with
+`SP_RESOURCE_ENV=-admin.sharepoint.com` (needed when the app lacks the Graph Sites scope
+to self-resolve — e.g. mid-onboarding).
+
+---
+
+## Live-verified permissions (application/app-only roles)
+
+Decoded from real tokens on Birth Biologic (full consent). These are the roles each app is
+BUILT to hold. **A given tenant may hold a SUBSET** — see the audit section.
+
+### Graph roles
+
+| App / tier | Graph application roles |
+|---|---|
+| `investigator` | Application.Read.All, AuditLog.Read.All, BitlockerKey.Read.All, Directory.Read.All, IdentityRiskEvent.ReadWrite.All, IdentityRiskyAgent.ReadWrite.All, IdentityRiskyServicePrincipal.ReadWrite.All, IdentityRiskyUser.Read.All, IdentityRiskyUser.ReadWrite.All, Mail.Read, MailboxSettings.Read, Organization.Read.All, Policy.Read.All, **Sites.Read.All**, User.Read.All, UserAuthenticationMethod.Read.All |
+| `exchange-op-graph` | **Mail.ReadWrite, MailboxSettings.ReadWrite**, Organization.Read.All, User.Read.All, User.RevokeSessions.All |
+| `user-manager` | Device.ReadWrite.All, **Directory.ReadWrite.All, Group.ReadWrite.All, User.ReadWrite.All**, Organization.Read.All, User.RevokeSessions.All, UserAuthenticationMethod.ReadWrite.All |
+| `tenant-admin` | **AppRoleAssignment.ReadWrite.All, Application.ReadWrite.All, RoleManagement.ReadWrite.Directory, Directory.ReadWrite.All, Policy.ReadWrite.ConditionalAccess, Sites.FullControl.All, Sites.ReadWrite.All**, User.ReadWrite.All, SecurityEvents.Read.All, Policy.Read.All, UserAuthenticationMethod.ReadWrite.All |
+
+### SharePoint Online roles (resource `00000003-0000-0ff1-ce00-000000000000`, CERT auth)
+
+| Tier | SharePoint application roles |
+|---|---|
+| `sharepoint` / `sharepoint-admin` | **Sites.FullControl.All** |
+
+### Exchange Online (app-only, EXO audience — role, not a Graph `roles` claim)
+
+EXO app-only access is via the **Exchange Administrator directory role** assigned to the SP PLUS
+`full_access_as_app` + `Exchange.ManageAsApp` app permissions. It does NOT show in a Graph
+`roles` decode — it's an EXO RBAC assignment.
+
+| Tier | EXO capability |
+|---|---|
+| `investigator-exo` | EXO **read** (Get-Mailbox, Get-InboxRule, Get-*) — needs Exchange Admin role on the Investigator SP |
+| `exchange-op` | EXO **write / all-access** (Set-Mailbox, New/Remove-InboxRule, Add-MailboxPermission, InvokeCommand, move mail) — Exchange Admin role + full_access_as_app + Exchange.ManageAsApp on the Exchange Operator SP. This is the all-access mail tier; do NOT claim "no tier can write mail." |
+
+---
+
+## THE THING THAT KEEPS BITING US: per-tenant consent is not uniform
+
+Every failure of the form "we can't do X on tenant Y" has been one of these, NOT a missing
+capability:
+
+1. **The app is consented but with an OLD/PARTIAL permission set.** A tenant onboarded before a
+   scope was added to the app manifest keeps the old grant until re-consented. Example: **VWP
+   (Valley Wide Plastering, `5c53ae9f-7071-4248-b834-8685b646450f`) had the Tenant Admin app but
+   NO SharePoint `Sites.FullControl.All`** — every SharePoint call 401'd with a valid-looking
+   token (`aud` = SharePoint, `roles` = empty). Fixed 2026-07-02 (see below).
+2. **The SharePoint secret-vs-cert gotcha.** SharePoint app-only REJECTS a client_secret token
+   ("Unsupported app only token"). `get-token.sh` forces cert for the `sharepoint*` tiers; if you
+   hand-roll, use the cert.
+3. **Not consented at all** (AADSTS7000229 / AADSTS700016) — the tenant never onboarded the app.
+   Use the admin-consent URL.
+4. **EXO role not assigned** — the app is consented but the SP lacks the Exchange Admin directory
+   role (EXO calls 403). Fix with `assign-exchange-role.sh`.
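
The four failure modes listed above map fairly mechanically from what is observed to which fix applies. A rough triage sketch in Python, under stated assumptions: the function name `triage`, its parameters, and the returned labels are hypothetical and are not part of the tool; only the error strings and the fixes come from the list above.

```python
def triage(error_text="", roles=None, http_status=None, resource="graph"):
    """Map an observed failure to the likely cause from the four-item list.

    error_text  : raw error body or message, if any
    roles       : decoded `roles` claim of the token (None if no token was minted)
    http_status : status code of the failing API call, if a call was made
    resource    : "graph", "sharepoint", or "exo" (hypothetical labels)
    """
    # Item 3: the tenant never onboarded the app at all.
    if "AADSTS7000229" in error_text or "AADSTS700016" in error_text:
        return "not consented: send the admin-consent URL"
    # Item 2: SharePoint app-only rejects tokens minted with a client secret.
    if "Unsupported app only token" in error_text:
        return "secret used for SharePoint: mint the token with the cert"
    # Item 4: EXO access is a directory role, so check it before the roles claim.
    if resource == "exo" and http_status == 403:
        return "EXO role missing: run assign-exchange-role.sh"
    # Item 1: token minted, but the roles claim is empty on this tenant.
    if roles is not None and len(roles) == 0:
        return "partial consent: re-consent or grant the missing app role"
    return "unclassified: audit the tenant before concluding anything"


if __name__ == "__main__":
    # The VWP case: SharePoint audience, empty roles, 401 on every call.
    print(triage(roles=[], http_status=401, resource="sharepoint"))
```

The ordering is a design choice of this sketch: explicit error strings are checked first because they are unambiguous, and the empty-`roles` check comes last because an EXO failure is not visible in a `roles` decode.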
+
+**Discipline: before telling the user "the tool can't do it", AUDIT the tenant.**
+
+### Audit a tenant's real consent state
+
+**Just run the audit command** — it decodes every app's actual grants, diffs against this
+baseline, grades GREEN/AMBER/RED, and prints the exact fix for each gap:
+
+```bash
+bash scripts/consent-audit.sh          # one tenant, full detail + fixes
+bash scripts/consent-audit.sh --all    # every tenant in references/tenants.md
+```
+
+Run it at the START of any tenant task (or on a schedule) so a partial-consent gap surfaces
+up front, not as a 401 mid-task. It checks: investigator / exchange-op / user-manager /
+tenant-admin Graph roles, the SharePoint app-only `Sites.FullControl.All` (cert), and the
+EXO Exchange Administrator role — the full set that has ever caused a "can't do X" failure.
+
+Manual equivalent (what the script does), if you want to eyeball one tier:
+
+```bash
+TEN=
+for tier in investigator exchange-op-graph user-manager tenant-admin; do
+  echo "== $tier =="
+  rm -f /tmp/remediation-tool/$TEN/$tier.jwt
+  T=$(bash scripts/get-token.sh $TEN $tier 2>/dev/null | tail -1)
+  python - "$T" <<'PY'
+import sys,base64,json
+c=json.loads(base64.urlsafe_b64decode((lambda p:p+'='*(-len(p)%4))(sys.argv[1].split('.')[1])))
+print(" roles:", ", ".join(sorted(c.get("roles",[]))) or "(NONE — not consented / partial)")
+PY
+done
+# SharePoint (cert): roles should include Sites.FullControl.All
+SP_RESOURCE_ENV=-admin.sharepoint.com bash scripts/get-token.sh $TEN sharepoint-admin \
+  | tail -1 | cut -d. -f2 | ...decode roles...
+```
+
+An **empty `roles`** on a token whose `aud` is correct = the app is present but that permission
+set was never granted on this tenant. That is the signal to fix consent, NOT to give up.
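
The decode-and-diff step described above can be sketched in Python. This is a minimal illustration, not the audit script itself: `decode_roles`, `missing_roles`, `fake_token`, and the sample token are hypothetical names and data invented for the example, and the baseline list is the set of investigator roles that `consent-audit.sh` treats as must-have. The token is parsed without signature verification, which is only appropriate for inspecting a token you minted yourself.

```python
import base64
import json


def decode_roles(token):
    """Return the `roles` claim of a JWT payload, or [] if absent or not a JWT."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        return []
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64url padding
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return []
    if not isinstance(claims, dict):
        return []
    return list(claims.get("roles", []))


def missing_roles(have, want):
    """Baseline roles that the token does not carry, in baseline order."""
    held = set(have)
    return [role for role in want if role not in held]


def fake_token(claims):
    """Build an unsigned, JWT-shaped string for the demo (hypothetical data only)."""
    def b64(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).decode().rstrip("=")
    return "{}.{}.sig".format(b64({"alg": "none"}), b64(claims))


# Must-have Graph roles for the investigator tier (a curated subset, not the full grant).
BASELINE_INVESTIGATOR = [
    "Directory.Read.All", "User.Read.All", "Sites.Read.All", "Mail.Read", "AuditLog.Read.All",
]

if __name__ == "__main__":
    partial = fake_token({"aud": "https://graph.microsoft.com",
                          "roles": ["Directory.Read.All", "User.Read.All"]})
    print("missing:", missing_roles(decode_roles(partial), BASELINE_INVESTIGATOR))
```

A non-empty result for a tier means partial consent on that tenant; an empty `roles` list means every baseline role is missing.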
+
+### Fix partial consent — two methods
+
+**Method A — re-consent the whole app manifest (preferred, grants everything the app requests).**
+A tenant Global Admin clicks:
+```
+https://login.microsoftonline.com//adminconsent?client_id=
+```
+Grants ALL of the app's requiredResourceAccess. NOTE (VWP lesson): this reliably grants the
+**Graph** permissions but the **SharePoint** app-only role sometimes does NOT attach from the
+consent flow — verify with the audit above and use Method B for the leftover.
+
+**Method B — grant a specific missing app role directly (no user interaction; needs the
+tenant-admin token, which holds AppRoleAssignment.ReadWrite.All).** This is how VWP's missing
+SharePoint role was fixed:
+```bash
+TEN=; TA=$(bash scripts/get-token.sh $TEN tenant-admin | tail -1); G=https://graph.microsoft.com/v1.0
+# the SP receiving the role (e.g. Tenant Admin app):
+RECIP=$(curl -s -G "$G/servicePrincipals" --data-urlencode "\$filter=appId eq ''" -H "Authorization: Bearer $TA" | jq -r '.value[0].id')
+# the resource SP that DEFINES the role (SharePoint = 00000003-0000-0ff1-ce00-...):
+RES=$(curl -s -G "$G/servicePrincipals" --data-urlencode "\$filter=appId eq '00000003-0000-0ff1-ce00-000000000000'" --data-urlencode "\$select=id,appRoles" -H "Authorization: Bearer $TA")
+RESID=$(echo "$RES" | jq -r '.value[0].id')
+ROLE=$(echo "$RES" | jq -r '.value[0].appRoles[]|select(.value=="Sites.FullControl.All").id')
+# grant it:
+curl -s -X POST "$G/servicePrincipals/$RECIP/appRoleAssignments" -H "Authorization: Bearer $TA" \
+  -H "Content-Type: application/json" \
+  -d "{\"principalId\":\"$RECIP\",\"resourceId\":\"$RESID\",\"appRoleId\":\"$ROLE\"}"
+```
+Only use Method B to complete an intent the customer already consented to (they clicked the
+consent link / authorized the access). It propagates to a fresh token within seconds.
+
+**Fix a missing EXO role:** `bash scripts/assign-exchange-role.sh ` (idempotent;
+`--all --verify` to audit the fleet). EXO propagation is 15-60 min.
+
+---
+
+## Per-tenant onboarding status (living matrix — update as verified)
+
+Legend: F = full consent verified · P = partial (note the gap) · ? = unverified.
+
+| Tenant | Guid | investigator | exchange-op | user-manager | tenant-admin (Graph) | SharePoint | Notes |
+|---|---|---|---|---|---|---|---|
+| Birth Biologic | 19a568e8-… | F | F | F | F | F | reference/baseline |
+| Valley Wide Plastering (VWP) | 5c53ae9f-… | ? | F (JIT-proven) | F | F (Graph Sites added 2026-07-02) | **F (fixed 2026-07-02 via Method B)** | was the "old app only" gap |
+| Cascades of Tucson | (resolve) | ? | ? | ? | ? | ? | flagged "old app only" — audit |
+| Dataforth | (resolve) | ? | ? | ? | ? | ? | flagged "old app only" — audit |
+
+Keep this matrix current — it is the answer to "is tenant X onboarded?" that we keep not having.
+
+---
+
+## Gotchas index
+
+- **SharePoint app-only = CERT, not secret** (Unsupported app only token). `get-token.sh`
+  forces cert for `sharepoint*`.
+- **Graph `GET /admin/sharepoint/settings` 403** — no app holds `SharePointTenantSettings.*`.
+  Read/write SP tenant settings via the CSOM/REST admin API (`sharepoint-admin`, Sites.FullControl.All).
+- **adminconsent grants Graph but may skip the SharePoint app role** — verify + Method-B the gap.
+- **JIT Privileged Auth Admin can't self-remove** — reset-password leaves standing PAA on the
+  Tenant Admin SP; a human GA removes it (or use PIM / a second principal).
+- **EXO role propagation 15-60 min**; verify membership via
+  `roleManagement/directory/roleAssignments` (not the laggy directoryRoles/members list).
+- **Mailbox app is ACG-internal only** — never expect it on a customer tenant.
+
+Adding a scope to an app manifest is a portal action (`patch-tenant-admin-manifest.sh` +
+adminconsent); the multi-tenant app registrations themselves are managed manually, not via the tool.
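
The EXO gotcha above says to verify the role through `roleManagement/directory/roleAssignments`. A minimal Python sketch of that check on an already-fetched response body follows. It assumes the response has the usual Graph collection shape (a `value` array whose items carry `roleDefinitionId`), that the query was filtered to the service principal in question, and that Exchange Administrator is identified by the template ID `29232cdf-9323-42fd-ade2-1d097af3e4de` that `consent-audit.sh` uses. The function name and the sample payloads are hypothetical.

```python
import json

# Exchange Administrator role template ID, as used by consent-audit.sh.
EXCHANGE_ADMIN_ROLE_TEMPLATE = "29232cdf-9323-42fd-ade2-1d097af3e4de"


def has_exchange_admin(response_body):
    """True if any role assignment in the response is Exchange Administrator.

    response_body is the raw JSON text returned for one principal's
    roleAssignments query; anything unparseable counts as "not assigned".
    """
    try:
        data = json.loads(response_body)
    except ValueError:
        return False
    if not isinstance(data, dict):
        return False
    return any(
        isinstance(item, dict)
        and item.get("roleDefinitionId") == EXCHANGE_ADMIN_ROLE_TEMPLATE
        for item in data.get("value", [])
    )


if __name__ == "__main__":
    # Hypothetical response bodies for illustration only.
    assigned = json.dumps({"value": [{"roleDefinitionId": EXCHANGE_ADMIN_ROLE_TEMPLATE}]})
    unassigned = json.dumps({"value": []})
    print(has_exchange_admin(assigned), has_exchange_admin(unassigned))
```

Treating an error body or an empty list as "not assigned" keeps the check conservative: during the 15-60 min propagation window it reports a gap rather than a false pass.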
diff --git a/.claude/skills/remediation-tool/scripts/consent-audit.sh b/.claude/skills/remediation-tool/scripts/consent-audit.sh
new file mode 100755
index 00000000..33cd5eb9
--- /dev/null
+++ b/.claude/skills/remediation-tool/scripts/consent-audit.sh
@@ -0,0 +1,184 @@
+#!/usr/bin/env bash
+# consent-audit.sh — audit a tenant's ACTUAL consent state for the ComputerGuru M365 app
+# suite against the documented baseline, report every gap with the exact fix command, and
+# grade GREEN / AMBER / RED. Extends the assign-exchange-role.sh --verify pattern to ALL
+# permissions (Graph scopes + SharePoint app-only role + EXO directory role), so a
+# partially-consented tenant (the VWP-had-the-app-but-no-SharePoint failure) is caught up
+# front instead of by a 401 mid-task.
+#
+# Usage:
+#   consent-audit.sh                     audit one tenant (full detail)
+#   consent-audit.sh --all [--verbose]   audit every tenant in references/tenants.md
+#   consent-audit.sh --matrix            one-line matrix row (for regenerating tenants)
+#
+# Exit: 0 GREEN, 1 AMBER (partial), 2 RED (an app not consented / token mint failed), 3 usage.
+#
+# Deps: get-token.sh (+ vault), jq, curl, a python. Read-only: mints tokens + reads Graph;
+# never writes. The FIXES it prints are for a human/operator to run.
+
+set -u
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SKILL_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
+GT="$SCRIPT_DIR/get-token.sh"
+GRAPH="https://graph.microsoft.com/v1.0"
+
+# --- app suite (appId + get-token tier + must-have roles) ------------------------------
+APP_INVESTIGATOR="bfbc12a4-f0dd-4e12-b06d-997e7271e10c"
+APP_EXCHANGE_OP="b43e7342-5b4b-492f-890f-bb5a4f7f40e9"
+APP_USER_MANAGER="64fac46b-8b44-41ad-93ee-7da03927576c"
+APP_TENANT_ADMIN="709e6eed-0711-4875-9c44-2d3518c47063"
+SP_RESOURCE_APPID="00000003-0000-0ff1-ce00-000000000000" # Office 365 SharePoint Online
+SP_FULLCONTROL_ROLE="Sites.FullControl.All"
+EXCHANGE_ADMIN_ROLE_TEMPLATE="29232cdf-9323-42fd-ade2-1d097af3e4de" # Exchange Administrator
+
+# must-have Graph roles per tier (curated: absence causes real task failures)
+BASE_investigator="Directory.Read.All User.Read.All Sites.Read.All Mail.Read AuditLog.Read.All"
+BASE_exchange_op_graph="Mail.ReadWrite MailboxSettings.ReadWrite"
+BASE_user_manager="User.ReadWrite.All Group.ReadWrite.All Directory.ReadWrite.All"
+BASE_tenant_admin="Application.ReadWrite.All AppRoleAssignment.ReadWrite.All Directory.ReadWrite.All RoleManagement.ReadWrite.Directory Sites.FullControl.All Sites.ReadWrite.All"
+
+VERBOSE=0; MATRIX=0; ALL=0; TARGET=""
+for a in "$@"; do
+  case "$a" in
+    --all) ALL=1 ;;
+    --verbose|-v) VERBOSE=1 ;;
+    --matrix) MATRIX=1 ;;
+    -h|--help) sed -n '2,22p' "$0"; exit 3 ;;
+    *) TARGET="$a" ;;
+  esac
+done
+
+jwt_roles() { # $1: token -> space-separated roles ("" if not a JWT)
+  python - "${1:-}" <<'PY'
+import sys,base64,json
+t=(sys.argv[1] if len(sys.argv)>1 else "").strip()
+if t.count('.')!=2 or not t.startswith('ey'):
+    print(""); sys.exit(0)
+try:
+    p=t.split('.')[1]; p+='='*(-len(p)%4)
+    print(" ".join(json.loads(base64.urlsafe_b64decode(p)).get("roles",[])))
+except Exception:
+    print("")
+PY
+}
+
+missing_of() { # $1=have (space list) $2=want (space list) -> missing items
+  local have=" $1 " out=""
+  for w in $2; do case "$have" in *" $w "*) ;; *) out="$out $w";; esac; done
+  echo "${out# }"
+}
+
+# --- resolve tenant guid + primary domain ----------------------------------------------
+resolve_tenant() { # $1 = domain or guid ; sets TID + DOMAIN + SPPREFIX
+  local in="$1"
+  if [[ "$in" =~ ^[0-9a-fA-F-]{36}$ ]]; then TID="$in"; else
+    TID="$(curl -s "https://login.microsoftonline.com/$in/v2.0/.well-known/openid-configuration" | jq -r '.token_endpoint//""' | cut -d/ -f4)"
+  fi
+  [ -z "${TID:-}" ] && { echo "[ERROR] could not resolve tenant '$in'"; return 2; }
+  # onmicrosoft prefix for SharePoint host, via tenant-admin /domains
+  local ta; ta="$($GT "$TID" tenant-admin 2>/dev/null | tail -1)"
+  if [ -n "$ta" ] && printf '%s' "$ta" | grep -q '^ey'; then
+    local init; init="$(curl -s -G "$GRAPH/domains" --data-urlencode "\$select=id,isInitial" -H "Authorization: Bearer $ta" | jq -r '.value[]?|select(.isInitial==true)|.id')"
+    DOMAIN="$(curl -s -G "$GRAPH/domains" --data-urlencode "\$select=id,isDefault" -H "Authorization: Bearer $ta" | jq -r '.value[]?|select(.isDefault==true)|.id')"
+    SPPREFIX="${init%%.onmicrosoft.com}"
+    [ -z "$DOMAIN" ] && DOMAIN="$init"
+  else DOMAIN="$in"; SPPREFIX=""; fi
+  return 0
+}
+
+# --- one capability check --------------------------------------------------------------
+# emits a status line; appends fixes to $FIXES; bumps $WORST (0/1/2)
+check_graph_tier() { # $1=label $2=tier $3=appid $4=want-roles
+  local label="$1" tier="$2" appid="$3" want="$4"
+  rm -f "/tmp/remediation-tool/$TID/$tier.jwt" 2>/dev/null
+  local tok; tok="$($GT "$TID" "$tier" 2>/dev/null | tail -1)"
+  local roles; roles="$(jwt_roles "$tok")"
+  if [ -z "$roles" ] && ! printf '%s' "$tok" | grep -q '^ey'; then
+    printf " [RED] %-16s app NOT consented (token mint failed)\n" "$label"
+    FIXES="$FIXES\n # $label: consent the app\n https://login.microsoftonline.com/$TID/adminconsent?client_id=$appid"
+    WORST=2; return
+  fi
+  local miss; miss="$(missing_of "$roles" "$want")"
+  if [ -z "$miss" ]; then
+    printf " [OK] %-16s all baseline roles present\n" "$label"
+    [ "$VERBOSE" = 1 ] && echo " have: $roles"
+  else
+    printf " [AMBER] %-16s PARTIAL — missing: %s\n" "$label" "$miss"
+    FIXES="$FIXES\n # $label: re-consent grants the full manifest\n https://login.microsoftonline.com/$TID/adminconsent?client_id=$appid"
+    [ "$WORST" -lt 1 ] && WORST=1
+  fi
+}
+
+check_sharepoint() {
+  local host="${SPPREFIX:+$SPPREFIX-admin.sharepoint.com}"
+  rm -f "/tmp/remediation-tool/$TID/sharepoint-admin.jwt" 2>/dev/null
+  local tok; tok="$(SP_RESOURCE_ENV="$host" $GT "$TID" sharepoint-admin 2>/dev/null | tail -1)"
+  local roles; roles="$(jwt_roles "$tok")"
+  case " $roles " in
+    *" $SP_FULLCONTROL_ROLE "*)
+      printf " [OK] %-16s Sites.FullControl.All present (cert)\n" "SharePoint" ;;
+    *)
+      printf " [AMBER] %-16s missing SharePoint app-only Sites.FullControl.All\n" "SharePoint"
+      FIXES="$FIXES\n # SharePoint: adminconsent often does NOT attach this — grant the app-only role directly (Method B, app-suite.md):\n # TA=\$($GT $TID tenant-admin|tail -1); recip=Tenant-Admin SP id; res=SharePoint SP id; role=Sites.FullControl.All id\n # POST /servicePrincipals/{recip}/appRoleAssignments {principalId,resourceId,appRoleId}\n https://login.microsoftonline.com/$TID/adminconsent?client_id=$APP_TENANT_ADMIN # try re-consent first"
+      [ "$WORST" -lt 1 ] && WORST=1 ;;
+  esac
+}
+
+check_exo_role() { # Exchange Admin directory role on the Exchange Operator SP
+  local ta; ta="$($GT "$TID" tenant-admin 2>/dev/null | tail -1)"
+  printf '%s' "$ta" | grep -q '^ey' || { printf " [SKIP] %-16s (need tenant-admin to check)\n" "EXO role"; return; }
+  local spid; spid="$(curl -s -G "$GRAPH/servicePrincipals" --data-urlencode "\$filter=appId eq '$APP_EXCHANGE_OP'" --data-urlencode "\$select=id" -H "Authorization: Bearer $ta" | jq -r '.value[0].id//""')"
+  [ -z "$spid" ] && { printf " [RED] %-16s Exchange Operator app NOT consented\n" "EXO role"
+    FIXES="$FIXES\n # Exchange Operator: consent the app\n https://login.microsoftonline.com/$TID/adminconsent?client_id=$APP_EXCHANGE_OP"; WORST=2; return; }
+  local has; has="$(curl -s -G "$GRAPH/roleManagement/directory/roleAssignments" --data-urlencode "\$filter=principalId eq '$spid'" -H "Authorization: Bearer $ta" | jq -r --arg r "$EXCHANGE_ADMIN_ROLE_TEMPLATE" '[.value[]?|select(.roleDefinitionId==$r)]|length')"
+  if [ "${has:-0}" -ge 1 ] 2>/dev/null; then
+    printf " [OK] %-16s Exchange Administrator role assigned\n" "EXO role"
+  else
+    printf " [AMBER] %-16s Exchange Operator SP missing Exchange Administrator role\n" "EXO role"
+    FIXES="$FIXES\n # EXO: assign the Exchange Administrator role\n bash $SCRIPT_DIR/assign-exchange-role.sh $DOMAIN"
+    [ "$WORST" -lt 1 ] && WORST=1
+  fi
+}
+
+audit_one() { # $1 = domain|guid ; returns worst grade (0/1/2)
+  TID=""; DOMAIN=""; SPPREFIX=""; FIXES=""; WORST=0
+  resolve_tenant "$1" || return 2
+  if [ "$MATRIX" = 1 ]; then
+    # compact single-line probe for matrix use
+    :
+  fi
+  echo "============================================================"
+  echo " Consent audit — ${DOMAIN:-$1} ($TID)"
+  echo "============================================================"
+  check_graph_tier "investigator" investigator "$APP_INVESTIGATOR" "$BASE_investigator"
+  check_graph_tier "exchange-op" exchange-op-graph "$APP_EXCHANGE_OP" "$BASE_exchange_op_graph"
+  check_graph_tier "user-manager" user-manager "$APP_USER_MANAGER" "$BASE_user_manager"
+  check_graph_tier "tenant-admin" tenant-admin "$APP_TENANT_ADMIN" "$BASE_tenant_admin"
+  check_sharepoint
+  check_exo_role
+  local grade="GREEN"; [ "$WORST" = 1 ] && grade="AMBER"; [ "$WORST" = 2 ] && grade="RED"
+  echo "------------------------------------------------------------"
+  echo " GRADE: $grade"
+  if [ "$WORST" -gt 0 ]; then echo " FIXES:"; printf '%b\n' "$FIXES"; fi
+  echo ""
+  return "$WORST"
+}
+
+# --- fleet audit -----------------------------------------------------------------------
+# --all iterates every tenant in references/tenants.md, printing a per-tenant block +
+# fixes, and exits with the WORST grade seen. A compact matrix mode is a follow-up.
+if [ "$ALL" = 1 ]; then
+  tfile="$SKILL_DIR/references/tenants.md"
+  [ -f "$tfile" ] || { echo "[ERROR] tenants.md not found"; exit 3; }
+  worstall=0
+  while IFS='|' read -r _ disp dom guid rest; do
+    guid="$(echo "${guid:-}" | tr -d ' ')"
+    [[ "$guid" =~ ^[0-9a-fA-F-]{36}$ ]] || continue
+    audit_one "$guid"; rc=$?
+    [ "$rc" -gt "$worstall" ] && worstall=$rc
+  done < <(grep -E '^\|' "$tfile" | grep -vE 'Display Name|^\|\s*-')
+  exit "$worstall"
+fi
+
+[ -z "$TARGET" ] && { echo "[ERROR] give a tenant (domain|guid) or --all"; sed -n '9,13p' "$0"; exit 3; }
+audit_one "$TARGET"; exit $?
diff --git a/errorlog.md b/errorlog.md
index 7577e943..ef721c2f 100644
--- a/errorlog.md
+++ b/errorlog.md
@@ -25,6 +25,12 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
 
 2026-07-02 | Howard-Home | unifi/site-manager-api | [friction] vault infrastructure/unifi-site-manager-api key returns 401 (stale/rotated); the WORKING cloud key is services/unifi-site-manager (X-API-KEY vs api.ui.com) [ctx: ref=uos-server wiki; use services/unifi-site-manager]
 
+2026-07-02 | GURU-5070 | remediation-tool/consent-drift | [correction] assumed VWP had SharePoint access because the suite 'has' it; VWP had the Tenant Admin app but only PARTIAL consent (Graph Sites but not the SharePoint app-only role) -> SP calls 401 with empty roles. Fix: audit per-tenant token roles, grant missing app role via appRoleAssignment (Method B). Per-tenant consent is NOT uniform. [ctx: ref=app-suite.md tenant=VWP]
+
+2026-07-02 | GURU-5070 | rmm/long-install-reaper | [friction] long download+install (131MB Falcon) exceeds the RMM command timeout on a slow-egress box -> command shows failed/'Command timeout'/'Access is denied' but the install COMPLETES in background (service came up Running). Verify service state after, don't trust the failed status for fire-and-forget installs. [ctx: ref=reference_gururmm_command_timeout_seconds host=ACG-DC16]
+
+2026-07-02 | GURU-5070 | ps-encoded/server2016 | [friction] ps-encoded.sh rmm (shell->cmd.exe->powershell -EncodedCommand) returns 'Access is denied' with no stdout on Windows Server 2016 (DC16); plain command_type=powershell works. Fall back to direct powershell dispatch on Server 2016. [ctx: ref=ps-encoded.sh host=ACG-DC16 os=server2016]
+
 2026-07-02 | GURU-BEAST-ROG | self-check/registry-trim | [friction] trimmed skill registry locally while GURU-5070 shipped the same trim upstream; auto-sync merge raced my uncommitted edits (transient UU state, stale 15777 reading mid-merge); fix: check coord / claim a lock before fleet-wide harness edits [ctx: ref=coord-locks]
 
 2026-07-02 | Howard-Home | rmm/user-manager | [correction] reset Shelby.Trozzi domain password with raw Set-ADAccountPassword via /rmm; memory reference_gururmm_user_manager says use the built-in GuruRMM User Manager (reset_password action, is_dc) instead. [ctx: ref=reference_gururmm_user_manager]