remediation-tool: document the 365 app suite + build consent-audit

Root-caused the recurring '365 suite isn't documented' pain: the apps are fine (tiered by
privilege) but per-tenant consent is NOT uniform and there was no way to see a tenant's
actual grant state. VWP had the Tenant Admin app but no SharePoint app-only role -> silent
401s until this session.

- references/app-suite.md: authoritative, live-verified map of every app, App ID, and
  actually-granted permission per tier; the consent-drift problem + both fix methods
  (adminconsent URL, direct appRoleAssignment grant).
- scripts/consent-audit.sh: audits a tenant (or --all) vs the baseline, grades
  GREEN/AMBER/RED, prints the exact fix per gap. Extends the assign-exchange-role --verify
  pattern to Graph scopes + SharePoint role + EXO role. Verified: BirthBio GREEN, VWP/Cascades
  AMBER (caught real drift - both missing grants).
- SKILL.md: run consent-audit FIRST on any tenant task. Memory + errorlog correction.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 15:13:21 -07:00
parent 42da3cfcca
commit 8152476ee4
6 changed files with 447 additions and 0 deletions


@@ -204,4 +204,5 @@
- [GuruScan verification IN TEST / paused](project_guruscan_in_test_paused.md) — multi-engine scanner verify on DESKTOP-MS42HNC paused 2026-06-22 (VM rebooted mid-Emsisoft run); HitmanPro done (36 removed), Emsisoft full-scan unverified; resume `guruscan-agent-test.sh DESKTOP-MS42HNC scan-one Emsisoft`; Defender RTP/Tamper still off on VM
- [GuruRMM fleet dispatch-hang fix](project_gururmm_dispatch_hang_fix.md) — blocking send_to on a full bounded channel to one black-holed agent wedged ALL command dispatch; fixed with try_send (9dae20c, deployed); proper black-hole eviction still missing (was reverted in 80df458) — finish it if it recurs
- [Windows won't-boot / offline DISM repair playbook](windows-offline-dism-repair-gotchas.md) — Automatic Repair loop = boot-critical fault (disk/registry/wedged update), NOT shell/appx store corruption (that's a symptom); `FaultyPackageInProgress` + 100s of Install/Uninstall-Pending packages = wedged CU -> RevertPendingActions or clean install. Offline DISM rejects `wim:` source (0x800f082e) -> MOUNT the wim, source `\Windows`. Ventoy breaks WIM mount (0xc1420134) -> use Rufus. 25H2(26200)=24H2(26100)+enablement, so match 26100 media. First hit: Four Paws AvImark #32447.
- [365 app suite — authoritative map + consent-drift fix](reference_365_app_suite.md) — full map in `.claude/skills/remediation-tool/references/app-suite.md`; per-tenant consent is NOT uniform (VWP had the app but no SharePoint role). Run `consent-audit.sh <tenant|--all>` to detect gaps; fix via adminconsent URL or direct appRoleAssignment grant.
- [Remediation-tool has full M365 access (incl. SharePoint)](reference_remediation_tool_365_access.md) — the app suite covers Graph/EXO/Defender/SharePoint; don't declare "no access" on an accessDenied. SharePoint app-only needs a CERT (secret = "Unsupported app only token"); use get-token.sh `sharepoint`/`sharepoint-admin` tiers + CSOM admin API (Graph /admin/sharepoint/settings scope not held). Full map: skill references/app-permissions-and-sharepoint.md.


@@ -0,0 +1,32 @@
---
name: reference_365_app_suite
description: Authoritative map of the ComputerGuru M365 app suite (apps, App IDs, live-verified permissions per tier) and — the recurring failure — per-tenant consent is NOT uniform; how to audit + fix partial consent.
metadata:
type: reference
---
The ComputerGuru M365 app suite is fully documented in the remediation-tool skill:
`.claude/skills/remediation-tool/references/app-suite.md` (authoritative; live-verified
2026-07-02). Read it before concluding "the tool can't do X on tenant Y".
**The recurring failure it fixes:** per-tenant consent is NOT uniform. A tenant can have an
app's service principal but only a PARTIAL/OLD permission grant. Example: VWP
(valleywideplastering.com, 5c53ae9f-…) had the Tenant Admin app but NO SharePoint
`Sites.FullControl.All` — SharePoint calls 401'd with a valid-looking token whose `roles`
claim was empty. The suite "having" a capability (baseline design) ≠ a given tenant having it
(actual consent).
**Always AUDIT before giving up:** decode each tier's token `roles` on the target tenant and
compare to the baseline in app-suite.md. Empty roles on a correct `aud` = present-but-not-granted.
**Fix partial consent — two methods:**
- A: re-consent the whole manifest — `https://login.microsoftonline.com/<tenant>/adminconsent?client_id=<app-id>` (reliably grants Graph; the SharePoint app-only role often does NOT attach from consent — verify + use B for the leftover).
- B: grant the specific missing app role directly via `POST /servicePrincipals/{recipientSP}/appRoleAssignments` using a `tenant-admin` token (holds AppRoleAssignment.ReadWrite.All). This is how VWP's SharePoint role was granted 2026-07-02; propagates to a fresh token in seconds. Only to complete an intent the customer already consented to.
- EXO role gap: `assign-exchange-role.sh <domain>` (audit fleet: `--all --verify`).
Apps: Security Investigator bfbc12a4 (Graph read + EXO read), Exchange Operator b43e7342
(EXO all-access + `exchange-op-graph` Graph Mail.ReadWrite), User Manager 64fac46b (Graph
user/group write), Tenant Admin 709e6eed (Graph high-priv + SharePoint Sites.FullControl.All
via CERT), Defender dbf8ad1a (MDE), Intune 46986910, Mailbox 1873b1b0 (ACG-internal only).
SharePoint app-only REQUIRES cert (not secret). See [[reference_remediation_tool_365_access]],
[[feedback_exchange_role_recurring_gap]], [[feedback_exchange_op_all_access]].


@@ -45,6 +45,14 @@ When triggered automatically (vs. via `/remediation-tool`), follow the same work
- For Identity Protection checks: `IdentityRiskyUser.Read.All` is in the Security Investigator manifest AND the tenant has consented to that app. If 403, emit the per-app consent URL from `references/gotchas.md`.
- For Defender checks: confirm tenant has Microsoft Defender for Endpoint (MDE) license before using `defender` tier — it returns AADSTS650052 otherwise.
## Consent audit (run FIRST on any tenant task)
Per-tenant consent is NOT uniform — a tenant can have an app but only a partial/old grant
(the VWP "had the app but no SharePoint" failure). Before concluding "can't do X on tenant Y",
run `bash scripts/consent-audit.sh <domain|guid>` (or `--all`): it decodes every app's actual
grants, diffs vs the baseline, grades GREEN/AMBER/RED, and prints the exact fix per gap.
Full app + permission map: `references/app-suite.md`.
## Conventions
- **Target identifiers**: accept UPN, domain, or tenant GUID. Normalize to tenant GUID internally.


@@ -0,0 +1,216 @@
# ComputerGuru M365 App Suite — Authoritative Reference
**This is the single source of truth for the ComputerGuru multi-tenant app suite: every app,
its App ID, its ACTUAL granted permissions, the get-token.sh tiers, and — the part that keeps
biting us — how per-tenant consent differs and how to audit + fix a tenant that only has
PARTIAL consent.**
Live-verified 2026-07-02 against Birth Biologic (fully-consented reference tenant). Anything
here is re-checkable with the commands below; when a call fails, verify against reality before
concluding "the tool can't do it."
---
## The apps
All are **multi-tenant** app registrations owned in the ACG home tenant (azcomputerguru.com).
A customer tenant gets a **service principal** for each app when it consents. The App ID is the
same everywhere; the SP object id differs per tenant.
| App (display name) | App ID | Auth | Vault file (in `msp-tools/`) |
|---|---|---|---|
| ComputerGuru - Security Investigator | `bfbc12a4-f0dd-4e12-b06d-997e7271e10c` | secret | `computerguru-security-investigator.sops.yaml` |
| ComputerGuru - Exchange Operator | `b43e7342-5b4b-492f-890f-bb5a4f7f40e9` | secret | `computerguru-exchange-operator.sops.yaml` |
| ComputerGuru - User Manager | `64fac46b-8b44-41ad-93ee-7da03927576c` | secret | `computerguru-user-manager.sops.yaml` |
| ComputerGuru - Tenant Admin | `709e6eed-0711-4875-9c44-2d3518c47063` | secret (Graph/EXO) **+ CERT (SharePoint)** | `computerguru-tenant-admin.sops.yaml` |
| ComputerGuru - Defender Add-on | `dbf8ad1a-54f4-4bb8-8a9e-ea5b9634635b` | secret | `computerguru-defender-addon.sops.yaml` |
| ComputerGuru - Intune Manager | `46986910-aa47-4e5e-b596-f65c6b485abb` | secret | `computerguru-intune-manager.sops.yaml` |
| ComputerGuru - Mailbox (ACG-INTERNAL ONLY) | `1873b1b0-3377-485c-a848-bae9b2f8f1f5` | CERT | `computerguru-mailbox.sops.yaml` |
The **Mailbox** app is single-tenant (azcomputerguru.com only) — it is NOT part of the customer
suite; it 401s on any other tenant.
---
## get-token.sh tiers (tier -> app -> resource)
`bash scripts/get-token.sh <tenant> <tier>` -> a bearer token on stdout (cached 55 min in
`/tmp/remediation-tool/<tenant>/<tier>.jwt`; delete that file to force a fresh mint after a
consent change).
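A minimal sketch of forcing a fresh mint after a consent change, assuming only the cache layout described above. `token_cache_path` and `drop_cached_token` are hypothetical helper names, not part of `get-token.sh`, and `CACHE_ROOT` is an illustrative override so the sketch can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only the cache path layout comes from this document.
CACHE_ROOT="${CACHE_ROOT:-/tmp/remediation-tool}"

# Print the cache file for a tenant + tier.
token_cache_path() {  # $1 = tenant, $2 = tier
  printf '%s/%s/%s.jwt\n' "$CACHE_ROOT" "$1" "$2"
}

# Remove cached tokens so the next get-token.sh call mints a fresh one.
drop_cached_token() {  # $1 = tenant, $2.. = tiers
  local tenant="$1" tier; shift
  for tier in "$@"; do
    rm -f -- "$(token_cache_path "$tenant" "$tier")"
  done
}
```

For example, `drop_cached_token "$TEN" tenant-admin sharepoint-admin` clears both tiers before re-running an audit.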
| Tier | App | Audience / resource | Auth |
|---|---|---|---|
| `investigator` | Security Investigator | Graph (`https://graph.microsoft.com`) | secret |
| `investigator-exo` | Security Investigator | Exchange Online (`https://outlook.office365.com`) | secret |
| `exchange-op` | Exchange Operator | Exchange Online | secret |
| `exchange-op-graph` | Exchange Operator | Graph (added 2026-07-01) | secret |
| `user-manager` | User Manager | Graph | secret |
| `tenant-admin` | Tenant Admin | Graph | secret |
| `tenant-admin-onboard` | Tenant Admin | Graph (onboarding alias) | secret |
| `sharepoint` | Tenant Admin | SharePoint content (`<name>.sharepoint.com`) | **CERT** |
| `sharepoint-admin` | Tenant Admin | SharePoint admin (`<name>-admin.sharepoint.com`) | **CERT** |
| `defender` | Defender Add-on | Defender ATP (`https://api.securitycenter.microsoft.com`) | secret |
| `intune-manager` | Intune Manager | Graph | secret |
| `mailbox` | Mailbox (internal) | Graph | CERT |
SharePoint host is auto-resolved via Graph `/sites/root`; override with
`SP_RESOURCE_ENV=<name>-admin.sharepoint.com` (needed when the app lacks the Graph Sites scope
to self-resolve — e.g. mid-onboarding).
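When the host cannot be self-resolved, it can be derived from the tenant's initial `<name>.onmicrosoft.com` domain. The sketch below assumes the SharePoint prefix equals the onmicrosoft prefix, which is the usual case but may not hold for every tenant; `sp_host` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a SharePoint host from the initial onmicrosoft domain.
# Assumption: the SharePoint prefix matches the onmicrosoft prefix.
sp_host() {  # $1 = initial domain, $2 = "content" (default) | "admin"
  local prefix="${1%.onmicrosoft.com}"
  # nothing stripped means the input was not an onmicrosoft domain
  [ "$prefix" = "$1" ] && return 1
  case "${2:-content}" in
    admin)   printf '%s-admin.sharepoint.com\n' "$prefix" ;;
    content) printf '%s.sharepoint.com\n' "$prefix" ;;
    *)       return 1 ;;
  esac
}
```

The result can be passed straight through, e.g. `SP_RESOURCE_ENV="$(sp_host contoso.onmicrosoft.com admin)" bash scripts/get-token.sh "$TEN" sharepoint-admin` (`contoso` is a placeholder).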
---
## Live-verified permissions (application/app-only roles)
Decoded from real tokens on Birth Biologic (full consent). These are the roles each app is
BUILT to hold. **A given tenant may hold a SUBSET** — see the audit section.
### Graph roles
| App / tier | Graph application roles |
|---|---|
| `investigator` | Application.Read.All, AuditLog.Read.All, BitlockerKey.Read.All, Directory.Read.All, IdentityRiskEvent.ReadWrite.All, IdentityRiskyAgent.ReadWrite.All, IdentityRiskyServicePrincipal.ReadWrite.All, IdentityRiskyUser.Read.All, IdentityRiskyUser.ReadWrite.All, Mail.Read, MailboxSettings.Read, Organization.Read.All, Policy.Read.All, **Sites.Read.All**, User.Read.All, UserAuthenticationMethod.Read.All |
| `exchange-op-graph` | **Mail.ReadWrite, MailboxSettings.ReadWrite**, Organization.Read.All, User.Read.All, User.RevokeSessions.All |
| `user-manager` | Device.ReadWrite.All, **Directory.ReadWrite.All, Group.ReadWrite.All, User.ReadWrite.All**, Organization.Read.All, User.RevokeSessions.All, UserAuthenticationMethod.ReadWrite.All |
| `tenant-admin` | **AppRoleAssignment.ReadWrite.All, Application.ReadWrite.All, RoleManagement.ReadWrite.Directory, Directory.ReadWrite.All, Policy.ReadWrite.ConditionalAccess, Sites.FullControl.All, Sites.ReadWrite.All**, User.ReadWrite.All, SecurityEvents.Read.All, Policy.Read.All, UserAuthenticationMethod.ReadWrite.All |
### SharePoint Online roles (resource `00000003-0000-0ff1-ce00-000000000000`, CERT auth)
| Tier | SharePoint application roles |
|---|---|
| `sharepoint` / `sharepoint-admin` | **Sites.FullControl.All** |
### Exchange Online (app-only, EXO audience — role, not a Graph `roles` claim)
EXO app-only access is via the **Exchange Administrator directory role** assigned to the SP PLUS
`full_access_as_app` + `Exchange.ManageAsApp` app permissions. It does NOT show in a Graph
`roles` decode — it's an EXO RBAC assignment.
| Tier | EXO capability |
|---|---|
| `investigator-exo` | EXO **read** (Get-Mailbox, Get-InboxRule, Get-*) — needs Exchange Admin role on the Investigator SP |
| `exchange-op` | EXO **write / all-access** (Set-Mailbox, New/Remove-InboxRule, Add-MailboxPermission, InvokeCommand, move mail) — Exchange Admin role + full_access_as_app + Exchange.ManageAsApp on the Exchange Operator SP. This is the all-access mail tier; do NOT claim "no tier can write mail." |
---
## THE THING THAT KEEPS BITING US: per-tenant consent is not uniform
Every failure of the form "we can't do X on tenant Y" has been one of these, NOT a missing
capability:
1. **The app is consented but with an OLD/PARTIAL permission set.** A tenant onboarded before a
scope was added to the app manifest keeps the old grant until re-consented. Example: **VWP
(Valley Wide Plastering, `5c53ae9f-7071-4248-b834-8685b646450f`) had the Tenant Admin app but
NO SharePoint `Sites.FullControl.All`** — every SharePoint call 401'd with a valid-looking
token (`aud` = SharePoint, `roles` = empty). Fixed 2026-07-02 (see below).
2. **The SharePoint secret-vs-cert gotcha.** SharePoint app-only REJECTS a client_secret token
("Unsupported app only token"). `get-token.sh` forces cert for the `sharepoint*` tiers; if you
hand-roll, use the cert.
3. **Not consented at all** (AADSTS7000229 / AADSTS700016) — the tenant never onboarded the app.
Use the admin-consent URL.
4. **EXO role not assigned** — the app is consented but the SP lacks the Exchange Admin directory
role (EXO calls 403). Fix with `assign-exchange-role.sh`.
**Discipline: before telling the user "the tool can't do it", AUDIT the tenant.**
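The four failure modes above can be turned into a rough first-pass triage. This is a heuristic sketch only: `diagnose_failure` is a hypothetical helper, the string matching is an assumption about how the error text reaches you, and the audit script remains the authoritative check.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: map an observed symptom to one of the failure modes above.
diagnose_failure() {  # $1 = error text or status seen, $2 = decoded roles ("" if none)
  case "$1" in
    *AADSTS7000229*|*AADSTS700016*)
      echo "not-consented: send the admin-consent URL" ;;
    *"Unsupported app only token"*)
      echo "secret-used-for-sharepoint: mint with the cert (sharepoint* tiers)" ;;
    *401*)
      if [ -z "${2:-}" ]; then
        echo "partial-consent: roles empty, audit and re-grant"
      else
        echo "unknown: token has roles, inspect the call"
      fi ;;
    *403*)
      echo "possible-exo-role-gap: run assign-exchange-role.sh if this was an EXO call" ;;
    *)
      echo "unknown: run consent-audit.sh" ;;
  esac
}
```

Example: `diagnose_failure "HTTP 401" ""` points at partial consent, matching the VWP case.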
### Audit a tenant's real consent state
**Just run the audit command** — it decodes every app's actual grants, diffs against this
baseline, grades GREEN/AMBER/RED, and prints the exact fix for each gap:
```bash
bash scripts/consent-audit.sh <domain|tenant-guid> # one tenant, full detail + fixes
bash scripts/consent-audit.sh --all # every tenant in references/tenants.md
```
Run it at the START of any tenant task (or on a schedule) so a partial-consent gap surfaces
up front, not as a 401 mid-task. It checks: investigator / exchange-op / user-manager /
tenant-admin Graph roles, the SharePoint app-only `Sites.FullControl.All` (cert), and the
EXO Exchange Administrator role — the full set that has ever caused a "can't do X" failure.
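Because the script exits 0 for GREEN, 1 for AMBER, 2 for RED, and 3 for usage errors (per its header), a task can gate on the result. A minimal sketch, with `grade_from_rc` and `gate_on_audit` as hypothetical wrapper names and the script path assumed relative to the skill directory:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: translate consent-audit.sh's exit code into a go/no-go.
grade_from_rc() {  # $1 = exit code from consent-audit.sh
  case "$1" in
    0) echo "GREEN" ;;
    1) echo "AMBER" ;;
    2) echo "RED" ;;
    *) echo "USAGE" ;;
  esac
}

# Run the audit and succeed only when the tenant is fully consented.
gate_on_audit() {  # $1 = domain or tenant GUID
  local rc=0 grade
  bash scripts/consent-audit.sh "$1" || rc=$?
  grade="$(grade_from_rc "$rc")"
  echo "audit grade: $grade"
  [ "$grade" = "GREEN" ]
}
```

Usage: `gate_on_audit valleywideplastering.com || echo "fix consent before continuing"`.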
Manual equivalent (what the script does), if you want to eyeball one tier:
```bash
TEN=<tenant-guid>
for tier in investigator exchange-op-graph user-manager tenant-admin; do
echo "== $tier =="
rm -f /tmp/remediation-tool/$TEN/$tier.jwt
T=$(bash scripts/get-token.sh $TEN $tier 2>/dev/null | tail -1)
python - "$T" <<'PY'
import sys,base64,json
c=json.loads(base64.urlsafe_b64decode((lambda p:p+'='*(-len(p)%4))(sys.argv[1].split('.')[1])))
print(" roles:", ", ".join(sorted(c.get("roles",[]))) or "(NONE — not consented / partial)")
PY
done
# SharePoint (cert): roles should include Sites.FullControl.All
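SPT=$(SP_RESOURCE_ENV=<name>-admin.sharepoint.com bash scripts/get-token.sh $TEN sharepoint-admin 2>/dev/null | tail -1)
python - "$SPT" <<'PY'
import sys,base64,json
c=json.loads(base64.urlsafe_b64decode((lambda p:p+'='*(-len(p)%4))(sys.argv[1].split('.')[1])))
print(" roles:", ", ".join(sorted(c.get("roles",[]))) or "(NONE - missing SharePoint app-only role)")
PY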
```
An **empty `roles`** on a token whose `aud` is correct = the app is present but that permission
set was never granted on this tenant. That is the signal to fix consent, NOT to give up.
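The "correct `aud`, empty `roles`" signal can be checked mechanically. The sketch below decodes a token payload with `base64` and `jq` and classifies it. `jwt_payload` and `classify_token` are hypothetical names, and matching `aud` by substring is an assumption made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: classify a token by its aud and roles claims.

# Decode the payload (second segment) of a JWT to JSON on stdout.
jwt_payload() {  # $1 = token
  local seg="${1#*.}"; seg="${seg%%.*}"
  seg="$(printf '%s' "$seg" | tr '_-' '/+')"
  # restore the padding that base64url encoding strips
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="$seg="; done
  printf '%s' "$seg" | base64 -d 2>/dev/null
}

classify_token() {  # $1 = token, $2 = substring expected in aud
  case "${1:-}" in
    ey*.*.*) ;;
    *) echo "no-token"; return ;;
  esac
  local payload aud n
  payload="$(jwt_payload "$1")"
  aud="$(printf '%s' "$payload" | jq -r '.aud // ""')"
  n="$(printf '%s' "$payload" | jq -r '(.roles // []) | length')"
  case "$aud" in
    *"$2"*)
      if [ "${n:-0}" -gt 0 ]; then echo "granted"; else echo "present-but-not-granted"; fi ;;
    *) echo "wrong-audience" ;;
  esac
}
```

A result of `present-but-not-granted` is the cue to fix consent rather than conclude the capability is missing.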
### Fix partial consent — two methods
**Method A — re-consent the whole app manifest (preferred, grants everything the app requests).**
A tenant Global Admin clicks:
```
https://login.microsoftonline.com/<tenant>/adminconsent?client_id=<app-id>
```
Grants ALL of the app's requiredResourceAccess. NOTE (VWP lesson): this reliably grants the
**Graph** permissions but the **SharePoint** app-only role sometimes does NOT attach from the
consent flow — verify with the audit above and use Method B for the leftover.
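A small sketch for producing the Method A link without hand-editing placeholders; `consent_url` is a hypothetical helper name, and the URL shape is the one shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the admin-consent URL for one app on one tenant.
consent_url() {  # $1 = tenant (GUID or domain), $2 = app (client) ID
  [ -n "${1:-}" ] && [ -n "${2:-}" ] || return 1
  printf 'https://login.microsoftonline.com/%s/adminconsent?client_id=%s\n' "$1" "$2"
}
```

For the Tenant Admin app: `consent_url "$TEN" 709e6eed-0711-4875-9c44-2d3518c47063`.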
**Method B — grant a specific missing app role directly (no user interaction; needs the
tenant-admin token, which holds AppRoleAssignment.ReadWrite.All).** This is how VWP's missing
SharePoint role was fixed:
```bash
TEN=<tenant>; TA=$(bash scripts/get-token.sh $TEN tenant-admin | tail -1); G=https://graph.microsoft.com/v1.0
# the SP receiving the role (e.g. Tenant Admin app):
RECIP=$(curl -s -G "$G/servicePrincipals" --data-urlencode "\$filter=appId eq '<app-id>'" -H "Authorization: Bearer $TA" | jq -r '.value[0].id')
# the resource SP that DEFINES the role (SharePoint = 00000003-0000-0ff1-ce00-...):
RES=$(curl -s -G "$G/servicePrincipals" --data-urlencode "\$filter=appId eq '00000003-0000-0ff1-ce00-000000000000'" --data-urlencode "\$select=id,appRoles" -H "Authorization: Bearer $TA")
RESID=$(echo "$RES" | jq -r '.value[0].id')
ROLE=$(echo "$RES" | jq -r '.value[0].appRoles[]|select(.value=="Sites.FullControl.All").id')
# grant it:
curl -s -X POST "$G/servicePrincipals/$RECIP/appRoleAssignments" -H "Authorization: Bearer $TA" \
-H "Content-Type: application/json" \
-d "{\"principalId\":\"$RECIP\",\"resourceId\":\"$RESID\",\"appRoleId\":\"$ROLE\"}"
```
Only use Method B to complete an intent the customer already consented to (they clicked the
consent link / authorized the access). It propagates to a fresh token within seconds.
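The hand-escaped JSON in the Method B POST is easy to get wrong. One alternative, sketched here with a hypothetical `assignment_body` helper, is to let `jq` build the body from the three IDs:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the appRoleAssignment request body with jq
# instead of hand-escaping quotes.
assignment_body() {  # $1 = recipient SP id, $2 = resource SP id, $3 = app role id
  jq -cn --arg p "$1" --arg r "$2" --arg a "$3" \
    '{principalId: $p, resourceId: $r, appRoleId: $a}'
}
```

It drops into the existing call as `-d "$(assignment_body "$RECIP" "$RESID" "$ROLE")"`.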
**Fix a missing EXO role:** `bash scripts/assign-exchange-role.sh <domain>` (idempotent;
`--all --verify` to audit the fleet). EXO propagation is 15-60 min.
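Given the 15-60 min propagation window, a re-check loop is more useful than a single immediate retry. A generic sketch: `wait_until` is a hypothetical helper, and the check command passed to it (for example a function that re-runs the role lookup) is left to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical poll helper: rerun a check command until it succeeds or time runs out.
wait_until() {  # $1 = timeout seconds, $2 = interval seconds, $3.. = check command
  local deadline=$(( SECONDS + $1 )) interval="$2"; shift 2
  until "$@"; do
    # give up once the deadline has passed
    [ "$SECONDS" -ge "$deadline" ] && return 1
    sleep "$interval"
  done
}
```

Example: `wait_until 3600 300 exo_role_present "$TEN"`, where `exo_role_present` is a caller-supplied check that returns 0 once the role is visible.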
---
## Per-tenant onboarding status (living matrix — update as verified)
Legend: F = full consent verified · P = partial (note the gap) · ? = unverified.
| Tenant | Guid | investigator | exchange-op | user-manager | tenant-admin (Graph) | SharePoint | Notes |
|---|---|---|---|---|---|---|---|
| Birth Biologic | 19a568e8-… | F | F | F | F | F | reference/baseline |
| Valley Wide Plastering (VWP) | 5c53ae9f-… | ? | F (JIT-proven) | F | F (Graph Sites added 2026-07-02) | **F (fixed 2026-07-02 via Method B)** | was the "old app only" gap |
| Cascades of Tucson | (resolve) | ? | ? | ? | ? | ? | flagged "old app only" — audit |
| Dataforth | (resolve) | ? | ? | ? | ? | ? | flagged "old app only" — audit |
Keep this matrix current — it is the answer to "is tenant X onboarded?" that we keep not having.
---
## Gotchas index
- **SharePoint app-only = CERT, not secret** (Unsupported app only token). `get-token.sh`
forces cert for `sharepoint*`.
- **Graph `GET /admin/sharepoint/settings` 403** — no app holds `SharePointTenantSettings.*`.
Read/write SP tenant settings via the CSOM/REST admin API (`sharepoint-admin`, Sites.FullControl.All).
- **adminconsent grants Graph but may skip the SharePoint app role** — verify + Method-B the gap.
- **JIT Privileged Auth Admin can't self-remove** — reset-password leaves standing PAA on the
Tenant Admin SP; a human GA removes it (or use PIM / a second principal).
- **EXO role propagation 15-60 min**; verify membership via
`roleManagement/directory/roleAssignments` (not the laggy directoryRoles/members list).
- **Mailbox app is ACG-internal only** — never expect it on a customer tenant.
Adding a scope to an app manifest is a portal action (`patch-tenant-admin-manifest.sh` +
adminconsent); the multi-tenant app registrations themselves are managed manually, not via the tool.


@@ -0,0 +1,184 @@
#!/usr/bin/env bash
# consent-audit.sh — audit a tenant's ACTUAL consent state for the ComputerGuru M365 app
# suite against the documented baseline, report every gap with the exact fix command, and
# grade GREEN / AMBER / RED. Extends the assign-exchange-role.sh --verify pattern to ALL
# permissions (Graph scopes + SharePoint app-only role + EXO directory role), so a
# partially-consented tenant (the VWP-had-the-app-but-no-SharePoint failure) is caught up
# front instead of by a 401 mid-task.
#
# Usage:
# consent-audit.sh <domain|tenant-guid> audit one tenant (full detail)
# consent-audit.sh --all [--verbose] audit every tenant in references/tenants.md
# consent-audit.sh <t> --matrix one-line matrix row (reserved; not implemented yet, currently a no-op)
#
# Exit: 0 GREEN, 1 AMBER (partial), 2 RED (an app not consented / token mint failed), 3 usage.
#
# Deps: get-token.sh (+ vault), jq, curl, a python. Read-only: mints tokens + reads Graph;
# never writes. The FIXES it prints are for a human/operator to run.
set -u
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SKILL_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
GT="$SCRIPT_DIR/get-token.sh"
GRAPH="https://graph.microsoft.com/v1.0"
# --- app suite (appId + get-token tier + must-have roles) ------------------------------
APP_INVESTIGATOR="bfbc12a4-f0dd-4e12-b06d-997e7271e10c"
APP_EXCHANGE_OP="b43e7342-5b4b-492f-890f-bb5a4f7f40e9"
APP_USER_MANAGER="64fac46b-8b44-41ad-93ee-7da03927576c"
APP_TENANT_ADMIN="709e6eed-0711-4875-9c44-2d3518c47063"
SP_RESOURCE_APPID="00000003-0000-0ff1-ce00-000000000000" # Office 365 SharePoint Online
SP_FULLCONTROL_ROLE="Sites.FullControl.All"
EXCHANGE_ADMIN_ROLE_TEMPLATE="29232cdf-9323-42fd-ade2-1d097af3e4de" # Exchange Administrator
# must-have Graph roles per tier (curated: absence causes real task failures)
BASE_investigator="Directory.Read.All User.Read.All Sites.Read.All Mail.Read AuditLog.Read.All"
BASE_exchange_op_graph="Mail.ReadWrite MailboxSettings.ReadWrite"
BASE_user_manager="User.ReadWrite.All Group.ReadWrite.All Directory.ReadWrite.All"
BASE_tenant_admin="Application.ReadWrite.All AppRoleAssignment.ReadWrite.All Directory.ReadWrite.All RoleManagement.ReadWrite.Directory Sites.FullControl.All Sites.ReadWrite.All"
VERBOSE=0; MATRIX=0; ALL=0; TARGET=""
for a in "$@"; do
case "$a" in
--all) ALL=1 ;;
--verbose|-v) VERBOSE=1 ;;
--matrix) MATRIX=1 ;;
-h|--help) sed -n '2,22p' "$0"; exit 3 ;;
*) TARGET="$a" ;;
esac
done
jwt_roles() { # $1: token -> space-separated roles ("" if not a JWT)
  python - "${1:-}" <<'PY'
import sys,base64,json
t=(sys.argv[1] if len(sys.argv)>1 else "").strip()
if t.count('.')!=2 or not t.startswith('ey'):
    print(""); sys.exit(0)
try:
    p=t.split('.')[1]; p+='='*(-len(p)%4)
    print(" ".join(json.loads(base64.urlsafe_b64decode(p)).get("roles",[])))
except Exception:
    print("")
PY
}
missing_of() { # $1=have (space list) $2=want (space list) -> missing items
local have=" $1 " out=""
for w in $2; do case "$have" in *" $w "*) ;; *) out="$out $w";; esac; done
echo "${out# }"
}
# --- resolve tenant guid + primary domain ----------------------------------------------
resolve_tenant() { # $1 = domain or guid ; sets TID + DOMAIN + SPPREFIX
local in="$1"
if [[ "$in" =~ ^[0-9a-fA-F-]{36}$ ]]; then TID="$in"; else
TID="$(curl -s "https://login.microsoftonline.com/$in/v2.0/.well-known/openid-configuration" | jq -r '.token_endpoint//""' | cut -d/ -f4)"
fi
[ -z "${TID:-}" ] && { echo "[ERROR] could not resolve tenant '$in'"; return 2; }
# onmicrosoft prefix for SharePoint host, via tenant-admin /domains
local ta; ta="$($GT "$TID" tenant-admin 2>/dev/null | tail -1)"
if [ -n "$ta" ] && printf '%s' "$ta" | grep -q '^ey'; then
local init; init="$(curl -s -G "$GRAPH/domains" --data-urlencode "\$select=id,isInitial" -H "Authorization: Bearer $ta" | jq -r '.value[]?|select(.isInitial==true)|.id')"
DOMAIN="$(curl -s -G "$GRAPH/domains" --data-urlencode "\$select=id,isDefault" -H "Authorization: Bearer $ta" | jq -r '.value[]?|select(.isDefault==true)|.id')"
SPPREFIX="${init%%.onmicrosoft.com}"
[ -z "$DOMAIN" ] && DOMAIN="$init"
else DOMAIN="$in"; SPPREFIX=""; fi
return 0
}
# --- one capability check --------------------------------------------------------------
# emits a status line; appends fixes to $FIXES; bumps $WORST (0/1/2)
check_graph_tier() { # $1=label $2=tier $3=appid $4=want-roles
local label="$1" tier="$2" appid="$3" want="$4"
rm -f "/tmp/remediation-tool/$TID/$tier.jwt" 2>/dev/null
local tok; tok="$($GT "$TID" "$tier" 2>/dev/null | tail -1)"
local roles; roles="$(jwt_roles "$tok")"
if [ -z "$roles" ] && ! printf '%s' "$tok" | grep -q '^ey'; then
printf " [RED] %-16s app NOT consented (token mint failed)\n" "$label"
FIXES="$FIXES\n # $label: consent the app\n https://login.microsoftonline.com/$TID/adminconsent?client_id=$appid"
WORST=2; return
fi
local miss; miss="$(missing_of "$roles" "$want")"
if [ -z "$miss" ]; then
printf " [OK] %-16s all baseline roles present\n" "$label"
[ "$VERBOSE" = 1 ] && echo " have: $roles"
else
printf " [AMBER] %-16s PARTIAL — missing: %s\n" "$label" "$miss"
FIXES="$FIXES\n # $label: re-consent grants the full manifest\n https://login.microsoftonline.com/$TID/adminconsent?client_id=$appid"
[ "$WORST" -lt 1 ] && WORST=1
fi
}
check_sharepoint() {
local host="${SPPREFIX:+$SPPREFIX-admin.sharepoint.com}"
rm -f "/tmp/remediation-tool/$TID/sharepoint-admin.jwt" 2>/dev/null
local tok; tok="$(SP_RESOURCE_ENV="$host" $GT "$TID" sharepoint-admin 2>/dev/null | tail -1)"
local roles; roles="$(jwt_roles "$tok")"
case " $roles " in
*" $SP_FULLCONTROL_ROLE "*)
printf " [OK] %-16s Sites.FullControl.All present (cert)\n" "SharePoint" ;;
*)
printf " [AMBER] %-16s missing SharePoint app-only Sites.FullControl.All\n" "SharePoint"
FIXES="$FIXES\n # SharePoint: adminconsent often does NOT attach this — grant the app-only role directly (Method B, app-suite.md):\n # TA=\$($GT $TID tenant-admin|tail -1); recip=Tenant-Admin SP id; res=SharePoint SP id; role=Sites.FullControl.All id\n # POST /servicePrincipals/{recip}/appRoleAssignments {principalId,resourceId,appRoleId}\n https://login.microsoftonline.com/$TID/adminconsent?client_id=$APP_TENANT_ADMIN # try re-consent first"
[ "$WORST" -lt 1 ] && WORST=1 ;;
esac
}
check_exo_role() { # Exchange Admin directory role on the Exchange Operator SP
local ta; ta="$($GT "$TID" tenant-admin 2>/dev/null | tail -1)"
printf '%s' "$ta" | grep -q '^ey' || { printf " [SKIP] %-16s (need tenant-admin to check)\n" "EXO role"; return; }
local spid; spid="$(curl -s -G "$GRAPH/servicePrincipals" --data-urlencode "\$filter=appId eq '$APP_EXCHANGE_OP'" --data-urlencode "\$select=id" -H "Authorization: Bearer $ta" | jq -r '.value[0].id//""')"
[ -z "$spid" ] && { printf " [RED] %-16s Exchange Operator app NOT consented\n" "EXO role"
FIXES="$FIXES\n # Exchange Operator: consent the app\n https://login.microsoftonline.com/$TID/adminconsent?client_id=$APP_EXCHANGE_OP"; WORST=2; return; }
local has; has="$(curl -s -G "$GRAPH/roleManagement/directory/roleAssignments" --data-urlencode "\$filter=principalId eq '$spid'" -H "Authorization: Bearer $ta" | jq -r --arg r "$EXCHANGE_ADMIN_ROLE_TEMPLATE" '[.value[]?|select(.roleDefinitionId==$r)]|length')"
if [ "${has:-0}" -ge 1 ] 2>/dev/null; then
printf " [OK] %-16s Exchange Administrator role assigned\n" "EXO role"
else
printf " [AMBER] %-16s Exchange Operator SP missing Exchange Administrator role\n" "EXO role"
FIXES="$FIXES\n # EXO: assign the Exchange Administrator role\n bash $SCRIPT_DIR/assign-exchange-role.sh $DOMAIN"
[ "$WORST" -lt 1 ] && WORST=1
fi
}
audit_one() { # $1 = domain|guid ; returns worst grade (0/1/2)
TID=""; DOMAIN=""; SPPREFIX=""; FIXES=""; WORST=0
resolve_tenant "$1" || return 2
if [ "$MATRIX" = 1 ]; then
# compact single-line probe for matrix use
:
fi
echo "============================================================"
echo " Consent audit — ${DOMAIN:-$1} ($TID)"
echo "============================================================"
check_graph_tier "investigator" investigator "$APP_INVESTIGATOR" "$BASE_investigator"
check_graph_tier "exchange-op" exchange-op-graph "$APP_EXCHANGE_OP" "$BASE_exchange_op_graph"
check_graph_tier "user-manager" user-manager "$APP_USER_MANAGER" "$BASE_user_manager"
check_graph_tier "tenant-admin" tenant-admin "$APP_TENANT_ADMIN" "$BASE_tenant_admin"
check_sharepoint
check_exo_role
local grade="GREEN"; [ "$WORST" = 1 ] && grade="AMBER"; [ "$WORST" = 2 ] && grade="RED"
echo "------------------------------------------------------------"
echo " GRADE: $grade"
if [ "$WORST" -gt 0 ]; then echo " FIXES:"; printf '%b\n' "$FIXES"; fi
echo ""
return "$WORST"
}
# --- fleet audit -----------------------------------------------------------------------
# --all iterates every tenant in references/tenants.md, printing a per-tenant block +
# fixes, and exits with the WORST grade seen. A compact matrix mode is a follow-up.
if [ "$ALL" = 1 ]; then
tfile="$SKILL_DIR/references/tenants.md"
[ -f "$tfile" ] || { echo "[ERROR] tenants.md not found"; exit 3; }
worstall=0
while IFS='|' read -r _ disp dom guid rest; do
guid="$(echo "${guid:-}" | tr -d ' ')"
[[ "$guid" =~ ^[0-9a-fA-F-]{36}$ ]] || continue
audit_one "$guid"; rc=$?
[ "$rc" -gt "$worstall" ] && worstall=$rc
done < <(grep -E '^\|' "$tfile" | grep -vE 'Display Name|^\|\s*-')
exit "$worstall"
fi
[ -z "$TARGET" ] && { echo "[ERROR] give a tenant (domain|guid) or --all"; sed -n '9,13p' "$0"; exit 3; }
audit_one "$TARGET"; exit $?


@@ -25,6 +25,12 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
2026-07-02 | Howard-Home | unifi/site-manager-api | [friction] vault infrastructure/unifi-site-manager-api key returns 401 (stale/rotated); the WORKING cloud key is services/unifi-site-manager (X-API-KEY vs api.ui.com) [ctx: ref=uos-server wiki; use services/unifi-site-manager]
2026-07-02 | GURU-5070 | remediation-tool/consent-drift | [correction] assumed VWP had SharePoint access because the suite 'has' it; VWP had the Tenant Admin app but only PARTIAL consent (Graph Sites but not the SharePoint app-only role) -> SP calls 401 with empty roles. Fix: audit per-tenant token roles, grant missing app role via appRoleAssignment (Method B). Per-tenant consent is NOT uniform. [ctx: ref=app-suite.md tenant=VWP]
2026-07-02 | GURU-5070 | rmm/long-install-reaper | [friction] long download+install (131MB Falcon) exceeds the RMM command timeout on a slow-egress box -> command shows failed/'Command timeout'/'Access is denied' but the install COMPLETES in background (service came up Running). Verify service state after, don't trust the failed status for fire-and-forget installs. [ctx: ref=reference_gururmm_command_timeout_seconds host=ACG-DC16]
2026-07-02 | GURU-5070 | ps-encoded/server2016 | [friction] ps-encoded.sh rmm (shell->cmd.exe->powershell -EncodedCommand) returns 'Access is denied' with no stdout on Windows Server 2016 (DC16); plain command_type=powershell works. Fall back to direct powershell dispatch on Server 2016. [ctx: ref=ps-encoded.sh host=ACG-DC16 os=server2016]
2026-07-02 | GURU-BEAST-ROG | self-check/registry-trim | [friction] trimmed skill registry locally while GURU-5070 shipped the same trim upstream; auto-sync merge raced my uncommitted edits (transient UU state, stale 15777 reading mid-merge); fix: check coord / claim a lock before fleet-wide harness edits [ctx: ref=coord-locks]
2026-07-02 | Howard-Home | rmm/user-manager | [correction] reset Shelby.Trozzi domain password with raw Set-ADAccountPassword via /rmm; memory reference_gururmm_user_manager says use the built-in GuruRMM User Manager (reset_password action, is_dc) instead. [ctx: ref=reference_gururmm_user_manager]