sync: auto-sync from GURU-5070 at 2026-06-10 16:02:59

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-10 16:02:59
2026-06-10 16:03:11 -07:00
parent d573842ba2
commit 63f427a95f
7 changed files with 427 additions and 0 deletions


@@ -0,0 +1,75 @@
# Onboard365 Skill — Single-Consent M365 Tenant Onboarding
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Session Summary
Built a new `onboard365` skill (plus the `/onboard365` command) that drives single-consent onboarding of a customer M365 tenant to the ComputerGuru remediation app suite. The motivating insight, found while reading the existing `remediation-tool` scripts, is that `onboard-tenant.sh` already implements the single-consent model: the customer Global Admin consents once to the Tenant Admin app (which holds Application.ReadWrite.All, AppRoleAssignment.ReadWrite.All, and RoleManagement.ReadWrite.Directory), and the script then programmatically creates the other app service principals, grants all their Graph/EXO/Defender app-role assignments, and assigns the required Entra directory roles, with no further customer clicks. Onboard365 is a clean, first-class front end for that flow.
The skill was deliberately built as a thin orchestrator that reuses the existing remediation-tool scripts (resolve-tenant.sh, get-token.sh, onboard-tenant.sh) at their stable applied path ($HOME/.claude/skills/remediation-tool/scripts), with a repo fallback resolved via identity.json.claudetools_root. This avoids duplicating the role GUIDs, app IDs, and consent logic, which evolve over time and would otherwise drift between two copies.
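The lookup order described above (applied path first, repo fallback second) could be sketched roughly as follows. This is not the actual code in `onboard365.sh`: the function name `resolve_scripts_dir`, the identity-file argument, and the `sed`-based JSON extraction are all assumptions made for illustration, and the repo-relative layout is inferred from the paths listed under Reference Information.

```bash
#!/usr/bin/env bash
# Sketch only: locate the remediation-tool scripts directory.
# resolve_scripts_dir is a hypothetical helper, not part of the real skill.
resolve_scripts_dir() {
  local identity="${1:-}"   # path to identity.json (caller-supplied; location is an assumption)
  local rel=".claude/skills/remediation-tool/scripts"
  local applied="$HOME/$rel"

  # 1. Prefer the stable applied path that sync places on every fleet machine.
  if [ -d "$applied" ]; then
    printf '%s\n' "$applied"
    return 0
  fi

  # 2. Fall back to the repo copy under claudetools_root.
  #    Naive extraction: assumes a simple "claudetools_root": "..." string
  #    with forward slashes and no escaped characters.
  local root=""
  if [ -n "$identity" ] && [ -f "$identity" ]; then
    root="$(sed -n 's/.*"claudetools_root"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$identity" | head -n 1)"
  fi
  if [ -n "$root" ] && [ -d "$root/$rel" ]; then
    printf '%s\n' "$root/$rel"
    return 0
  fi

  echo "remediation-tool scripts not found (applied path or repo fallback)" >&2
  return 1
}
```

A caller would then invoke the reused scripts through the returned directory, for example `"$(resolve_scripts_dir "$identity_file")/resolve-tenant.sh"`, where `identity_file` is whatever path the real script uses.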
Created four files in the repo (the synced source of truth that propagates to the fleet via sync.sh Phase 5c): the SKILL.md playbook, scripts/onboard365.sh (link/status/provision/smart-auto modes), references/customer-consent-instructions.md (a ready-to-send customer email template), and commands/onboard365.md (thin /onboard365 entry point). The script was made executable.
Validated end-to-end on this machine: `link` mode resolved grabblaw.com to its tenant GUID and produced the correct Tenant Admin consent URL; `status` (dry-run) ran the full reuse chain (resolve-tenant -> get-token tenant-admin -> onboard-tenant.sh --dry-run) against the already-onboarded grabblaw.com tenant and reported every role PRESENT/[OK] with no changes made, and correctly auto-skipped Defender (no MDE license). Both the skill and command registered (visible in the skill/command lists).
## Key Decisions
- Reuse, not duplicate: onboard365.sh calls the existing remediation-tool scripts rather than copying onboard-tenant.sh, so role GUIDs/app IDs/consent logic live in exactly one maintained place.
- Built in the REPO `.claude/skills/` (not the applied global `~/.claude/skills/`), because sync.sh Phase 5c copies repo skills one-way to global on every machine. The repo is the source of truth.
- Cross-skill script reference uses the stable applied path `$HOME/.claude/skills/remediation-tool/scripts` (present on every fleet machine after sync), with a repo fallback via identity.json.claudetools_root.
- Recording (tenant registry update) is a documented Claude step in SKILL.md rather than script logic — and it points at the REPO copy of remediation-tool/references/tenants.md (resolved via claudetools_root) so the Onboarded=YES update persists and syncs, instead of editing the ephemeral applied copy.
- Kept the skill scoped to provisioning the app suite only; user/CA/break-glass actions remain remediation-tool territory.
## Problems Encountered
- The remediation-tool scripts resolve their identity.json relative to the skill install dir, so on GURU-5070 they read the HOME identity (which has no vault_path) and token acquisition fails. This is the same known quirk seen earlier this session; worked around with `export VAULT_ROOT_ENV=D:/vault` for the integration test. Documented the workaround in SKILL.md (Vault section).
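A preflight check along the following lines could surface this failure before token acquisition is attempted. It is a sketch under assumptions: `vault_root_source` is a hypothetical helper, the precedence (environment variable first, then `vault_path` in identity.json) is a guess at how the reused scripts behave, and the identity file path is supplied by the caller.

```bash
#!/usr/bin/env bash
# Sketch only: report where the vault root would come from, or fail early.
# vault_root_source is hypothetical; the precedence order is an assumption.
vault_root_source() {
  local identity="${1:-}"   # path to identity.json (caller-supplied)

  # Explicit override, as used for the integration test in this session.
  if [ -n "${VAULT_ROOT_ENV:-}" ]; then
    printf 'env:%s\n' "$VAULT_ROOT_ENV"
    return 0
  fi

  # Otherwise require a non-empty vault_path string in identity.json.
  if [ -n "$identity" ] && \
     grep -q '"vault_path"[[:space:]]*:[[:space:]]*"[^"]' "$identity" 2>/dev/null; then
    printf 'identity:%s\n' "$identity"
    return 0
  fi

  echo "no vault root: set VAULT_ROOT_ENV or add vault_path to identity.json" >&2
  return 1
}
```

Failing here with a clear message is cheaper to diagnose than a token error several scripts deep.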
## Configuration Changes
- Created `.claude/skills/onboard365/SKILL.md`
- Created `.claude/skills/onboard365/scripts/onboard365.sh` (chmod +x)
- Created `.claude/skills/onboard365/references/customer-consent-instructions.md`
- Created `.claude/commands/onboard365.md`
## Credentials & Secrets
- None created. The skill uses existing app credentials via the SOPS vault (msp-tools/computerguru-*.sops.yaml) through the reused get-token.sh.
- Tenant Admin app id referenced by the consent URL: `709e6eed-0711-4875-9c44-2d3518c47063` (public app id, not a secret).
## Infrastructure & Servers
- Consent URL pattern: `https://login.microsoftonline.com/<tenant>/adminconsent?client_id=709e6eed-0711-4875-9c44-2d3518c47063&redirect_uri=https://azcomputerguru.com&prompt=consent`
- App suite provisioned by onboard-tenant.sh: Security Investigator (bfbc12a4-…), Exchange Operator (b43e7342-…), User Manager (64fac46b-…), Tenant Admin (709e6eed-…), Defender Add-on (dbf8ad1a-…).
- Directory roles assigned: Security Investigator + Exchange Operator -> Exchange Administrator; User Manager -> User Administrator + Authentication Administrator; Tenant Admin -> Conditional Access Administrator. Defender Add-on = ATP API only (no directory role; MDE-licensed tenants only).
- Test tenant: grabblaw.com = `032b383e-96e4-491b-880d-3fd3295672c3` (already onboarded; used for read-only dry-run validation).
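
The consent URL pattern listed above can be assembled with a few lines of shell. This is a minimal sketch, not the code from `onboard365.sh`: the function name `build_consent_url` is hypothetical, and the redirect URI is left unencoded because that is how the pattern appears in this log.

```bash
#!/usr/bin/env bash
# Sketch only: build the Tenant Admin consent URL for one tenant.
# build_consent_url is a hypothetical helper name.
build_consent_url() {
  local tenant="${1:-}"
  local client_id="709e6eed-0711-4875-9c44-2d3518c47063"   # Tenant Admin app id (public)
  local redirect_uri="https://azcomputerguru.com"

  if [ -z "$tenant" ]; then
    echo "usage: build_consent_url <tenant>" >&2
    return 2
  fi

  printf 'https://login.microsoftonline.com/%s/adminconsent?client_id=%s&redirect_uri=%s&prompt=consent\n' \
    "$tenant" "$client_id" "$redirect_uri"
}

# Example with the test tenant GUID from this session:
build_consent_url 032b383e-96e4-491b-880d-3fd3295672c3
# prints https://login.microsoftonline.com/032b383e-96e4-491b-880d-3fd3295672c3/adminconsent?client_id=709e6eed-0711-4875-9c44-2d3518c47063&redirect_uri=https://azcomputerguru.com&prompt=consent
```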
## Commands & Outputs
```bash
# usage
onboard365.sh <domain> # smart: link if not consented, else provision
onboard365.sh link <domain> # single Tenant Admin consent URL + customer steps
onboard365.sh status <domain> # dry-run consent/role state
onboard365.sh provision <domain> # provision all apps + roles after consent
# validation (this session)
bash -n onboard365.sh # syntax OK
onboard365.sh link grabblaw.com # resolved tenant + built URL
VAULT_ROOT_ENV=D:/vault onboard365.sh status grabblaw.com # all roles PRESENT/[OK], no changes
```
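
The usage lines above imply an optional leading mode keyword, with the bare `<domain>` form selecting smart mode. One way to parse that is sketched below; `parse_args` and the `MODE`/`DOMAIN` globals are hypothetical names, and the real script may structure this differently.

```bash
#!/usr/bin/env bash
# Sketch only: split "[link|status|provision] <domain>" into MODE and DOMAIN.
# parse_args, MODE, and DOMAIN are hypothetical names for illustration.
parse_args() {
  case "${1:-}" in
    link|status|provision)
      MODE="$1"
      DOMAIN="${2:-}"
      ;;
    *)
      # No recognised keyword: treat the first argument as the domain.
      MODE="smart"
      DOMAIN="${1:-}"
      ;;
  esac

  if [ -z "$DOMAIN" ]; then
    echo "usage: onboard365.sh [link|status|provision] <domain>" >&2
    return 2
  fi
}
```

The sketch covers argument handling only. In smart mode the script still has to determine whether the tenant has consented before choosing between the link and provision paths, and that check is not shown here.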
## Pending / Incomplete Tasks
- The four new files are committed via this save's sync (see Post-commit). They propagate to other fleet machines' ~/.claude/skills on their next sync (Phase 5c).
- Not yet exercised against a brand-new (un-consented) tenant end-to-end — provision path is the same onboard-tenant.sh already in production use, but a real first-time onboard would be the final confidence check.
- Fleet quirk still open: remediation-tool scripts need VAULT_ROOT_ENV or a vault_path fix in the home identity.json on GURU-5070.
## Reference Information
- Skill dir: `.claude/skills/onboard365/`
- Command: `.claude/commands/onboard365.md` (`/onboard365`)
- Reused scripts: `.claude/skills/remediation-tool/scripts/{resolve-tenant,get-token,onboard-tenant,assign-exchange-role}.sh`
- Tenant registry (update Onboarded=YES here, repo copy): `.claude/skills/remediation-tool/references/tenants.md`