sync: auto-sync from GURU-5070 at 2026-06-10 16:02:59
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-06-10 16:02:59
36
.claude/commands/onboard365.md
Normal file
@@ -0,0 +1,36 @@
# /onboard365 — Single-consent M365 tenant onboarding

Onboard a customer Microsoft 365 tenant to the ComputerGuru remediation app suite with **one**
customer admin-consent click. Thin entry point to the `onboard365` skill.

## Usage

```
/onboard365 <domain|tenant-id>   Smart: print the consent link if not yet consented,
                                 or provision the whole suite if it is.
/onboard365 link <domain>        Just generate the single Tenant Admin consent URL.
/onboard365 status <domain>      Dry-run: show current consent / role state.
/onboard365 provision <domain>   After the customer consents: provision all apps + roles.
```

## What it does

The customer Global Admin consents once to **ComputerGuru Tenant Admin**. Using that grant,
`onboard-tenant.sh` (reused from the `remediation-tool` skill) then creates the service
principals for Security Investigator, Exchange Operator, User Manager, and (if MDE-licensed)
Defender Add-on, grants all their Graph/EXO/Defender permissions, and assigns the required
Entra directory roles — no further customer clicks.

## Implementation

1. Read the full playbook in `.claude/skills/onboard365/SKILL.md`.
2. Run `bash .claude/skills/onboard365/scripts/onboard365.sh <subcommand> <domain>`
   (the script auto-locates the reused remediation-tool scripts and the vault).
3. Confirm the target tenant with the user before generating a link, and again before
   `provision` (high-privilege, customer-facing).
4. After a clean provision, **record it**: set the tenant's `Onboarded` column to `YES` in the
   REPO copy of `remediation-tool/references/tenants.md` and note the onboarding in the client
   wiki. (See SKILL.md → Recording.)

This is the front door; once a tenant is onboarded, breach checks and remediation are handled
by the `remediation-tool` skill.
@@ -22,6 +22,7 @@
- [Gitea Internal API Access](reference_gitea_internal.md) — git.azcomputerguru.com is NOT behind Cloudflare — it's the office Cox IP NAT'd to NPM (openresty) on Jupiter. Prefer internal 172.16.3.20:3000 for reliability (bypasses NPM SSL-renewal reload blips).
- [Gitea git-op latency](reference_gitea_git_op_latency.md) — SSH (.20:2222) is SLOWEST (~1.5s); internal HTTP+token ~0.55s; SOPS lookup only ~0.33s. Don't switch to SSH for speed. Gitea SSH is .20:2222 (API ssh_url .21 is wrong).
- [GuruRMM technical reference](reference_gururmm.md) — Server (172.16.3.30) layout + downloads dir `/var/www/gururmm/downloads` + `.channel` sidecar rollout control (stable/beta) + privileged server access via the server's OWN root RMM agent (hostname `gururmm`, no SSH needed; plink fallback) + API + `context=user_session` (WTS impersonation) + build-pipeline vendoring at `deploy/build-pipeline/` + Linux agent systemd sandbox trap.
- [RMM agent update model](rmm-agent-update-model.md) — Agent updates are server-PUSH on heartbeat (no self-poll); available versions = filesystem scan needing a `.sha256`; promote flips `.channel` sidecars beta→stable globally. Two stranders: beta-first freezes stable until an explicit promote; agents older than ~0.6.50 re-enroll with a NEW device_id/agent row when updated.
- [Trebesch DESKTOP-QNP3ON5 shell replacement](reference_trebesch_qnp3on5.md) — AT Trebesch box runs an Explorer shell replacement; explorer.exe owner check returns blank — use Win32_ComputerSystem.UserName. GuruRMM SWIFT-LION-2892.

## Users
42
.claude/memory/rmm-agent-update-model.md
Normal file
@@ -0,0 +1,42 @@
---
name: rmm-agent-update-model
description: How GuruRMM agents actually update (server-push on heartbeat, channel-gated, beta-first) and two gotchas that strand agents
metadata:
  type: project
---

GuruRMM agent updates are **100% server-push** — the agent never self-polls. On every
heartbeat the server (`server/src/ws/mod.rs` ~line 1124) resolves the agent's channel,
calls `UpdateManager::needs_update`, and pushes `ServerMessage::Update` if a newer build
exists. A pending update is re-dispatched on the next heartbeat (the `[RE-DISPATCH]` path).
The only other Update senders are the manual `POST /api/agents/:id/update` and rollback.

**Available versions = a filesystem scan**, not a DB table. `updates/scanner.rs` scans
`/var/www/gururmm/downloads/` for `gururmm-agent-{os}-{arch}-{ver}.exe` (per-site
`...-site-<uuid>-...` names deliberately fail to parse), requires a `.sha256` companion
(no checksum → silently skipped), and reads channel from a `<binary>.channel` sidecar
(absent or non-"beta" ⇒ **stable**). `get_latest_version` for a stable agent returns the
newest binary whose sidecar isn't "beta". Channel resolves agent→site→client→"stable".
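
The scan and channel rules above can be approximated in a few lines of bash, which may help when checking the downloads directory by hand. This is only a sketch, not the Rust code in `updates/scanner.rs`: the function names `list_builds` and `resolve_channel`, the character classes allowed for `{os}` and `{arch}`, the three-part version pattern, and the `<binary>.sha256` companion name are all assumptions made for illustration.

```bash
# Hypothetical approximation of the scan rules; the real scanner is Rust.
# Prints one line per usable build: "<os> <arch> <version> <channel>".
list_builds() {
  local dir="$1" f base os arch ver channel
  # Assumed name shape: gururmm-agent-{os}-{arch}-{ver}.exe with ver = X.Y.Z.
  local re='^gururmm-agent-([a-z0-9]+)-([a-z0-9_]+)-([0-9]+\.[0-9]+\.[0-9]+)\.exe$'
  for f in "$dir"/gururmm-agent-*.exe; do
    [[ -e "$f" ]] || continue            # glob matched nothing
    base="${f##*/}"
    [[ "$base" =~ $re ]] || continue     # per-site names fail to parse here
    os="${BASH_REMATCH[1]}"; arch="${BASH_REMATCH[2]}"; ver="${BASH_REMATCH[3]}"
    [[ -f "$f.sha256" ]] || continue     # no checksum: silently skipped
    channel="stable"                     # absent or non-"beta" sidecar means stable
    if [[ -f "$f.channel" ]]; then
      if [[ "$(tr -d '[:space:]' < "$f.channel")" == "beta" ]]; then
        channel="beta"
      fi
    fi
    printf '%s %s %s %s\n' "$os" "$arch" "$ver" "$channel"
  done
}

# Channel fallback: first non-empty of agent, site, client; otherwise "stable".
resolve_channel() {
  local c
  for c in "$@"; do
    if [[ -n "$c" ]]; then
      printf '%s\n' "$c"
      return 0
    fi
  done
  printf 'stable\n'
}
```

Calling `resolve_channel "$agent_channel" "$site_channel" "$client_channel"` (hypothetical variable names) returns the first value that is set, which mirrors the agent→site→client→"stable" order.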

**Promotion** (`POST /api/updates/rollouts/:ver/promote`) just flips every matching
`.channel` sidecar beta→stable (globally — os/arch only scopes the health-gate + rollout
DB row) and rescans. The fleet then receives it on the next heartbeat. Rollback removes the
sidecars + blocks the version + downgrades. Dashboard admin login: vault
`projects/gururmm/dashboard`. DB: `psql "$DATABASE_URL"` after `source ~/.cargo/env` on
guru@172.16.3.30.

Two gotchas that strand agents (both hit 2026-06-10):
1. **Beta-first freezes stable.** New builds are tagged beta; stable only advances on an
   explicit promote. Stable had been frozen at 0.6.47 (since 2026-05-28) while builds ran
   to 0.6.58 beta — so every stable agent silently stopped updating. Promoting 0.6.58
   rolled ~200 agents in minutes.
2. **Old agents re-enroll with a NEW identity.** The device_id format changed (`win-<uuid>`
   → bare `<uuid>`) somewhere between 0.6.27 and ~0.6.50. An agent old enough to cross that
   boundary (e.g. megan, 0.6.27→0.6.58) re-registers as a **new agent row** instead of
   updating in place, orphaning its old row (clean up the stale duplicate). Agents already
   past the boundary update in place.
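
To catch the first gotcha before it strands the fleet again, one option is a periodic check that compares the newest stable build with the newest build overall. The sketch below is an assumption-laden illustration: `stable_lag` is a hypothetical helper, it expects `<version> <channel>` lines on stdin, and it relies on `sort -V` (version sort, as provided by GNU coreutils) being available.

```bash
# Hypothetical check: is stable behind the newest build?
# stdin: one "<version> <channel>" pair per line.
stable_lag() {
  local input latest_any latest_stable
  input="$(cat)"
  latest_any="$(printf '%s\n' "$input" | awk '{print $1}' | sort -V | tail -n 1)"
  latest_stable="$(printf '%s\n' "$input" | awk '$2 != "beta" {print $1}' | sort -V | tail -n 1)"
  if [[ "$latest_stable" != "$latest_any" ]]; then
    # Stable agents stay on latest_stable until someone promotes a newer build.
    printf 'frozen stable=%s newest=%s\n' "${latest_stable:-none}" "$latest_any"
    return 1
  fi
  printf 'current stable=%s\n' "$latest_stable"
}
```

A non-zero return means a promote is pending; whether that should alert someone or merely be logged is a policy choice this sketch does not make.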

Related: [[reference_gururmm]] (downloads dir + sidecar detail + privileged server access).
Audit/log-feedback work: build/version correlation lives in `log_signatures` +
`log_signature_versions`; server self-errors are captured via `self_log.rs` into the
"GuruRMM Server" pseudo-agent.
120
.claude/skills/onboard365/SKILL.md
Normal file
@@ -0,0 +1,120 @@
---
name: onboard365
description: "Single-consent onboarding of a customer Microsoft 365 tenant to the ComputerGuru remediation app suite (Security Investigator / Exchange Operator / User Manager / Tenant Admin / Defender). The customer Global Admin clicks ONE admin-consent link (Tenant Admin); everything else — service principals, Graph/EXO/Defender permissions, and Entra directory roles — is provisioned automatically, no further clicks. Triggers: onboard 365, onboard a tenant, add tenant to remediation tools, single consent, consent link for new client, provision tenant apps, new M365 client onboarding, get a tenant ready for breach checks."
---

# Onboard365 — Single-Consent M365 Tenant Onboarding

Gets a customer M365 tenant ready for the `remediation-tool` suite with **one** customer
action: a single admin-consent click on the **ComputerGuru Tenant Admin** app. After that,
ACG provisions every other app and role programmatically using the Tenant Admin token — the
customer never sees five separate consent prompts.

This skill is a thin orchestrator. The provisioning logic, role GUIDs, and app IDs live in the
`remediation-tool` skill (`onboard-tenant.sh`) and are reused here so they never drift. Do NOT
duplicate that script — call it.

## Why "single consent"

Microsoft admin consent is per-application, so naively onboarding the 5-app suite would mean
5 customer clicks. We avoid that: the **Tenant Admin** app holds `Application.ReadWrite.All` +
`AppRoleAssignment.ReadWrite.All` + `RoleManagement.ReadWrite.Directory`. Once the customer
consents to Tenant Admin, our automation can, on their behalf:

1. Create the service principal for each other app (this IS admin consent for that app).
2. Grant every required Graph / Exchange Online / Defender app-role assignment.
3. Assign the required Entra directory roles to each SP.

Net: **one** customer click; the rest is `onboard-tenant.sh`.
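
For reference, the one link the customer clicks can be assembled as in the sketch below. The URL shape and the Tenant Admin app ID are taken from `onboard365.sh`; the function name `build_consent_url` and the input check are assumptions added for illustration and are not part of the script.

```bash
# Values as they appear in onboard365.sh.
TENANT_ADMIN_APPID="709e6eed-0711-4875-9c44-2d3518c47063"
CONSENT_BASE="https://login.microsoftonline.com"
CONSENT_REDIRECT="https://azcomputerguru.com"

# Hypothetical helper: build the admin-consent URL for a tenant GUID or domain.
build_consent_url() {
  local tenant="$1"
  # Assumed sanity check: letters, digits, dots and hyphens only.
  local re='^[A-Za-z0-9][A-Za-z0-9.-]*$'
  if [[ ! "$tenant" =~ $re ]]; then
    echo "invalid tenant: $tenant" >&2
    return 64
  fi
  printf '%s/%s/adminconsent?client_id=%s&redirect_uri=%s&prompt=consent\n' \
    "$CONSENT_BASE" "$tenant" "$TENANT_ADMIN_APPID" "$CONSENT_REDIRECT"
}
```

Rejecting anything that is not a plain domain or GUID keeps stray spaces or quotes out of a link that is about to be emailed to a customer.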

## The flow

```
onboard365.sh <domain>            # smart: prints the consent link if not yet consented,
                                  #        or provisions the whole suite if it is
onboard365.sh link <domain>       # just print the single consent URL + customer instructions
onboard365.sh status <domain>     # dry-run: show current consent/role state, change nothing
onboard365.sh provision <domain>  # after the customer consents: provision all apps + roles
```

Script lives at `scripts/onboard365.sh` in this skill. It auto-locates the `remediation-tool`
scripts at `$HOME/.claude/skills/remediation-tool/scripts` (repo fallback via
`identity.json.claudetools_root`).

### Step-by-step (what to actually do)

1. **Identify the tenant.** Accept a domain (e.g. `acme.com`), an `.onmicrosoft.com`, or a
   tenant GUID. Run `onboard365.sh link <domain>` to resolve it and produce the consent URL.
2. **Send the single link to the customer's Global Admin.** Use the template at
   `references/customer-consent-instructions.md`. They sign in and click **Accept**. That is
   the only thing they do. The app shown will be **"ComputerGuru Tenant Admin"**.
3. **Provision.** Once they confirm they accepted, run `onboard365.sh provision <domain>`
   (or just `onboard365.sh <domain>` — it detects consent and proceeds). This runs
   `onboard-tenant.sh`, which creates the other SPs, grants all permissions, and assigns the
   directory roles. Watch the final status table — every row should be `OK` / `ASSIGNED`.
4. **Verify.** Re-run `onboard365.sh status <domain>` (dry-run). All roles should read
   `PRESENT`. Optionally confirm an Exchange path with
   `remediation-tool/scripts/assign-exchange-role.sh <domain> --verify`.
5. **Record it** (see Recording below).
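
When the steps above are driven from another script, the exit code is the simplest signal for what to do next. The mapping below is a sketch: codes 0, 2 and 10 are the ones the `onboard365.sh` header documents, 3 and 64 are the codes the script itself uses for "scripts not found" and usage errors, and the function name `next_step_for_exit` and the wording of each hint are assumptions.

```bash
# Hypothetical helper: turn an onboard365.sh exit code into a next action.
next_step_for_exit() {
  case "$1" in
    0)  echo "ok: verify with 'status', then record the onboarding" ;;
    2)  echo "not consented: send (or re-send) the single consent link" ;;
    3)  echo "remediation-tool scripts not found: run a repo sync" ;;
    10) echo "partial: re-run 'provision' once, then inspect rows marked ERROR" ;;
    64) echo "usage error: check the subcommand and tenant argument" ;;
    *)  echo "unexpected exit code $1: read the script output" ;;
  esac
}

# Example wiring (commented out so the sketch has no side effects):
# rc=0; onboard365.sh provision acme.com || rc=$?
# next_step_for_exit "$rc"
```

Capturing the code with `|| rc=$?` matters under `set -e`, where a bare failing command would end the calling script before the hint is printed.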

## What gets provisioned (handled by onboard-tenant.sh — do not re-implement)

| App | Graph/EXO/Defender perms | Directory role assigned |
|---|---|---|
| Tenant Admin (consented by customer) | high-privilege Graph (incl. `Policy.Read.All` backfill) | Conditional Access Administrator |
| Security Investigator | Graph read + EXO read | Exchange Administrator |
| Exchange Operator | Graph + EXO write | Exchange Administrator |
| User Manager | Graph user/group/auth write | User Administrator + Authentication Administrator |
| Defender Add-on | Graph + Defender ATP | (Defender API; no directory role) — **MDE-licensed tenants only** |

The script auto-detects MDE licensing: if the Defender ATP resource SP isn't present, it
skips Defender cleanly (not an error).
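
When reading a `status` run by eye, it can help to have the expected roles per app at hand. The lookup below simply restates the table as a bash function; `roles_for_app` is a hypothetical name, and the role GUIDs are deliberately left out because they live only in `onboard-tenant.sh`.

```bash
# Hypothetical lookup: expected Entra directory roles for each app in the suite.
# Prints one role per line; prints nothing for apps that get no directory role.
roles_for_app() {
  case "$1" in
    "Tenant Admin")          echo "Conditional Access Administrator" ;;
    "Security Investigator") echo "Exchange Administrator" ;;
    "Exchange Operator")     echo "Exchange Administrator" ;;
    "User Manager")          printf '%s\n' "User Administrator" "Authentication Administrator" ;;
    "Defender Add-on")       : ;;  # Defender API only, no directory role
    *) echo "unknown app: $1" >&2; return 1 ;;
  esac
}
```

This is a reading aid only. Anything that actually assigns roles should keep calling `onboard-tenant.sh`, so the mapping is maintained in one place.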

## Recording (durable — do this after a successful provision)

The script provisions but does NOT write your records. After a clean run:

1. **Tenant registry — edit the REPO copy** (so it persists + syncs to the fleet), not the
   applied global copy. Path: `$CLAUDETOOLS_ROOT/.claude/skills/remediation-tool/references/tenants.md`
   (resolve `$CLAUDETOOLS_ROOT` from `identity.json`). Set the tenant's **Onboarded** column to
   `YES` and add a dated Notes line listing what was consented + roles assigned. If the tenant
   isn't in the table yet, add a row (Display Name | Domain | Tenant ID | Onboarded | Notes).
2. **Client wiki / CONTEXT.** If `wiki/clients/<slug>.md` exists, note tenant onboarding under
   Cloud/M365 (tenant ID, "remediation suite onboarded YYYY-MM-DD, all apps + roles").
3. Use UTC dates.
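
The registry edit in step 1 is normally made by hand, but the table update can be sketched with `awk`. This assumes `tenants.md` is a pipe-delimited Markdown table with exactly the five columns listed above and one tenant per row; the function name `mark_onboarded` is hypothetical, and the sketch only prints the updated table so the change can be reviewed before anything is written back.

```bash
# Hypothetical helper: print tenants.md with Onboarded set to YES for one domain.
# Assumed row shape: | Display Name | Domain | Tenant ID | Onboarded | Notes |
mark_onboarded() {
  local file="$1" domain="$2"
  awk -F'|' -v OFS='|' -v d="$domain" '
    {
      dom = $3
      gsub(/^[ \t]+|[ \t]+$/, "", dom)   # trim the Domain cell
      if (NF >= 6 && dom == d) {
        $5 = " YES "                     # Onboarded is the 4th column, awk field 5
      }
      print
    }' "$file"
}

# Example (commented out; the path resolution is described in step 1):
# mark_onboarded "$CLAUDETOOLS_ROOT/.claude/skills/remediation-tool/references/tenants.md" acme.com
```

Rows that do not match are printed unchanged, so the output can be compared against the original with `diff` before it replaces the repo copy. The dated Notes line still has to be added by hand.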

## Conventions & guardrails

- **Outward-facing action.** Sending a consent link to a customer and provisioning apps in
  their tenant is customer-facing. Confirm the target tenant with the user before generating
  the link, and again before running `provision` (the Tenant Admin grant is high-privilege).
- **One link only.** Do not send the customer the per-app consent URLs unless `provision`
  reports that a specific app failed programmatic consent — then `onboard-tenant.sh` prints the
  fallback per-app URLs. The whole point is a single click.
- **Idempotent.** Re-running `provision` on an already-onboarded tenant is safe — every grant
  and role assignment checks-before-creating and treats "already exists" as success.
- **Vault.** Token acquisition uses the SOPS vault via `identity.json.vault_path`. On a machine
  where the skill resolves the wrong identity.json (no `vault_path`), export
  `VAULT_ROOT_ENV=<vault path>` before running (known remediation-tool quirk).
- **Break-glass / least privilege.** This skill only provisions the standing app suite. It does
  NOT touch customer user accounts, CA policies, or break-glass accounts — that's
  `remediation-tool` territory.
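
The "confirm the target tenant" guardrail is a conversational step in this skill, not script logic. If the script is ever run directly from a terminal, an equivalent guard could look like the sketch below. `confirm_target` is a hypothetical function that does not exist in `onboard365.sh`; it asks the operator to retype the tenant rather than answer y/n, on the assumption that retyping catches wrong-tenant mistakes more reliably.

```bash
# Hypothetical guard: succeed only if the operator retypes the tenant exactly.
confirm_target() {
  local tenant="$1" reply
  printf 'About to act on tenant "%s". Retype it to confirm: ' "$tenant" >&2
  IFS= read -r reply || return 1   # EOF or closed stdin counts as "no"
  [[ "$reply" == "$tenant" ]]
}

# Example wiring (commented out so the sketch has no side effects):
# confirm_target acme.com && onboard365.sh provision acme.com
```

The prompt goes to stderr so it does not mix with any output a caller might be capturing.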

## Common results / troubleshooting

- `[WARNING] Tenant Admin app not yet consented` (exit 2): the customer hasn't accepted yet, or
  accepted as a non-Global-Admin. Re-send the link; confirm they're a Global Administrator.
- `AADSTS7000229`: the SP isn't in the tenant — same as not-consented; resend the link.
- A role row shows `ERROR`: usually transient Graph replication. Re-run `provision` once; if it
  persists, the customer may need to re-accept the Tenant Admin consent (the script prints the
  re-consent URL).
- `vault_path not set`: export `VAULT_ROOT_ENV` (see Vault above).
- Exchange tasks 403 later despite onboarding: run
  `remediation-tool/scripts/assign-exchange-role.sh <domain>` — the Exchange Administrator role
  on the Exchange Operator SP is the recurring gap; onboarding assigns it, but verify.
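
The first few entries above amount to matching a message fragment to a fix, which can be sketched as a small triage function. The fragments are the ones quoted in this section; the function name `triage`, the action labels it prints, and the idea of feeding it a captured output line are assumptions for illustration.

```bash
# Hypothetical triage: map a line of captured output to a suggested action.
triage() {
  local msg="$1"
  case "$msg" in
    *"not yet consented"*|*AADSTS7000229*) echo "resend-link" ;;
    *"vault_path not set"*)                echo "export-VAULT_ROOT_ENV" ;;
    *ERROR*)                               echo "rerun-provision-once" ;;
    *)                                     echo "unknown" ;;
  esac
}
```

The specific patterns come first on purpose: a generic `ERROR` match placed earlier would swallow lines that deserve a more precise suggestion.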

## Relationship to remediation-tool

Onboard365 = the front door (get a tenant consented + provisioned). `remediation-tool` = the
work (breach checks, sweeps, mailbox/user/CA remediation) once the tenant is onboarded. After a
successful onboard, a breach check is `remediation-tool` territory.
@@ -0,0 +1,40 @@
# Customer Consent Instructions (template)

Use this when handing the single consent link to a customer's Global Administrator.
Fill in `{{ADMIN_NAME}}`, `{{CUSTOMER}}`, `{{CONSENT_URL}}` (from `onboard365.sh link <domain>`),
and `{{TECH_NAME}}` (your name).

---

**Subject:** Action needed: one-click approval to connect Arizona Computer Guru to your Microsoft 365

Hi {{ADMIN_NAME}},

To manage and protect your Microsoft 365 environment, we need a one-time approval from a
Microsoft 365 **Global Administrator** at {{CUSTOMER}}. This is a single click — you won't need
to approve anything else after this.

1. Open this link while signed in as a Global Administrator:

   {{CONSENT_URL}}

2. Review the screen (it will show **"ComputerGuru Tenant Admin"**) and click **Accept**.

That's it. Once you've accepted, reply to let us know and we'll finish the setup on our end.

Thanks,
{{TECH_NAME}}
Arizona Computer Guru · 520.304.8300

---

## Notes for the tech (not for the customer)

- The approver MUST be a **Global Administrator**. A User Admin / other role cannot grant
  application admin consent — the Accept will fail or be greyed out.
- The single grant is for **ComputerGuru Tenant Admin** only. After they accept, run
  `onboard365.sh provision <domain>` — that creates the other app SPs and assigns roles with no
  further customer interaction.
- If they report an error like "Need admin approval" / "AADSTS650056" / "AADSTS7000229" on the
  link, they almost always signed in with a non-GA account. Have them retry as a GA.
- Don't send the per-app links. One link is the whole point. Per-app fallback URLs only come into
  play if `onboard-tenant.sh` reports a specific app failed programmatic consent.
113
.claude/skills/onboard365/scripts/onboard365.sh
Normal file
@@ -0,0 +1,113 @@
#!/usr/bin/env bash
# Onboard365 — single-consent onboarding of a customer M365 tenant to the
# ComputerGuru remediation app suite.
#
# The customer Global Admin consents ONCE to the "ComputerGuru Tenant Admin" app.
# Everything else (the other SPs, their Graph/EXO/Defender permissions, and the
# Entra directory roles) is provisioned programmatically by the reused
# remediation-tool/onboard-tenant.sh — no further customer clicks.
#
# Usage:
#   onboard365.sh <domain|tenant-id>            # smart: link if not consented, else provision
#   onboard365.sh link <domain|tenant-id>       # print the ONE consent URL + customer steps
#   onboard365.sh status <domain|tenant-id>     # dry-run: show current consent/role state
#   onboard365.sh provision <domain|tenant-id>  # after consent: provision all apps + roles
#
# Exit codes mirror onboard-tenant.sh for provision/status (0 ok, 2 not consented,
# 10 partial). link always exits 0.
set -euo pipefail

TENANT_ADMIN_APPID="709e6eed-0711-4875-9c44-2d3518c47063"
CONSENT_BASE="https://login.microsoftonline.com"
CONSENT_REDIRECT="https://azcomputerguru.com"

# ── Locate the reused remediation-tool scripts ────────────────────────────────
# Prefer the applied global copy (stable path on every fleet machine); fall back
# to the repo copy via identity.json.claudetools_root.
find_rtool() {
  local cands=("$HOME/.claude/skills/remediation-tool/scripts")
  local idf="$HOME/.claude/identity.json"
  if [[ -f "$idf" ]] && command -v jq >/dev/null 2>&1; then
    local root
    root=$(jq -r '.claudetools_root // empty' "$idf" 2>/dev/null || true)
    [[ -n "$root" ]] && cands+=("$root/.claude/skills/remediation-tool/scripts")
  fi
  local c
  for c in "${cands[@]}"; do
    [[ -f "$c/onboard-tenant.sh" ]] && { echo "$c"; return 0; }
  done
  return 1
}

RT="$(find_rtool)" || {
  echo "[ERROR] remediation-tool scripts not found." >&2
  echo " Expected: \$HOME/.claude/skills/remediation-tool/scripts/onboard-tenant.sh" >&2
  echo " Run a repo sync, or check identity.json.claudetools_root." >&2
  exit 3
}

# ── Parse args (allow a bare domain as smart mode) ────────────────────────────
SUB="${1:-}"
[[ -z "$SUB" ]] && { echo "usage: onboard365.sh <link|status|provision|auto> <domain|tenant-id>" >&2; exit 64; }
case "$SUB" in
  link|status|provision|auto)
    TARGET="${2:-}"
    [[ -z "$TARGET" ]] && { echo "usage: onboard365.sh $SUB <domain|tenant-id>" >&2; exit 64; }
    ;;
  *)
    TARGET="$SUB"; SUB="auto"
    ;;
esac

resolve() { "$RT/resolve-tenant.sh" "$1" 2>/dev/null || echo "$1"; }

print_link() {
  local t="$1"
  cat <<EOF
============================================================
Onboard365 — Single-Consent Link
Customer: $TARGET
Tenant: $t
============================================================
Send the ONE link below to the customer's GLOBAL ADMIN. They sign
in and click Accept. That single consent is all they do — ACG
provisions everything else automatically.

${CONSENT_BASE}/${t}/adminconsent?client_id=${TENANT_ADMIN_APPID}&redirect_uri=${CONSENT_REDIRECT}&prompt=consent

App they will see: "ComputerGuru Tenant Admin"
(Customer instructions template: references/customer-consent-instructions.md)

After they confirm Accept, run:
onboard365.sh provision $TARGET
============================================================
EOF
}

# returns 0 if Tenant Admin token acquires (consented), 1 otherwise
is_consented() { "$RT/get-token.sh" "$1" tenant-admin >/dev/null 2>/tmp/onboard365-tok.err; }

case "$SUB" in
  link)
    print_link "$(resolve "$TARGET")"
    ;;
  status)
    echo "[INFO] Onboard365 status (dry-run) for $TARGET — no changes will be made"
    "$RT/onboard-tenant.sh" "$TARGET" --dry-run
    ;;
  provision)
    echo "[INFO] Onboard365 provisioning suite for $TARGET (single-consent model)"
    "$RT/onboard-tenant.sh" "$TARGET"
    ;;
  auto)
    T="$(resolve "$TARGET")"
    if is_consented "$T"; then
      echo "[INFO] Tenant Admin already consented for $TARGET — provisioning the suite..."
      "$RT/onboard-tenant.sh" "$TARGET"
    else
      echo "[INFO] Tenant Admin not yet consented for $TARGET — generating the single consent link."
      print_link "$T"
      exit 2
    fi
    ;;
esac
75
session-logs/2026-06/2026-06-10-mike-onboard365-skill.md
Normal file
@@ -0,0 +1,75 @@
# Onboard365 Skill — Single-Consent M365 Tenant Onboarding

## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin

## Session Summary

Built a new `onboard365` skill (+ `/onboard365` command) that drives single-consent onboarding of a customer M365 tenant to the ComputerGuru remediation app suite. The motivating insight, discovered while reading the existing `remediation-tool` scripts, is that the single-consent model is already implemented inside `onboard-tenant.sh`: the customer Global Admin consents once to the Tenant Admin app (which holds Application.ReadWrite.All + AppRoleAssignment.ReadWrite.All + RoleManagement.ReadWrite.Directory), and the script then programmatically creates the other app service principals, grants all their Graph/EXO/Defender app-role assignments, and assigns the required Entra directory roles — no further customer clicks. Onboard365 is a clean, first-class front-end for that flow.

The skill was deliberately built as a thin orchestrator that reuses the existing remediation-tool scripts (resolve-tenant.sh, get-token.sh, onboard-tenant.sh) at their stable applied path ($HOME/.claude/skills/remediation-tool/scripts), with a repo fallback resolved via identity.json.claudetools_root. This avoids duplicating the role GUIDs, app IDs, and consent logic, which evolve over time and would otherwise drift between two copies.

Created four files in the repo (the synced source of truth that propagates to the fleet via sync.sh Phase 5c): the SKILL.md playbook, scripts/onboard365.sh (link/status/provision/smart-auto modes), references/customer-consent-instructions.md (a ready-to-send customer email template), and commands/onboard365.md (thin /onboard365 entry point). The script was made executable.

Validated end-to-end on this machine: `link` mode resolved grabblaw.com to its tenant GUID and produced the correct Tenant Admin consent URL; `status` (dry-run) ran the full reuse chain (resolve-tenant -> get-token tenant-admin -> onboard-tenant.sh --dry-run) against the already-onboarded grabblaw.com tenant and reported every role PRESENT/[OK] with no changes made, and correctly auto-skipped Defender (no MDE license). Both the skill and command registered (visible in the skill/command lists).

## Key Decisions

- Reuse, not duplicate: onboard365.sh calls the existing remediation-tool scripts rather than copying onboard-tenant.sh, so role GUIDs/app IDs/consent logic live in exactly one maintained place.
- Built in the REPO `.claude/skills/` (not the applied global `~/.claude/skills/`), because sync.sh Phase 5c copies repo skills one-way to global on every machine. The repo is the source of truth.
- Cross-skill script reference uses the stable applied path `$HOME/.claude/skills/remediation-tool/scripts` (present on every fleet machine after sync), with a repo fallback via identity.json.claudetools_root.
- Recording (tenant registry update) is a documented Claude step in SKILL.md rather than script logic — and it points at the REPO copy of remediation-tool/references/tenants.md (resolved via claudetools_root) so the Onboarded=YES update persists and syncs, instead of editing the ephemeral applied copy.
- Kept the skill scoped to provisioning the app suite only; user/CA/break-glass actions remain remediation-tool territory.

## Problems Encountered

- The remediation-tool scripts resolve their identity.json relative to the skill install dir, so on GURU-5070 they read the HOME identity (no vault_path) and fail token acquisition. Same known quirk as earlier this session; worked around with `export VAULT_ROOT_ENV=D:/vault` for the integration test. Documented the workaround in SKILL.md (Vault section).

## Configuration Changes

- Created `.claude/skills/onboard365/SKILL.md`
- Created `.claude/skills/onboard365/scripts/onboard365.sh` (chmod +x)
- Created `.claude/skills/onboard365/references/customer-consent-instructions.md`
- Created `.claude/commands/onboard365.md`

## Credentials & Secrets

- None created. The skill uses existing app credentials via the SOPS vault (msp-tools/computerguru-*.sops.yaml) through the reused get-token.sh.
- Tenant Admin app id referenced by the consent URL: `709e6eed-0711-4875-9c44-2d3518c47063` (public app id, not a secret).

## Infrastructure & Servers

- Consent URL pattern: `https://login.microsoftonline.com/<tenant>/adminconsent?client_id=709e6eed-0711-4875-9c44-2d3518c47063&redirect_uri=https://azcomputerguru.com&prompt=consent`
- App suite provisioned by onboard-tenant.sh: Security Investigator (bfbc12a4-…), Exchange Operator (b43e7342-…), User Manager (64fac46b-…), Tenant Admin (709e6eed-…), Defender Add-on (dbf8ad1a-…).
- Directory roles assigned: Sec Inv + Exch Op -> Exchange Administrator; User Manager -> User Administrator + Authentication Administrator; Tenant Admin -> Conditional Access Administrator. Defender = ATP API only (no directory role; MDE-licensed tenants only).
- Test tenant: grabblaw.com = `032b383e-96e4-491b-880d-3fd3295672c3` (already onboarded; used for read-only dry-run validation).

## Commands & Outputs

```bash
# usage
onboard365.sh <domain>            # smart: link if not consented, else provision
onboard365.sh link <domain>       # single Tenant Admin consent URL + customer steps
onboard365.sh status <domain>     # dry-run consent/role state
onboard365.sh provision <domain>  # provision all apps + roles after consent

# validation (this session)
bash -n onboard365.sh                                       # syntax OK
onboard365.sh link grabblaw.com                             # resolved tenant + built URL
VAULT_ROOT_ENV=D:/vault onboard365.sh status grabblaw.com   # all roles PRESENT/[OK], no changes
```

## Pending / Incomplete Tasks

- The four new files are committed via this save's sync (see Post-commit). They propagate to other fleet machines' ~/.claude/skills on their next sync (Phase 5c).
- Not yet exercised against a brand-new (un-consented) tenant end-to-end — provision path is the same onboard-tenant.sh already in production use, but a real first-time onboard would be the final confidence check.
- Fleet quirk still open: remediation-tool scripts need VAULT_ROOT_ENV or a vault_path fix in the home identity.json on GURU-5070.

## Reference Information

- Skill dir: `.claude/skills/onboard365/`
- Command: `.claude/commands/onboard365.md` (`/onboard365`)
- Reused scripts: `.claude/skills/remediation-tool/scripts/{resolve-tenant,get-token,onboard-tenant,assign-exchange-role}.sh`
- Tenant registry (update Onboarded=YES here, repo copy): `.claude/skills/remediation-tool/references/tenants.md`