Session log: Cascades CA bypass phased rollout + pilot user + phone re-enroll

Cascades caregiver shared-phone bypass pilot, 2026-04-29 evening into 2026-04-30 early-morning continuation.

Major work:
- Adopted phased per-group CA rollout (corrects original tenant-wide §5 design that would have blocked off-site office users)
- Step A: backfilled admin@ into excludeUsers on all 8 existing Cascades CA policies (mirrors sysadmin@ exclusion posture; Option 1 break-glass)
- Outlook + Helpany + LinkRx assigned to Cascades - Shared Phones group and added to MHS kiosk app list (final dashboard: 5 caregiver apps)
- Created cloud-only pilot user pilot.test@cascadestucson.com, SG-Caregivers-Pilot group, Business Premium license, vault entry pushed to Gitea vault repo
- Built 4 CA changes: PATCH legacy all-users-MFA to exclude pilot group, CREATE 3 new Report-only policies (block off-network, block non-compliant, 8h sign-in frequency) with both admins excluded
- Pilot phone wipe + re-enroll after first attempt stuck; PIN set, awaiting MHS to take over launcher and SDM sign-in prompt

6 new project/feedback memories. Resume point at top of new session log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:57:16 -07:00
parent 7128b9e57d
commit 18e5a467d2
8 changed files with 395 additions and 22 deletions
--- a/.claude/memory/feedback_complete_vault_operations_end_to_end.md
+++ b/.claude/memory/feedback_complete_vault_operations_end_to_end.md
@@ -0,0 +1,20 @@
+---
+name: Complete vault operations end-to-end (don't hand off the commit/push)
+description: When writing a new entry to D:/vault, do the full sequence (write plaintext → sops -e -i → git add/commit/push) yourself. Don't stop at "encrypted on disk, you push it."
+type: feedback
+---
+
+When the user asks to vault a credential or any new vault entry, complete the entire operation in one flow:
+
+1. Write the plaintext yaml to `D:/vault/<path>.sops.yaml`
+2. `sops -e -i <path>` to encrypt in place
+3. Verify round-trip (`vault.sh get` shows correct decrypted output)
+4. `git add` + `git commit` + `git push` from `D:/vault` via the Bash tool
+
+**Why:** Howard explicitly flagged on 2026-04-29 that he doesn't understand why I'd hand off the trivial last-mile step. He has bash via Git for Windows but invokes from PowerShell, so a "run this bash one-liner" handoff costs him a context switch — and there's no privilege/risk reason to stop at "encrypted on disk." Pushing a clean SOPS-encrypted vault entry is routine, not destructive.
+
+**How to apply:**
+- Just push. Trust the encrypted blob, the round-trip verify, and the standard git workflow.
+- If `git push` fails (auth, conflict, etc.), surface the error and ask — that's a real handoff. But "I created the file, you push it" is unnecessary friction.
+- The LF→CRLF warning on Windows is benign for SOPS yaml — line endings on the yaml file don't affect SOPS integrity (the MAC covers values inside `ENC[...]` blobs and structural data). Don't surface it as a problem.
+- Same principle applies to commits in the claudetools repo when I'm done with a discrete unit of work — don't park "you should /scc this" as a task; just do it (unless the user has explicitly said wait).
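Reviewer sketch (not part of the commit): the four-step flow in this memory could be expressed as an ordered command plan that stops at the first failure, which is the one point where a handoff is warranted. This is a minimal illustration, assuming the plaintext yaml has already been written; the `vault.sh get <path>` argument shape, the function names, and the `run` hook are assumptions, not taken from the repo.

```python
from pathlib import PurePosixPath

VAULT_ROOT = PurePosixPath("D:/vault")  # vault checkout named in the memory


def vault_commands(rel_path: str, message: str) -> list[list[str]]:
    """Commands to run, in order, after the plaintext yaml has been written."""
    target = str(VAULT_ROOT / rel_path)
    root = str(VAULT_ROOT)
    return [
        ["sops", "-e", "-i", target],                # encrypt in place
        ["vault.sh", "get", rel_path],               # ASSUMPTION: vault.sh argument shape
        ["git", "-C", root, "add", rel_path],
        ["git", "-C", root, "commit", "-m", message],
        ["git", "-C", root, "push"],
    ]


def run_all(commands, run):
    """Run each command through `run` (returns an exit code).

    Returns None when everything succeeded, otherwise the first failing
    command so it can be surfaced to the user instead of being retried.
    """
    for argv in commands:
        if run(argv) != 0:
            return argv
    return None
```

In real use `run` might be something like `lambda argv: subprocess.run(argv).returncode`; keeping it injectable lets the plan be checked without touching the vault.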
--- a/.claude/memory/feedback_graph_ca_policy_eventual_consistency.md
+++ b/.claude/memory/feedback_graph_ca_policy_eventual_consistency.md
@@ -0,0 +1,17 @@
+---
+name: Microsoft Graph CA policy reads are eventually consistent (~5s)
+description: After PATCHing a CA policy (204 No Content), an immediate GET may return stale state. Wait ~5 seconds before verifying.
+type: feedback
+---
+
+When PATCHing `/identity/conditionalAccess/policies/{id}` and immediately re-reading via GET, the read may return pre-PATCH state for a few seconds even though the PATCH was accepted (204).
+
+Observed 2026-04-29 during the Cascades admin@ exclusion backfill: 7 of 8 PATCHes returned 204, but immediate verify GETs showed the old `excludeUsers` list. Re-query after `sleep 5` showed all 8 had landed correctly. No retries were needed — the PATCH had succeeded; only the read lagged.
+
+**Why:** The likely cause (inferred from the observed behavior, not confirmed against Microsoft documentation) is that Graph serves CA policy reads from a cache or replica that lags the authoritative store. Writes are accepted and return 204 right away; reads converge after a short propagation window.
+
+**How to apply:**
+- After a CA policy PATCH that returns 204, do not treat an immediate "verify mismatch" as failure.
+- Insert a 3-5 second pause (e.g. `sleep 5`), or a poll loop with a few seconds of backoff, before the verify GET.
+- If verifying many policies in a batch, the simplest pattern is: do all PATCHes, sleep 5, then re-query everything once at the end.
+- This applies to CA policies specifically. Other Graph endpoints (e.g., users, groups) have their own consistency characteristics — don't generalize.
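Reviewer sketch (not part of the commit): the "sleep, then verify" guidance can be written as a small poll loop that always waits before reading. This is a minimal illustration under assumptions: `fetch` and `is_expected` are hypothetical caller-supplied hooks (for example a Graph GET and a check on `excludeUsers`), and the delay values are illustrative rather than measured.

```python
import time


def verify_after_patch(fetch, is_expected, delays=(5, 5, 10), sleep=time.sleep):
    """Poll fetch() until is_expected(result) is true.

    Sleeps before every read, so the first GET never races the PATCH.
    Returns (ok, attempts); ok is False if all delays were used up.
    """
    attempts = 0
    for delay in delays:
        sleep(delay)
        attempts += 1
        if is_expected(fetch()):
            return True, attempts
    return False, attempts
```

Only a `False` result after the final delay should be reported as a verify failure; a stale first read is expected.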
--- a/.claude/memory/feedback_graph_password_reset_requires_role.md
+++ b/.claude/memory/feedback_graph_password_reset_requires_role.md
@@ -0,0 +1,17 @@
+---
+name: Tenant Admin SP cannot PATCH-reset existing user passwords (app permission alone is not enough)
+description: With User.ReadWrite.All app perm + no privileged directory role, Tenant Admin can CREATE a user with a password but PATCH passwordProfile on an existing user returns 403 Authorization_RequestDenied.
+type: feedback
+---
+
+The ComputerGuru Tenant Admin SP (`709e6eed-0711-4875-9c44-2d3518c47063`) can create users with `passwordProfile.password` set, but cannot **reset** the password on an existing user via PATCH `/users/{id}` — returns 403 `Authorization_RequestDenied: Insufficient privileges`.
+
+Observed 2026-04-29 in Cascades when trying to reset `pilot.test@cascadestucson.com` after the password was lost in script flow control.
+
+**Why:** Microsoft Graph's password reset endpoint requires the caller to hold a privileged directory role (Authentication Administrator, User Administrator, or stronger), in addition to `User.ReadWrite.All` app permission. App permission alone is insufficient. Tenant Admin SP currently has Application Administrator + Cloud Application Administrator + Conditional Access Administrator — none of which grant password-reset rights. The CREATE flow is permitted under `User.ReadWrite.All` because the password is part of the create payload, not a reset.
+
+**How to apply:**
+- For new pilot/test users: print the password BEFORE doing any subsequent API call, so a flow-control failure later doesn't lose it.
+- If a password rotation is needed for an existing pilot/test user: delete + recreate (cleanest), OR have a human use admin@/sysadmin@ via the portal, OR use the ComputerGuru User Manager app (separate tier with dedicated `User-PasswordProfile.ReadWrite.All` scope, designed for this).
+- **Don't** add Authentication Administrator or User Administrator to Tenant Admin SP just to fix this — that broadens its blast radius unnecessarily. The User Manager app is the right tool for password operations; Tenant Admin should stay focused on directory + CA work.
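+- Delete a freshly created user via DELETE `/users/{id}`. Graph documents this call as a soft delete, with permanent removal via DELETE `/directory/deletedItems/{id}`; that second call may return 404 if the user is not in the deleted-items container (depends on tenant settings + age), so don't treat a 404 there as a failed delete.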
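Reviewer sketch (not part of the commit): the "print the password before any subsequent API call" rule can be enforced by ordering inside the create helper itself. This is a minimal illustration; `post` and `emit` are hypothetical caller-supplied hooks, the password policy and `forceChangePasswordNextSignIn: False` are assumptions, and the payload uses the standard Graph user-create fields.

```python
import secrets
import string


def generate_password(length: int = 20) -> str:
    """Random password with at least one lowercase, uppercase, and digit."""
    alphabet = string.ascii_letters + string.digits
    while True:
        pw = "".join(secrets.choice(alphabet) for _ in range(length))
        if (any(c.islower() for c in pw)
                and any(c.isupper() for c in pw)
                and any(c.isdigit() for c in pw)):
            return pw


def create_pilot_user(upn: str, display_name: str, post, emit=print) -> dict:
    """Create a user, emitting the password before any network call is made."""
    password = generate_password()
    # Recorded first: a later failure in flow control cannot lose it.
    emit(f"{upn} initial password: {password}")
    payload = {
        "accountEnabled": True,
        "displayName": display_name,
        "mailNickname": upn.split("@")[0],
        "userPrincipalName": upn,
        "passwordProfile": {
            "forceChangePasswordNextSignIn": False,  # ASSUMPTION for a shared-phone pilot
            "password": password,
        },
    }
    return post("/users", payload)  # `post` is a hypothetical Graph client hook
```

In practice `emit` would more likely write to the vault flow than to a terminal; the point is only that it runs before `post`.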
--- a/.claude/memory/project_cascades_admin_accounts.md
+++ b/.claude/memory/project_cascades_admin_accounts.md
@@ -0,0 +1,16 @@
+---
+name: Cascades admin account ownership
+description: Howard uses sysadmin@cascadestucson.com, Mike uses admin@cascadestucson.com — used for daily admin work, not break-glass.
+type: project
+---
+
+At Cascades Tucson tenant (`207fa277-e9d8-4eb7-ada1-1064d2221498`):
+
+- **`sysadmin@cascadestucson.com`**: Howard's working admin account (he used it on 2026-04-28 to make the CA Admin role assignment by hand in the PIM portal).
+- **`admin@cascadestucson.com`**: Mike's working admin account.
+
+As of 2026-04-29, neither is confirmed as cloud-only / FIDO2 / CA-excluded — Howard "doesn't think they are cloud-only." A break-glass admin still needs to be designed before the CA bypass policies go live.
+
+**Why:** Avoid asking who owns which admin login again, and keep clear that these are *daily-driver* admin accounts, not the eventual break-glass.
+
+**How to apply:** When discussing Cascades admin work or break-glass design, attribute correctly. Don't assume sysadmin@ or admin@ already meet break-glass criteria — verify against Graph (onPremisesSyncEnabled, authentication methods, CA exclusions) before relying on either.
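Reviewer sketch (not part of the commit): the "verify against Graph" step could be reduced to a pure check over data already fetched. This is a minimal illustration, assuming the caller has retrieved the user (with `onPremisesSyncEnabled` explicitly selected, since it is not believed to be in the default property set), the user's authentication methods, and the CA policy list; the function name and gap wording are hypothetical.

```python
FIDO2_TYPE = "#microsoft.graph.fido2AuthenticationMethod"


def break_glass_gaps(user: dict, methods: list[dict], policies: list[dict]) -> list[str]:
    """List the reasons an account does not yet meet the break-glass criteria."""
    gaps = []
    if "onPremisesSyncEnabled" not in user:
        gaps.append("sync status unknown (onPremisesSyncEnabled not fetched)")
    elif user["onPremisesSyncEnabled"]:
        gaps.append("synced from on-premises (not cloud-only)")
    if not any(m.get("@odata.type") == FIDO2_TYPE for m in methods):
        gaps.append("no FIDO2 method registered")
    for policy in policies:
        users = policy.get("conditions", {}).get("users", {})
        if user["id"] not in users.get("excludeUsers", []):
            name = policy.get("displayName", policy.get("id"))
            gaps.append(f"not excluded from CA policy: {name}")
    return gaps
```

An empty list means the three criteria named above are met for the data supplied; it says nothing about criteria the memory does not list.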
--- a/.claude/memory/project_cascades_ca_phased_rollout.md
+++ b/.claude/memory/project_cascades_ca_phased_rollout.md
@@ -0,0 +1,26 @@
+---
+name: Cascades CA bypass — phased per-group rollout, NOT tenant-wide
+description: Caregiver bypass CA policies are scoped to SG-Caregivers-Pilot only at start, then expanded one department at a time. Legacy all-users-MFA stays in place; we PATCH excludeGroups, never delete it during rollout.
+type: project
+---
+
+The Cascades caregiver bypass CA work is a **phased rollout**, not a tenant-wide policy swap. This corrects the original §5 design in `clients/cascades-tucson/docs/cloud/user-account-rollout-plan.md` and the resume-point in `2026-04-29-howard-cascades-bypass-pilot-phase-b-buildout.md`, which both implied a tenant-wide cutover.
+
+**What this means concretely:**
+
+- New CA policies target `SG-Caregivers-Pilot` only (then `SG-Caregivers` after Entra Connect exits staging). They do NOT use `includeUsers: All`.
+- The legacy `Require multifactor authentication for all users` policy **stays in place**. We PATCH its `excludeGroups` to add the pilot group, so existing office-staff behavior is unchanged.
+- Expansion to additional populations (front desk, clinical, admin staff) happens one group at a time post-pilot. Each population gets its own scoped policy set, applied by adding its group to `excludeGroups` on the legacy policy and to `includeGroups` on the relevant new policies.
+- The legacy all-users-MFA policy is ONLY deleted at the very end, when every population is governed by a phased policy.
+
+**Why:** Howard pulled the brakes on 2026-04-29 after spotting that policies #1, #2, #3 in the original design hit all users — would have blocked any office user signing in off-site who wasn't in `SG-External-Signin-Allowed`. The btw replay he pasted contained the correct rescoping: "Re-scope the new policies so they only target the pilot group initially, and roll out to other groups one at a time later." Phased preserves today's behavior for everyone except the pilot group while we validate the bypass mechanics.
+
+**How to apply:** When building or modifying Cascades CA policies, default to group-scoped (`includeGroups`), never `includeUsers: All`. When expanding to a new department, the steps are: (1) create the department's group, (2) PATCH legacy all-users-MFA to add it to `excludeGroups`, (3) add it to `includeGroups` on the relevant new policies. Treat any "let's just push it tenant-wide now that the pilot worked" suggestion as a regression of this decision and flag it.
+
+**Caregiver set (the only set in scope today):**
+- PATCH `Require multifactor authentication for all users`: add `SG-Caregivers-Pilot` to excludeGroups.
+- CREATE `CSC - Block caregivers off Cascades network` (includeGroups: pilot, locations: not Cascades, grant: BLOCK).
+- CREATE `CSC - Block caregivers on non-compliant device` (includeGroups: pilot, device filter isCompliant -eq False, grant: BLOCK).
+- CREATE `CSC - Caregiver sign-in frequency 8h` (includeGroups: pilot, session control: 8h re-auth).
+
+Note: for caregivers we use **Block** directly on non-compliant + off-network, not "Require MFA" — caregivers can't satisfy MFA (no personal device), so block is the cleaner UX. For non-caregiver populations later, MFA grants will likely be appropriate since office staff have MFA capability.
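Reviewer sketch (not part of the commit): the caregiver set above could be built as Graph `conditionalAccessPolicy` payloads roughly as follows. This is a minimal illustration under assumptions: the group, named-location, and admin IDs are placeholders supplied by the caller, `includeApplications: ["All"]` is assumed because the memory does not name target apps, and the PATCH body carries the full `conditions` object because partial-update behavior is not confirmed here.

```python
REPORT_ONLY = "enabledForReportingButNotEnforced"
BLOCK = {"operator": "OR", "builtInControls": ["block"]}


def add_excluded_group(policy: dict, group_id: str) -> dict:
    """PATCH body adding group_id to excludeGroups, leaving the input untouched."""
    users = policy["conditions"]["users"]
    current = users.get("excludeGroups", [])
    merged = current if group_id in current else [*current, group_id]
    return {"conditions": {**policy["conditions"],
                           "users": {**users, "excludeGroups": merged}}}


def caregiver_policies(group_id: str, cascades_location_id: str, admin_ids) -> list[dict]:
    """The three group-scoped, report-only policies for the caregiver set."""
    def base(name: str) -> dict:
        return {
            "displayName": name,
            "state": REPORT_ONLY,
            "conditions": {
                "clientAppTypes": ["all"],
                "applications": {"includeApplications": ["All"]},  # ASSUMPTION
                # Group-scoped only: never includeUsers "All".
                "users": {"includeGroups": [group_id],
                          "excludeUsers": list(admin_ids)},
            },
        }

    off_network = base("CSC - Block caregivers off Cascades network")
    off_network["conditions"]["locations"] = {
        "includeLocations": ["All"],
        "excludeLocations": [cascades_location_id],
    }
    off_network["grantControls"] = dict(BLOCK)

    non_compliant = base("CSC - Block caregivers on non-compliant device")
    non_compliant["conditions"]["devices"] = {
        "deviceFilter": {"mode": "include", "rule": "device.isCompliant -eq False"}
    }
    non_compliant["grantControls"] = dict(BLOCK)

    frequency = base("CSC - Caregiver sign-in frequency 8h")
    frequency["sessionControls"] = {
        "signInFrequency": {"value": 8, "type": "hours", "isEnabled": True}
    }
    return [off_network, non_compliant, frequency]
```

Expanding to a new department would reuse the same two helpers with that department's group ID, which keeps every new policy group-scoped by construction.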
--- a/.claude/memory/project_cascades_pilot_cleanup.md
+++ b/.claude/memory/project_cascades_pilot_cleanup.md
@@ -0,0 +1,15 @@
+---
+name: Cascades caregiver pilot — cleanup obligations
+description: Pilot accounts (pilot.test@, howard.enos@ once synced) at Cascades must be removed at end of caregiver bypass pilot.
+type: project
+---
+
+The Cascades caregiver shared-phone bypass pilot (Path B, cloud-only) is using a temporary pilot identity. Howard explicitly flagged on 2026-04-29 that **all pilot artifacts must be cleaned up** when the pilot wraps:
+
+- **`pilot.test@cascadestucson.com`** — cloud-only test user created for the pilot. Delete (or disable + remove license) post-pilot.
+- **`howard.enos@cascadestucson.com`** — Howard's eventual synced identity (won't exist as a cloud user until Entra Connect exits staging). If used during pilot validation, also clean up after.
+- `SG-Caregivers-Pilot` cloud Entra group — superseded by synced `SG-Caregivers` group post-staging-exit. Remove pilot group from CA policy targets at that point; group itself can be deleted after.
+
+**Why:** Pilot accounts must not linger after the pilot: clean tenant hygiene plus license recovery (Business Premium seat returned to the 34-spare pool).
+
+**How to apply:** When the pilot validates and we transition to production rollout (synced `SG-Caregivers`), the cleanup of pilot.test, howard.enos pilot usage, and SG-Caregivers-Pilot is part of the cutover, not a separate task to forget. Surface this checklist when we get to the "flip pilot CA policies to production" step.