sync: auto-sync from HOWARD-HOME at 2026-07-02 18:33:52

Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-07-02 18:33:52
2026-07-02 18:35:56 -07:00
parent 7300f57a47
commit d745c7d40f
2 changed files with 90 additions and 0 deletions
--- a/clients/cascades-tucson/session-logs/2026-07/2026-07-02-howard-caretaker-phone-login-pso-roster-cleanup.md
+++ b/clients/cascades-tucson/session-logs/2026-07/2026-07-02-howard-caretaker-phone-login-pso-roster-cleanup.md
@@ -0,0 +1,84 @@
 ## User
 - **User:** Howard Enos (howard)
 - **Machine:** Howard-Home
 - **Role:** tech
 ## Session Summary
 Started from a single helpdesk report: caretaker Agnes McFerren (`a.mcferren@cascadestucson.com`) could not log into the shared phone. Diagnosis traced it to a forced-change password state, not a Conditional Access, MFA, license, or location problem. Her Entra account was healthy (enabled, SPB-licensed, in `SG-Caregivers`, US, hybrid-synced) but her live sign-in log showed three failures via Microsoft Authentication Broker with `err=50126` (invalid credentials), and on-prem AD showed `pwdLastSet=0` / `PasswordExpired=True` (must-change-at-next-logon). A shared Android phone signs in through the Auth Broker, which cannot render the "set a new password" screen, so any must-change account fails on the phone and surfaces as 50126.
 Established this affected the entire cohort, not just Agnes: all 35 `SG-Caregivers` members were stuck on the 6/30 bulk-onboarding forced-change flag. Cleared it fleet-wide with `Set-ADUser -ChangePasswordAtLogon $false` on CS-SERVER (35/35 cleared, passwords unchanged). Verified the fix end-to-end: a ROPC credential test from off-network returned `53003` (Conditional Access block) for Agnes and all 35 — proving the password is now accepted by Entra and the off-network location block works as designed. Ran a comprehensive verification pass: domain password policy, per-user expiry state, temp-password-to-AD match for all 35 (all `53003`, zero mismatches), `SG-Caregivers` on-prem (35) vs cloud (35, all synced+enabled), Entra Connect sync freshness, and the caregiver GPO/CA posture.
 Found and closed a password-expiry time bomb: none of the 35 were set to never-expire, and domain `MaxPasswordAge=42d` would have re-expired these passwords ~2026-08-13, re-breaking phone login. Created a Fine-Grained Password Policy `PSO-Caregivers` (precedence 10, `MaxPasswordAge=00:00:00`, all else mirrors the Default Domain Policy) applied to `SG-Caregivers`. Verified `Get-ADUserResultantPasswordPolicy` returns `PSO-Caregivers` for the cohort. Rotation model is on-event, not scheduled (NIST 800-63B aligned; scheduled rotation is impossible on shared phones).
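The ~2026-08-13 estimate can be reproduced with plain date arithmetic. This is a minimal sketch, assuming the 42-day password-age clock restarted on 2026-07-02 when the must-change flag was cleared; that start date is an assumption, since the log records only the resulting estimate.

```python
from datetime import date, timedelta

# Assumption: the password-age clock restarted on the day of the sweep.
clock_start = date(2026, 7, 2)

# Domain MaxPasswordAge=42d, as recorded in this log.
max_password_age = timedelta(days=42)

expiry = clock_start + max_password_age
print(expiry.isoformat())  # 2026-08-13
```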
 Investigated the phone security posture for the "approved phones + onsite only" goal. The onsite-only location block (`e35614e1`) and 8h sign-in frequency (`7d491c7a`) are enforced for `SG-Caregivers`; the device allow-list (`1b7fd025`, blocks any non-`CSC-*` device) is correctly built but still scoped to the test group. Inventoried 24 `CSC-*` Entra devices / 25 Intune Android devices — all present, so flipping the allow-list to `SG-Caregivers` would be safe. User elected to HOLD the device-lock flip for now (caretakers remain onsite-only, any-device).
 Processed a client roster update. Of 8 names to remove, 7 were already offboarded in the 7/1 reconcile (disabled, out of `SG-Caregivers`, never logged in); per user direction they were hard-deleted from AD (verified gone on-prem and, on next Connect sync, HTTP 404 in the cloud). The 8th, Juan Andrade (`j.andrade`), gave 2-week notice and is still working — held active, offboarding scheduled for his last day 2026-07-11 (coord todo + doc/wiki tracking). Documented everything in the Cascades wiki and caretaker tracking list; committed and pushed.
 ## Key Decisions
 - Fixed the login by clearing the must-change flag (`-ChangePasswordAtLogon $false`) rather than resetting passwords — keeps the vaulted temp passwords valid and requires no re-delivery to caretakers.
 - Swept all 35 `SG-Caregivers` at once (not just Agnes) because the root cause was a cohort-wide onboarding setting.
 - Used a ROPC credential test as the verification method: from off-network a `53003` result proves the password is valid (credential checked before CA), giving per-user confirmation without touching the phones.
 - Chose a Fine-Grained Password Policy over per-user `PasswordNeverExpires` — group-scoped, auto-covers future hires, one auditable object; mirrored the domain policy exactly except expiry.
 - Set caregiver passwords to never expire with on-event rotation, justified by compensating controls (onsite-only CA, device allow-list, 8h reauth, no MFA/personal device, auto-logoff) and NIST 800-63B guidance against forced periodic rotation.
 - Held the device allow-list flip per user; recommended a 24h report-only pass before enforcing when the time comes.
 - Held Juan Andrade active (did not disable on notice) to avoid cutting off a working employee; scheduled for last day.
 - Hard-deleted the 7 leavers per explicit user choice (my default recommendation was keep-disabled for HIPAA retention); delete script guarded to refuse any `Enabled=True` account.
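The ROPC verification decision comes down to reading the AADSTS code out of the token endpoint's error text. The sketch below shows one way to do that, using only the three codes listed in this log. The function name and the sample error strings are hypothetical, and the "password accepted" reading of `53003` follows this session's reasoning rather than any official statement.

```python
import re

# Meanings limited to the error codes recorded in this session log.
AADSTS_MEANING = {
    "50126": "invalid credentials (password rejected)",
    "53003": "blocked by Conditional Access (password accepted)",
    "50055": "password expired",
}


def classify_ropc_error(error_description: str) -> str:
    """Pull the AADSTS code out of an error description and describe it."""
    match = re.search(r"AADSTS(\d+)", error_description)
    if match is None:
        return "no AADSTS code found"
    code = match.group(1)
    return AADSTS_MEANING.get(code, f"unrecognized code {code}")


# Hypothetical error text, for illustration only.
print(classify_ropc_error("AADSTS53003: Access has been blocked."))
```

Running this per user over the cohort would yield the same pass/fail view the sweep relied on: every `53003` counts as a valid password, and any `50126` as a mismatch.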
 ## Problems Encountered
 - `get-token.sh` failed with "vault_path not set" — the script reads `~/.claude/identity.json` (absent on this machine); the real identity.json is repo-local with `vault_path: D:/vault`. Worked around with `VAULT_ROOT_ENV=/d/vault` inline. (Friction: env resolution mismatch.)
 - Graph `auditLogs/signIns` `$filter` (by userId or UPN) hangs and times out (HTTP 000) on this tenant; large `$top` pages also hang. Resolved by pulling a small unfiltered page (`$top=50`) and filtering client-side.
 - `iconv` is absent in this Git Bash, so `ps-encoded.sh` could not build the UTF-16LE `-EncodedCommand`. Generated the base64 with Python (`base64.b64encode(text.encode('utf-16-le'))`) instead.
 - `/tmp` writes are blocked by the `block-tmp-path` hook (Windows path-mismatch rule); used repo-relative temp files.
 - First `git push` was rejected (fleet auto-sync advanced the remote); resolved with `git pull --rebase` then push.
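The sign-in log workaround (small unfiltered page, filter client-side) can be sketched as a pure function over an already-fetched page. This is an illustration, not the script used in the session: the function name and sample records are hypothetical, and the field names `userPrincipalName` and `status.errorCode` are assumed to match the Graph sign-in record shape.

```python
def failed_sign_ins(page: list[dict], upn: str) -> list[dict]:
    """Return failed sign-ins for one user from an unfiltered page of results."""
    wanted = upn.lower()
    return [
        entry
        for entry in page
        # Compare UPNs case-insensitively; errorCode 0 means success.
        if entry.get("userPrincipalName", "").lower() == wanted
        and entry.get("status", {}).get("errorCode", 0) != 0
    ]


# Hypothetical sample shaped like the `value` array of a sign-in log response.
sample_page = [
    {"userPrincipalName": "a.mcferren@cascadestucson.com", "status": {"errorCode": 50126}},
    {"userPrincipalName": "someone.else@cascadestucson.com", "status": {"errorCode": 50126}},
    {"userPrincipalName": "A.McFerren@cascadestucson.com", "status": {"errorCode": 0}},
]

matches = failed_sign_ins(sample_page, "a.mcferren@cascadestucson.com")
print(len(matches))  # 1
```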
 ## Configuration Changes
 - **AD (CS-SERVER, `cascades.local`):**
  - Cleared `ChangePasswordAtLogon` on all 35 `SG-Caregivers` members (RMM cmd `45660cd8`).
  - Created FGPP `PSO-Caregivers` applied to `SG-Caregivers` (RMM cmd `a1c2e30b`).
  - Hard-deleted 7 disabled leaver accounts (RMM cmd `a5f337fd`): `b.mendoza`, `c.tate`, `g.williford`, `k.flores`, `d.fierros`, `m.baker`, `m.kariuki`.
 - **Files modified:**
  - `wiki/clients/cascades-tucson.md` — added forced-change gotcha section, PSO-Caregivers section, 7/2 roster-status block; corrected stale "forced-change temp passwords" phrasing.
  - `clients/cascades-tucson/docs/cloud/caretaker-phones-only-list.md` — annotated Juan Andrade row (leaving 2026-07-11).
 - **Files created:** this session log.
 - **No CA policy changes made** (device-lock flip held per user).
 ## Credentials & Secrets
 - No new credentials created. Agnes's temp password (`Meadow8541@`) and all caregiver temp passwords remain vaulted at `clients/cascades-tucson/caregiver-temp-passwords-2026-06-30.sops.yaml` and `-2026-07-01.sops.yaml` (retrieve with full `vault get`, not `get-field` — dotted keys). Verified all 35 vaulted temp passwords match live AD.
 ## Infrastructure & Servers
 - **M365 tenant (Cascades):** `207fa277-e9d8-4eb7-ada1-1064d2221498` (`cascadestucson.com`).
 - **CS-SERVER** (DC/DNS/DHCP/File/Hyper-V/Print): GuruRMM agent `c39f1de7-d5b6-45ae-b132-e06977ab1713`. Runs AD PowerShell as SYSTEM (sufficient for AD writes incl. PSO creation and user deletion).
 - **`SG-Caregivers` cloud group:** `8b8d9222-5d71-419a-936d-56d895c6c332` (35 members, all synced + enabled).
 - **Named location "Cascades" (onsite):** `061c6b06-b980-40de-bff9-6a50a4071f6f`, trusted, IPs `72.211.21.217/32`, `184.191.143.62/32`.
 - **Caregiver CA policies:** `e35614e1` off-network block (enabled), `ede985e2` compliance-block (disabled, superseded), `7d491c7a` 8h sign-in frequency (enabled), `1b7fd025` allow-listed-devices-only (enabled, scoped to test group `db5849ec`), `7e87a1c7` Require-MFA-all (SG-Caregivers excluded). Break-glass group `SG-CA-BreakGlass` = `131e51ac`; device test group `SG-Caregivers-DeviceTest` = `db5849ec`.
 - **Approved phones:** 24 `CSC-*` Entra devices (AndroidEnterprise, Workplace-joined), 25 Intune Android managed devices (company-owned, mostly Intune-noncompliant — irrelevant to the name-based allow-list). Entra Connect: syncEnabled, last dir sync `2026-07-02T20:19:46Z`.
 ## Commands & Outputs
 - Diagnose login (sign-in log): `err=50126` (invalid username/password) x3 via Microsoft Authentication Broker.
 - Verify fix (ROPC): `POST login.microsoftonline.com/<tenant>/oauth2/v2.0/token grant_type=password client_id=1950a258-227b-4e31-a9cf-717495945fc8` → `AADSTS53003` (CA block) = password valid + onsite-only working.
 - Must-change sweep: `SUMMARY total=35 cleared=35 already_ok=0 disabled=0`.
 - PSO verify: `Get-ADUserResultantPasswordPolicy` → `PSO-Caregivers  MaxPasswordAge=00:00:00` for sampled caretakers.
 - Delete verify: all 7 `gone`; cloud `GET /users/<upn>` → HTTP 404 for all 7.
 - `-EncodedCommand` build (iconv-free): `py -c "import base64;print(base64.b64encode(open(f,encoding='utf-8').read().encode('utf-16-le')).decode())"`.
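The iconv-free `-EncodedCommand` build can also be written as a small reusable function. This is a sketch under the same approach as the one-liner (UTF-16LE bytes, then base64); the function name and the sample script are hypothetical.

```python
import base64


def encode_powershell(script: str) -> str:
    """Return base64 of the script's UTF-16LE bytes, as -EncodedCommand expects."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


# Hypothetical sample script, for illustration only.
encoded = encode_powershell("Get-Date")

# Round-trip check: decoding must give back the original script text.
decoded = base64.b64decode(encoded).decode("utf-16-le")
print(decoded == "Get-Date")  # True
```

The round-trip decode is a cheap way to confirm the encoding before dispatching the command through the RMM.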
 ## Pending / Incomplete Tasks
 - **Juan Andrade (`j.andrade`) offboarding — 2026-07-11** (his last day): disable AD account on CS-SERVER + remove from `SG-Caregivers` + reclaim Business Premium. Coord todo `14881da2-8acd-4825-a964-bf9cdd2df876`. Held active until then.
 - **Approved-phones-only device lock — HELD** (user decision). To enforce later: clone `1b7fd025` to report-only targeting `SG-Caregivers`, watch sign-in logs ~24h, then add `SG-Caregivers` to the enabled policy's include groups. Break-glass already excluded.
 - **New-phone onboarding note:** a new shared phone must be enrolled/registered as `CSC-*` before use under the device lock (chicken-and-egg with the allow-list).
 - Prior open items unchanged: ALIS Email=UPN matching (Howard), NURSESTATION reboot to verify lockdown GPO, ALIS app timeout 20->15.
 ## Reference Information
 - Commit (docs/wiki this session): `b01105d` on `main` (rebased past fleet auto-sync).
 - RMM base: `http://172.16.3.30:3001`. RMM cmd IDs: sweep `45660cd8`, AD/SG/GPO verify `198490b7`, PSO create `a1c2e30b`, offboard-list verify `f36f1e36`, delete-leavers `a5f337fd`.
 - ROPC test client id (Azure PowerShell public client): `1950a258-227b-4e31-a9cf-717495945fc8`.
 - Error codes: `AADSTS50126` invalid credentials, `AADSTS53003` blocked by CA, `AADSTS50055` password expired.
 - Coord todo: `14881da2-8acd-4825-a964-bf9cdd2df876`.
--- a/errorlog.md
+++ b/errorlog.md
@@ -19,6 +19,12 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
 <!-- Append entries below this line -->
 2026-07-03 | Howard-Home | remediation-tool/graph | [friction] auditLogs/signIns $filter (userId/UPN) hangs HTTP000 on Cascades tenant; large $top also hangs. Fix: small unfiltered $top=50, filter client-side
 2026-07-03 | Howard-Home | rmm/ps-encoded | [friction] iconv missing in Git Bash so ps-encoded.sh cannot build UTF-16LE -EncodedCommand. Fix: py -c base64.b64encode(text.encode('utf-16-le'))
 2026-07-03 | Howard-Home | remediation-tool/get-token | [friction] get-token.sh reads ~/.claude/identity.json (absent here); repo identity.json has vault_path=D:/vault. Fix: prefix VAULT_ROOT_ENV=/d/vault [ctx: ref=vault-skill-gotchas]
 2026-07-03 | GURU-5070 | rmm/dispatch | [friction] embedded backslash PS in jq literal program twice this session; jq treats S etc as invalid escapes - ALWAYS use ps-encoded.sh or jq --arg for any command containing backslashes [ctx: ref=feedback_windows_quote_stripping + rmm skill quote-safe dispatch rule]
 2026-07-03 | GURU-5070 | rmm/pst-context | [correction] assumed PST-SERVER2 might still be a live DC needing demote at cutover; correct is it died permanently (~mid-June, per Mike, stated previously) and PST-DC-NW is its replacement