sync: auto-sync from HOWARD-HOME at 2026-07-02 14:48:25

Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-07-02 14:48:25
2026-07-02 14:49:01 -07:00
parent 5008ad79dc
commit 42da3cfcca
3 changed files with 29 additions and 1 deletion
--- a/clients/scileppi-law/session-logs/2026-07/2026-07-02-howard-scileppi-internet-outage-cox-wan-rca.md
+++ b/clients/scileppi-law/session-logs/2026-07/2026-07-02-howard-scileppi-internet-outage-cox-wan-rca.md
@@ -89,3 +89,16 @@ Finally, per the user's request, ran an independent re-gather on the **Fable 5 m
 - **Decision:** findings went on the existing #32493 (Howard's pick) rather than a new ticket.
 - **Friction:** first comment-POST attempt was blocked by the `block-tmp-path` PreToolUse hook (a stray `/tmp/.rr` write in the same command) — the hook blocked the whole command pre-execution, so no duplicate comment was created; re-ran cleanly. No errorlog entry (handled, no functional failure).
 - **speach skill:** no skill named "speach"/"speech" exists; interpreted as the writing-polish pass and applied `stop-slop` to the customer text.
+
+## Update: 14:48 PT — Cox re-check after their "150-ping clean" claim; Cox now running a 24h test
+
+Cox ran a ~150-ping test and reported it "all clean." Re-checked the network from both angles:
+
+- **Direct in-site test (SL-SERVER):** 250 pings each to the Cox CMTS (`10.68.48.1`, hop 2), a Cox backbone hop (`68.105.30.142`), and `8.8.8.8`: **0% loss on all three**, steady latency (~8 ms to CMTS, ~24 ms to internet). This matches Cox's clean result because the fault is intermittent: a short burst (this ~50 s run, or Cox's ~2.5-min test) can land entirely in a clean window and miss the loss. (First attempt ran 3x300 sequential and hit the agent's command timeout; re-ran as 3 parallel 250-ping jobs via `/dev/shm`, ~50 s wall.)
+- **UCG 24/7 telemetry (authoritative):** `/ea/isp-metrics/5m` last 24h = **78 of 257 windows (~30%) with WAN packet loss, 1-4%**, spread across the whole day, continuing to the latest samples (4% ~12:05 PM, 3% ~12:15 PM Phoenix). `internetIssues5min` also shows a fresh `not_reported` telemetry-gap blip ~2:05 PM Phoenix (bucket 5943421) — i.e. AFTER Cox's clean test. **No hard WAN outage in the last 24h** (this morning's ~20 s drop aged out of the window); chronic low-grade loss persists.
+- **Conclusion:** Cox's short test does not clear the line; it just didn't sample a loss window. A 2.5-min ping run cannot reliably detect a fault that shows up in only ~30% of 5-min windows at 1-4% loss.
+- **Recommendation given to Howard:** have Cox pull the modem's DOCSIS signal history + correctable/uncorrectable codeword counts + SNR over 24-48h (where chronic low-grade loss shows), or run 24h+ continuous monitoring — not a live burst.
+- **Status:** **Cox is now running a 24-hour ping test and will report back.** Awaiting their result. Cox ticket `HD0000032972890` still open.
+- **Cosmetic:** UniFi ISP label flip-flopped back to "Comcast Cable / AS7922" this pull (was Cox Business/AS22773 earlier) — still Cox; label is unreliable.
+
+**Outstanding (offered, not yet done):** compile the exact per-window loss timestamps (Phoenix) from the UCG telemetry into a list for Cox to correlate against their modem logs; optionally add this re-check to Syncro #32493 + the wiki. Neither done yet — awaiting Howard's go and the Cox 24h result.
--- a/errorlog.md
+++ b/errorlog.md
@@ -19,6 +19,10 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·

 <!-- Append entries below this line -->

+2026-07-02 | Howard-Home | ps-encoded | encode produced empty output [ctx: src=C:/Users/Howard/AppData/Local/Temp/claude/C--claudetools/48ddc9ec-86ca-4dc1-93b2-529186a2623f/scratchpad/clear-caregiver-mustchange.ps1]
+
+2026-07-02 | Howard-Home | ps-encoded | encode produced empty output [ctx: src=.claude/scratch-iso-verify.ps1]
+
 2026-07-02 | Howard-Home | unifi/site-manager-api | [friction] vault infrastructure/unifi-site-manager-api key returns 401 (stale/rotated); the WORKING cloud key is services/unifi-site-manager (X-API-KEY vs api.ui.com) [ctx: ref=uos-server wiki; use services/unifi-site-manager]

 2026-07-02 | GURU-BEAST-ROG | self-check/registry-trim | [friction] trimmed skill registry locally while GURU-5070 shipped the same trim upstream; auto-sync merge raced my uncommitted edits (transient UU state, stale 15777 reading mid-merge); fix: check coord / claim a lock before fleet-wide harness edits [ctx: ref=coord-locks]
--- a/wiki/clients/cascades-tucson.md
+++ b/wiki/clients/cascades-tucson.md
@@ -156,6 +156,17 @@ The caregiver phone-SSO onboarding was executed 2026-06-30. To silently SSO into
  - **Christine Nyanzunda reaffirmed OUT of `SG-Caregivers`** despite the client's list naming her as a caretaker -- Howard held the frontline-only rule (established 2026-06-30). She keeps her existing `christine.nyanzunda` account with its broader (admin-adjacent) access.
  - **Caregiver phone/desktop login verified end-to-end and two CA blockers fixed** (Howard's go). Root cause: `pilot.test` had only ever worked because it sits in `SG-Caregivers-DeviceTest`/`-Test`, which are excluded from the compliance-block and targeted by the allow-list -- live caretakers hit both problems at once. **Fix 1:** `Require MFA for all users` (`7e87a1c7`) excluded only the stale `SG-Caregivers-Pilot` group -- added the live `SG-Caregivers` (`8b8d9222`) to `excludeGroups` (break-glass `excludeUsers` preserved). **Fix 2:** `CSC - Block caregivers on non-compliant device` (`ede985e2`) was blocking every caretaker device because the CSC-* phones report Intune-noncompliant (no Windows device is Intune-managed either) -- **disabled 2026-07-01, do not re-enable, superseded by the allow-list at final lockdown.** **Interim posture:** all caretakers may sign in on desktops AND phones, on-network only (`e35614e1` off-network block + `7d491c7a` 8h sign-in frequency remain enforced); the device allow-list (`1b7fd025`) stays scoped to the TEST group. This supersedes the 2026-06-24 "stay TEST-scoped, do not flip lockdown until all devices are domain-ready" decision. Phones-only lockdown is deferred to the end of the rollout -- tracking list `docs/cloud/caretaker-phones-only-list.md` (per the 4/22 staff CSV every caretaker is currently `D+P`; the phones-only cohort is TBD with the client).

+### Caregiver phone login -- forced-change-at-next-logon gotcha (RESOLVED 2026-07-02)
+**Symptom:** a caregiver cannot sign in on the shared Samsung phone; Entra sign-in log shows `Microsoft Authentication Broker` failing with **`err=50126` ("invalid username or password")** -- but the account is enabled, SPB-licensed, in `SG-Caregivers`, and on-network. **Root cause:** the 6/30 (+7/1) bulk onboarding created every caregiver temp password with **"User must change password at next logon"** (`pwdLastSet=0`, `PasswordExpired=True`). A shared Android phone signs in through the Auth Broker, which **cannot render the forced "set a new password" screen**, so the sign-in dies and surfaces as `50126` even when the correct temp password is typed. This is NOT a CA/MFA/license/location problem.
+- **Fix applied 2026-07-02 (Howard):** swept **all 35 `SG-Caregivers`** on CS-SERVER with `Set-ADUser <sam> -ChangePasswordAtLogon $false` (RMM cmd `45660cd8`). Every one was stuck (`cleared=35 already_ok=0`). This clears the must-change flag and un-expires the password **without changing it** -- the vaulted temp passwords (`clients/cascades-tucson/caregiver-temp-passwords-2026-06-30` + `-2026-07-01`) are now valid for phone sign-in; PHS syncs the state to Entra in ~2 min. Diagnosis first surfaced on Agnes McFerren (`a.mcferren`, temp `Meadow8541@`).
+- **RULE for future caregiver onboarding:** phone-only/shared-device caregivers must be created with **`-ChangePasswordAtLogon $false`** (accept the risk of a static temp password, or have them complete the one-time change on a domain desktop **before** touching the phone). Never leave forced-change set on a caregiver who will sign in on a shared phone -- it always fails as `50126`. Diagnose via the `remediation-tool` sign-in logs (`$filter` on the signIns endpoint hangs on this tenant -- pull a small unfiltered page `?$top=50` and grep client-side) + `pwdLastSet`/`PasswordExpired` on CS-SERVER.
+
+### Caregiver password expiry -- `PSO-Caregivers` (never-expire, created 2026-07-02)
+**Why it exists:** caregiver temp passwords are used on shared phones that cannot run the "must change password" flow, so ANY password expiry re-breaks phone login (see the forced-change gotcha above). To stop the 42-day domain `MaxPasswordAge` from re-expiring the whole cohort (~2026-08-13), a **Fine-Grained Password Policy `PSO-Caregivers`** was created on CS-SERVER.
+- **Config:** Precedence 10, **`MaxPasswordAge=00:00:00` (never expires)**; every other setting mirrors the Default Domain Policy (Complexity on, MinLength 7, History 24, Lockout 5 / 30 min). `AppliesTo = SG-Caregivers` -- rides the group, so it **auto-covers future hires** and a user removed from the group reverts to the 42-day domain default. **Verified:** `Get-ADUserResultantPasswordPolicy` returns `PSO-Caregivers` (`MaxPasswordAge=00:00:00`) for the cohort. Creating it did NOT change any current password.
+- **Rotation model:** on-EVENT only (offboarding, role change, suspected compromise -- e.g. the 7/1 reconcile), NOT on a schedule -- aligns with NIST 800-63B, and scheduled rotation is impossible on shared phones anyway. Compensating controls: onsite-only CA (`e35614e1`), device allow-list (`1b7fd025`, pending SG-Caregivers), 8h reauth (`7d491c7a`), no MFA/personal device, auto-logoff GPO.
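+- **Undo:** `Remove-ADFineGrainedPasswordPolicy PSO-Caregivers` -> caregivers revert to domain default (42-day expiry). It is `ProtectedFromAccidentalDeletion`, so clear that flag first (`Set-ADFineGrainedPasswordPolicy PSO-Caregivers -ProtectedFromAccidentalDeletion $false`) or the remove is refused.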
+
 ### Caregiver desktop/laptop management -- Hybrid Entra Join + GPO (the chosen path)
 Because per-user **Intune** never provisioned tenant-wide (`INTUNE_A = PendingInput`; no Windows device ever Intune-enrolled -- MS case open), Windows caregiver devices are managed via **Hybrid Entra Join + on-prem Group Policy** instead. This needs no Intune. The CA access model is unchanged (hybrid join just gives the device an Entra object so the allow-list/deviceId still applies).
 - **Hybrid join proven on NURSESTATION-PC** (2026-06-05): SCP written (`ConfigureSCP.ps1`), `OU=Caregiver Devices,OU=Staff PCs,OU=Workstations` added to Entra Connect sync scope -> device synced to Entra as `trustType: ServerAd`, `dsregcmd` shows AzureAdJoined+DomainJoined YES, pilot.test gets `AzureAdPrt: YES`. On hybrid-joined machines `Ngc PreReqResult: WillNotProvision` (PolicyEnabled NO) -> **Windows Hello does not auto-provision** (no Hello popup) -- exactly what shared caregiver devices need, so no separate Hello-disable step.
@@ -164,7 +175,7 @@ Because per-user **Intune** never provisioned tenant-wide (`INTUNE_A = PendingIn
 - **Device lockdown GPO `CSC - Caregiver Device Lockdown`** (`{E6174988-2721-4D96-ADF5-F5BB44E92769}`, computer-only, linked to `OU=Caregiver Devices`) -- **DEPLOYED 2026-06-05.** Auto-logoff is a HIPAA requirement (SS164.312(a)(2)(iii)) for shared PHI devices. Settings: screen **lock at 3 min**, **auto sign-out at 15 min** total idle, **90-second warning** before sign-out, **never sleep** (display off 10 min). Delivered via a computer **startup script** (`caregiver-lockdown.ps1`, in SYSVOL) that sets `InactivityTimeoutSecs=180`, powercfg, and registers a logon-triggered scheduled task running an idle monitor in each caregiver's session. Deploy script: `deploy-device-lockdown-gpo.ps1`. **Startup scripts run at boot -- NURSESTATION must reboot** to activate (not yet verified). **Companion:** ALIS app session timeout 20->15 min (Howard, ALIS admin) **PENDING.** Lock/logoff are **device-level** (affect any user on the device in `OU=Caregiver Devices`).

 ### Status (as of 2026-07-01)
- **Caregiver phone SSO -- Entra/identity side COMPLETE for the current 35-member roster** (group + Business Premium license + forced-change AD temp passwords). Remaining gate is the ALIS Email=UPN match (Howard) + creating ALIS records for the 3 brand-new hires (Munezero, Cota, Robinson) + setting Vallejo's ALIS Email=UPN + the outstanding items from 6/30 (7 discharged-record decisions, Kariuki ALIS dup 429856/429858 dedupe if she returns).
+- **Caregiver phone SSO -- Entra/identity side COMPLETE for the current 35-member roster** (group + Business Premium license + AD temp passwords, **must-change flag cleared fleet-wide 2026-07-02** -- see the forced-change gotcha above; temp passwords now valid for phone sign-in). Remaining gate is the ALIS Email=UPN match (Howard) + creating ALIS records for the 3 brand-new hires (Munezero, Cota, Robinson) + setting Vallejo's ALIS Email=UPN + the outstanding items from 6/30 (7 discharged-record decisions, Kariuki ALIS dup 429856/429858 dedupe if she returns).
 - **Caregiver CA lockdown is LIVE (interim posture, 2026-07-01):** caretakers sign in on desktops and phones, on-network only -- see the 7/1 update above and Conditional Access / Caregiver Policies. Phones-only lockdown deferred to end of rollout.
 - **Proven working end-to-end on a hybrid-joined desktop (NURSESTATION + pilot.test):** caregiver lockdown (CA off-network block + device allow-list) **and** silent ALIS SSO. The allow-list policy `1b7fd025` carries NURSESTATION's current deviceId `d3bf931f-f128-4261-8398-b46c34a4b342` and the device is tagged `extensionAttribute1=CSCCaregiverDevice`.
 - **GPOs DEPLOYED:** `CSC - Caregiver Workstation` built and validated on pilot.test. `CSC - Caregiver Device Lockdown` deployed to `OU=Caregiver Devices` 2026-06-05. **Go-live (still gated on all devices domain-ready):** swap GPO filter `SG-Caregivers-Test` -> `SG-Caregivers`; CA allow-list test group -> `SG-Caregivers`; move real caregiver machines into `OU=Caregiver Devices` + correct `SG-PC-*` location group one at a time. **Still pending:** lower ALIS app timeout 20->15 min; reboot NURSESTATION to verify lockdown.