---
type: client
name: cascades-tucson
display_name: Cascades of Tucson
last_compiled: 2026-06-19
compiled_by: HOWARD-HOME/claude-main
sources:
  - session-logs/2026-03-24-session.md
  - session-logs/2026-03-31-session.md
  - session-logs/2026-04-01-session.md
  - session-logs/2026-04-16-session.md
  - session-logs/2026-04-16-howard-client-docs-import.md
  - session-logs/2026-04-17-session.md
  - session-logs/2026-04-17-howard-session.md
  - session-logs/2026-04-18-session.md
  - session-logs/2026-04-20-session.md
  - session-logs/2026-04-20-mac-session.md
  - session-logs/2026-04-21-mac-vault-setup.md
  - session-logs/2026-04-21-howard-remediation-vault-gap.md
  - session-logs/2026-04-28-session.md
  - session-logs/2026-04-29-session.md
  - session-logs/2026-04-30-session.md
  - session-logs/2026-05-01-session.md
  - session-logs/2026-05-01-howard-syncro-billing-batch-and-tmp-path-incident.md
  - session-logs/2026-05-10-session.md
  - session-logs/2026-05-18-session.md
  - session-logs/2026-05-18-howard-billing-review-and-ticket-updates.md
  - session-logs/2026-05-20-session.md
  - session-logs/2026-05-21-session.md
  - session-logs/2026-05-23-session.md
  - session-logs/2026-05-24-GURU-KALI-session.md
  - clients/cascades-tucson/session-logs/2026-05-22-session.md
  - session-logs/2026-05-26-howard-session.md
  - clients/cascades-tucson/session-logs/2026-06-02-howard-efax-scanner-ticket.md
  - clients/cascades-tucson/session-logs/2026-06-03-session.md
  - clients/cascades-tucson/session-logs/2026-06-04-howard-email-delivery-investigation.md
  - clients/cascades-tucson/session-logs/2026-06-04-howard-caregiver-laptop-enrollment.md
  - clients/cascades-tucson/session-logs/2026-06-04-session.md
  - clients/cascades-tucson/session-logs/2026-06-05-session.md
  - clients/cascades-tucson/session-logs/2026-06-05-howard-cascades-entra-ticket-billing.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-08-howard-edge-unc-download-bug-diagnosis.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-10-howard-meredith-locked-word-doc.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-12-howard-shared-mailboxes-grievances-surveys.md
  - clients/cascades-tucson/session-logs/2026-05-16-howard-wireless-diagnostic.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cs-server-raid-vpn-reset.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-16-howard-vertical-voice-vlan-plan.md
  - clients/cascades-tucson/docs/network/voice-vlan-cutover.md
  - clients/cascades-tucson/docs/network/voice-phone-inventory.md
  - clients/cascades-tucson/docs/network/network-logging-plan.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-voice-vlan-migration-logging-plan.md
  - clients/cascades-tucson/reports/2026-06-16-unifi-full-audit.md
  - clients/cascades-tucson/reports/2026-06-16-2.4ghz-remediation-runbook.md
  - clients/cascades-tucson/docs/overview.md
  - clients/cascades-tucson/docs/network/topology.md
  - clients/cascades-tucson/docs/network/vlans.md
  - clients/cascades-tucson/docs/servers/cs-server.md
  - clients/cascades-tucson/docs/billing-log.md
  - .claude/memory/project_cascades_admin_accounts.md
  - .claude/memory/project_cascades_ca_phased_rollout.md
  - .claude/memory/project_cascades_pilot_cleanup.md
  - .claude/memory/feedback_syncro_cascades_contact.md
  - .claude/memory/feedback_cascades_user_security_group.md
  - .claude/memory/project-cascades-migration-plan.md
  - .claude/memory/feedback_cascades_folder_redirect.md
  - .claude/memory/howard-home-lan-shadow.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-kpi-dashboard-scoping.md
  - clients/cascades-tucson/docs/proposals/kpi-dashboard.md
  - clients/cascades-tucson/docs/proposals/kpi-dashboard-onepager.md
  - .claude/memory/project_cascades_kpi_dashboard.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-cascades-poly-phone-drops-network-smoothing.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-cascades-power-outage-recovery-and-5ghz.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-cs-server-drive-review-and-spike-question.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-voice-vlan30-build.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-cascades-outage-followup-openvpn-printer.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-synology-drive-sync-diagnosis.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-lupesanchez-desktop-trcieja-perf-diag.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-cascades-rf-voice-optimization-plan.md
  - clients/cascades-tucson/docs/network/network-optimization-master-plan.md
  - clients/cascades-tucson/docs/network/phase1-voice-qos-design.md
  - clients/cascades-tucson/reports/2026-06-18-voice-quality-diagnostic.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-memcare-baseline-and-change-window.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-2am-rf-run-phase2b-applied.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-5ghz-attempt-and-rollback.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-5ghz-dfs-datadriven-applied.md
  - clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-cascades-rf-night-capstone.md
backlinks:
  - projects/gururmm
  - wiki/systems/uos-server
---

# Cascades of Tucson

Senior living / assisted living facility in Tucson, AZ. Single 6-floor building plus a MemCare (Memory Care) wing on floors 5-6. ACG took over from a previous MSP. Primary compliance driver is HIPAA. Active multi-phase migration project ongoing as of 2026-05-24.


---

## Entra Access Architecture (canonical overview)

**In one line:** a HIPAA-driven, identity-based access-control system that splits staff into two security postures and enforces them with **Microsoft Entra Conditional Access** on top of **hybrid identity** (Entra Connect), with **ALIS (clinical EHR) wired for SSO**. Tickets: #109412123 (Entra setup), #110680053 (domain migration).

### Foundation -- hybrid identity

- On-prem AD `cascades.local` synced to Entra/M365 via **Entra Connect** (PHS + Seamless SSO). UPN suffix `cascadestucson.com`, so a user's **Windows login = email = M365/ALIS identity** (one credential everywhere).

### Two user buckets (the core design)

1. **Restricted -- caregivers + medtechs** (group `SG-Caregivers`, `8b8d9222`): sign in **only on the Cascades network** and **only on approved devices** (shared Galaxy phones + a set of caregiver laptops/desktops). **No MFA** (no personal devices) -- protected by **location + device** controls + 8h sign-in frequency instead. Effect: caregiver credentials are **useless off-site or off an approved device** -- the anti-hacker / bad-employee-from-home control.
2. **Privileged -- admins / directors / managers / nurses** (NOT in `SG-Caregivers`): email + ALIS **from anywhere**, **seamless onsite / 2FA offsite** (Authenticator/PIN). Untouched by the caregiver lockdown.

### Conditional Access enforcement (caregivers)

- `CSC - Block caregivers off Cascades network` (`e35614e1`)
- `CSC - Block caregivers on non-compliant device` (`ede985e2`) -- being replaced by a **device allow-list** (`CSC - Caregivers: allow-listed devices only`, `1b7fd025`): phones (`displayName -startsWith "CSC-"`) + tagged caregiver machines (`extensionAttribute1 -eq "CSCCaregiverDevice"`, or explicit deviceId). Note: extensionAttribute changes lag >70 min into CA's filter cache -- **deviceId matching is the lag-free lever** for the small device set.
- `CSC - Caregiver sign-in frequency 8h` (`7d491c7a`)
- Rollout is **per-user via group membership** (test group `SG-Caregivers-DeviceTest` `db5849ec` carries the full rule set for one-at-a-time validation; promote to `SG-Caregivers` + disable compliance-block when validated).

### Devices

- **Phones:** Samsung A15s in Intune **Shared Device Mode** (Android Enterprise, device-token enrolled) -- live.
- **Laptops/desktops:** caregiver shared machines (Laptop2, LAPTOP-DRQ5L558, LAPTOP-E0STJJE8, ASSISTNURSE-PC, NURSESTATION-PC) joined to Entra so CA recognizes them and they go on the allow-list (group `Cascades - Caregiver Devices` `02c6f698` for policy targeting).

### ALIS SSO

- Entra app registration -> OIDC SSO into ALIS; **tenant-wide admin consent granted** (2026-06-03). Per-user join key = **ALIS staff Email must equal the Entra UPN**. Caregivers SSO silently on phones (ALIS-native 2FA off); office users SSO with offsite MFA.

### Caregiver desktop/laptop management -- Hybrid Entra Join + GPO (the chosen path)

Because per-user **Intune** never provisioned tenant-wide (`INTUNE_A = PendingInput`; no Windows device ever Intune-enrolled -- MS case open), Windows caregiver devices are managed via **Hybrid Entra Join + on-prem Group Policy** instead. This needs no Intune. The CA access model is unchanged (hybrid join just gives the device an Entra object so the allow-list/deviceId still applies).

- **Hybrid join proven on NURSESTATION-PC** (2026-06-05): SCP written (`ConfigureSCP.ps1`), `OU=Caregiver Devices,OU=Staff PCs,OU=Workstations` added to Entra Connect sync scope -> device synced to Entra as `trustType: ServerAd`, `dsregcmd` shows AzureAdJoined+DomainJoined YES, pilot.test gets `AzureAdPrt: YES`. On hybrid-joined machines `Ngc PreReqResult: WillNotProvision` (PolicyEnabled NO) -> **Windows Hello does not auto-provision** (no Hello popup) -- exactly what shared caregiver devices need, so no separate Hello-disable step.
- **Device control is one-at-a-time:** caregiver machine computer objects are moved into `OU=Caregiver Devices` (only that OU is in sync scope) and into a location group `SG-PC-MainTower` or `SG-PC-MemoryCare`. Add a device = move it into the OU + correct location group.
- **App + printer delivery GPO `CSC - Caregiver Workstation`** (`{3B5CD9A6-A278-4676-A9FD-9396D21A8261}`, User-config GPP) -- **BUILT + VALIDATED on NURSESTATION as pilot.test (2026-06-05).** Linked at `OU=Caregivers,OU=Departments`; security filter = `SG-Caregivers-Test` (Apply, pilot.test only) + Authenticated Users (Read, for MS16-072). Go-live = swap filter to `SG-Caregivers`. Contents: 3 desktop shortcuts -- ALIS, LinkRx, **Helpany** (`https://app.safe-living.com/login` -- named "Helpany," the brand caregivers know) -- + 6 `\\CS-SERVER` shared printers (NursesPrinter, HealthServices, MCMedTech, MCReception, MCDirector, CopyRoom) with **default printer by device location** (Nurses for `SG-PC-MainTower`, MC MedTech for `SG-PC-MemoryCare`, computer-context ILT) + HKCU `LegacyDefaultPrinterMode=1` so the default sticks. Build scripts: `clients/cascades-tucson/scripts/build-caregiver-gpo.ps1` + `link-caregiver-gpo.ps1`. NOTE: the domain-wide `CSC - Printer Deployment` GPO is intentionally disabled (empty CSE / version 0) and is **not** to be used -- reference only.
- **Device lockdown GPO `CSC - Caregiver Device Lockdown`** (`{E6174988-2721-4D96-ADF5-F5BB44E92769}`, computer-only, linked to `OU=Caregiver Devices`) -- **DEPLOYED 2026-06-05.** Auto-logoff is a HIPAA requirement (§164.312(a)(2)(iii)) for shared PHI devices. Settings: screen **lock at 3 min**, **auto sign-out at 15 min** total idle, **90-second warning** before sign-out, **never sleep** (display off 10 min). Delivered via a computer **startup script** (`caregiver-lockdown.ps1`, in SYSVOL) that sets `InactivityTimeoutSecs=180`, powercfg, and registers a logon-triggered scheduled task running an idle monitor in each caregiver's session.
Deploy script: `deploy-device-lockdown-gpo.ps1`. **Startup scripts run at boot -- NURSESTATION must reboot** to activate (not yet verified). **Companion:** ALIS app session timeout 20->15 min (Howard, ALIS admin) **PENDING.** Lock/logoff are **device-level** (affect any user on the device in `OU=Caregiver Devices`).

### Status (as of 2026-06-05)

- **Proven working end-to-end on a hybrid-joined desktop (NURSESTATION + pilot.test):** caregiver lockdown (CA off-network block + device allow-list) **and** silent ALIS SSO. The allow-list policy `1b7fd025` carries NURSESTATION's current deviceId `d3bf931f-f128-4261-8398-b46c34a4b342` and the device is tagged `extensionAttribute1=CSCCaregiverDevice`.
- **GPOs DEPLOYED:** `CSC - Caregiver Workstation` built and validated on pilot.test. `CSC - Caregiver Device Lockdown` deployed to `OU=Caregiver Devices` 2026-06-05 -- takes effect on next NURSESTATION reboot (verify lock@3min, 90s warning, sign-out@15min). **Monday go-live:** swap GPO filter `SG-Caregivers-Test` -> `SG-Caregivers`; CA allow-list test group -> `SG-Caregivers`; move real caregiver machines into `OU=Caregiver Devices` + correct `SG-PC-*` location group one at a time; ALIS email-match the 38 caregivers + medtechs. **Still pending:** lower ALIS app timeout 20->15 min; reboot NURSESTATION to verify lockdown.
- **Independent open item:** Microsoft case for `INTUNE_A PendingInput` -- does NOT block caregiver access (hybrid+GPO path replaces the Intune dependency).

---

## Profile

- **Contract type:** Prepaid hour block
- **Key contacts:**
  - Meredith Kuhn -- Assistant Manager (ASSISTMAN-PC); internal billing contact. **NEVER set her as ticket contact in Syncro** -- she is the wrong default that keeps being selected.
  - John Trozzi -- Maintenance staff, Mac at 201cascades@gmail.com (shared facility account)
  - Lauren Hasselman -- Accounting
  - Zachary Nelson -- Accounting Assistant
  - Lois Lane -- CareTakers department head (DESKTOP-KQSL232); resistant to domain migration; John Trozzi is liaison
  - Crystal Rodriguez -- staff
  - Sharon Edwards -- Life Enrichment Assistant (DESKTOP-DLTAGOI)
  - Ashley Jensen -- Accountant (DESKTOP-U2DHAP0)
  - Shelby Trozzi -- MemCare Director (MDIRECTOR-PC)
  - Chris Knight -- Accounting / Business Office (same access tier as Lauren Hasselman); chris.knight@cascadestucson.com (alias: c.knight@cascadestucson.com). **Workstation setup 2026-06-08:** machine **DESKTOP-N5G1ROO** (Win 11 Pro for Workstations) domain-joined + GuruRMM-enrolled (agent `205025ee-2676-4498-8a27-e88562a6f69a`), Office installed. AD account `chris.knight` (OU=Administrative) finished to match Lauren. Mailbox remains cloud-only/unsynced (same split state as Lauren).
  - JD Martin -- Syncro-confirmed contact (jd.martin@cascadestucson.com); role not yet documented.
  - Lupe Sanchez -- staff (DESKTOP-TRCIEJA). EOL workstation (Gateway ZX6971 AIO, i3-2120, 8 GB RAM, Win11 unsupported). **Decision 2026-06-18: replace machine** (dual-AV + EOL hardware causing slow Excel; no remediation on current box). GuruRMM agent `c9bf1a2d-bfdc-401e-9cc8-f9e90bb19587` (resolve live by hostname; UUIDs change on re-enroll).
- **Syncro contact emails (authoritative):** ashley.jensen@, jd.martin@, crystal.rodriguez@, John.trozzi@, meredith.kuhn@, accounting@/accountingassistant@cascadestucson.com.
- **Billing rate:** $175/hr all labor (prepaid block customer)
- **Hours remaining:** **55.75 hrs (live Syncro pull 2026-06-18).** Most recent draws: 0.5h remote 2026-06-10 Meredith locked Word doc (ticket #32403, 56.75->56.25); 0.5h remote 2026-06-12 shared mailboxes Grievances+Surveys (ticket #32417, 56.25->55.75). No draws from 2026-06-17/18 sessions (advisory/diagnostic only).
Always live-check via `GET /customers/20149445` before billing.
- **Syncro customer ID:** 20149445
- **Managed devices (Syncro):** 29 (live pull 2026-06-18)
- **Active tickets:** Syncro live pull 2026-06-18 shows **0 open tickets.** See Active Work section for open non-Syncro follow-ups. #32370 (eFax/scanner onsite) was confirmed [New]/open on 2026-06-13 -- verify/likely closed.
  - #110680053 / #32303 -- Entra / domain migration project. Status: **Invoiced** as of 2026-06-05. Plan: `C:\Users\Howard\.claude\plans\wise-discovering-panda.md`
  - #109412123 -- Entra setup project (verify status)
  - #32403 -- Meredith locked Word doc (0.5h remote, billed 2026-06-10, Invoiced)
  - #32417 -- Shared mailboxes Grievances+Surveys (0.5h remote, billed 2026-06-12, Invoiced)

---

## Infrastructure

### Servers & Services

| Host | IP | Role | OS | Notes |
|---|---|---|---|---|
| CS-SERVER | 192.168.2.254 | DC, DNS, DHCP (no scopes), File Server, Hyper-V host, Print Server | Windows Server 2019 Standard | Dell PowerEdge R610 (~2009 hardware, 16+ years old). **Single DC -- CRITICAL risk. No backup until 2026-06-15.** GuruRMM agent ID: `c39f1de7-d5b6-45ae-b132-e06977ab1713` (re-enrolled; always resolve the agent live by hostname, never hardcode the UUID). **OS RAID-1 mirror DEGRADED (2026-06-15) -- see hardware warning below.** |
| CS-SERVER iDRAC | 192.168.2.65 | Out-of-band management | -- | Dell OOB interface |
| CS-QB (Hyper-V VM on CS-SERVER) | 192.168.2.228 | (label "VoIP server" -- STALE) | -- | **2026-06-16 recon: SMB/445 only, no SIP response -- NOT a live SIP PBX.** Phones appear cloud-registered (Vertical). Label predates the wireless-phone transition; revisit/retire. |
| cascadesDS (Synology NAS) | 192.168.0.120 | NAS / legacy file storage | DSM 7.2.1-69057 | Port 5000 HTTP. Workgroup name is "CASCADES" -- same as AD short name, causing Kerberos auth failures from domain-joined machines. Slated to become backup-only. **Synology Drive Server 3.5.0-26088** (active, port 6690 SSL). Current Drive sync: CS-SERVER Drive Client (v7.5.0.16085, runs as sysadmin) syncs Sync-user My Drive (`/volume1/homes/Sync/Drive/`) -> `D:\Shares\Main` (one-way download). Real shared folders (Server 1.9 G, Management 5.5 G, Public ~50 G, SalesDept ~23 G, etc.) are NOT in scope -- Team Folder migration pending. |
| pfSense Firewall | 192.168.0.1 | Perimeter firewall, inter-VLAN routing, DHCP/DNS | pfSense Plus 25.07-RELEASE | Netgate device. cert CN=pfSense-685f277aa6886. Dual-WAN. All DHCP (CS-SERVER DHCP role has no scopes). 199 DHCP subnets (per-unit /28 VLANs, assisted-living L2 isolation). SSH shell access works (no interactive menu). Admin vault: `clients/cascades-tucson/pfsense-firewall`. OpenVPN user Howard: vault `clients/cascades-tucson/pfsense-openvpn-howard`. **Config vaulted 2026-06-17:** `clients/cascades-tucson/pfsense-config-backup-2026-06-17.sops.yaml`. pfSense is ZFS (power-loss resilient). Logs are PLAIN TEXT (not clog). |

**[CRITICAL] CS-SERVER hardware -- RAID degraded (2026-06-15):** Dell R610, basic SAS 6/iR controller (3 Gbps, no cache). The **OS RAID-1 mirror (Virtual Disk2 = C:, holds OS / AD / SQL / page file) is DEGRADED** -- Physical Disk 0:0:3 (320 GB WD SATA laptop drive, `WDC WD3200BEVT`) is Critical/Removed, leaving C: on a single surviving 320 GB Hitachi `HTS545032B9A300` 5400 RPM spindle with ZERO redundancy. A 1.2 TB SAS disk (1:0:4) sits "Ready" but is the wrong size/type to rebuild the 320 GB mirror, so no auto-rebuild fired. D: is a separate healthy RAID-1 (2x 1.2 TB SAS). The degraded mirror on a slow laptop spindle is the root cause of "CS-SERVER slow" reports (random-I/O bound). With the single-DC, EOL (16+ yr) posture this is a data-loss emergency -- SSD rebuild-then-swap is a valid band-aid (image C: first; enterprise SATA SSD >= 480 GB, 2.5"; no TRIM through this controller; buy 2 identical: e.g.
Solidigm D3-S4520 480 GB or Samsung PM893 480 GB; SATA negotiates to 3 Gbps; no Dell certified-drive lockout) but the DC migration remains the real fix. Gating: **verify cloud backup first full + image-based + retention before any drive work.**

**[INFO] Backup -- gap closed (2026-06-15):** Mike installed ACG cloud backup (MSP360/CloudBerry -> ACG-backup server) on CS-SERVER and started a backup, addressing the longstanding §164.308(a)(7) "no backup" HIPAA gap. (Synology Active Backup for Business remains blocked -- ext4, not Btrfs.) Verify the first full completes and set retention.

**[WARNING] CS-SERVER endpoint-agent sprawl:** CS-SERVER is NOT in the ACG Bitdefender/GravityZone tenant (Cascades company id `66b0448e1e0441d02508bad8`; 3 endpoints there, CS-SERVER absent). Defender is replaced by a Syncro-managed "Endpoint Protection Service". The previous MSP's **Datto RMM/CentraStage + Datto EDR/Infocyte** are still installed on top of Syncro + GuruRMM + ScreenConnect + KPAX -- overlapping agents thrashing the degraded spindle. Clean up the Datto stack. (Infection sweep 2026-06-15: clean.) **DESKTOP-TRCIEJA is another confirmed instance** of the leftover-Datto-stack fleet-wide problem (2026-06-18) -- see Lupe Sanchez in Profile.

**[WARN] Power outage (2026-06-17):** Building power outage took the entire Cascades network down (all 77 APs + 12 switches, 0 clients). Root cause chain: pfSense was plugged into the **surge-only side of the UPS** (no battery) -- it hard-powered-off uncleanly. ZFS survived (pools healthy, config.xml valid). Dirty boot caused a **duplicate dhcpd** (DISCOVER->OFFER but no REQUEST/ACK) and a **2nd-floor switch (USL24PB `Switch 2nd Floor #2`, 192.168.2.193) with one-way L2 forwarding** that blocked DHCP OFFERs from reaching floor-2 APs.
Howard killed the duplicate dhcpd + clean restart remotely; Mike: re-seated pfSense onto battery outlets, restored config from on-box auto-backup (12:20 version, VLAN30 intact), reset+re-adopted Switch 2nd Floor #2 (floors 3/4 followed), rebooted Cox modem (missed post-restore step that prolonged WAN issues). Network fully restored. Post-recovery casualties: devices that booted during the DHCP-down window cached a disconnected state and did not retry (kitchen thermal printer, POS ticket printer) -- power-cycle each as reported. Incident report: `clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md`.

### Email & Identity

- **M365 tenant:** cascadestucson.com | Tenant ID: `207fa277-e9d8-4eb7-ada1-1064d2221498`
- **M365 license:** Business Premium (SPB) -- 34 seats enabled, 3 consumed, 31 free. Business Standard (O365_BUSINESS_PREMIUM) -- **SUSPENDED**, 31 users still assigned. Relicensing 31 users Business Standard -> Business Premium is pending and time-sensitive -- those users may have degraded service.
- **On-prem AD domain:** cascades.local | UPN suffix: cascadestucson.com (added 2026-04-13 for Entra Connect SSO readiness)
- **MX / mail flow:** Exchange Online (M365). SPF: `v=spf1 a mx ip4:72.194.62.5 include:spf.protection.outlook.com include:spf-0.secureserver.net -all`. DKIM: both M365 selectors published. DMARC: `p=quarantine;pct=100` -- upgraded from p=none. Reports to `info@cascadestucson.com` (unmonitored). No third-party email gateway (EOP direct MX).
- **MFA:** CA policy "Require MFA for all users" is enabled. Caregiver bypass in progress -- caregivers cannot satisfy MFA (no personal device), so three scoped CA policies use BLOCK instead. See Patterns section. Voice-call MFA is **disabled tenant-wide** (SMS + Authenticator are the allowed methods). Exception: security group "MFA - Voice Call Scoped (sysadmin)" (id `304f941e-3594-4705-b8e6-ee676297df11`, single member `sysadmin@`) has Voice method enabled.
- **Entra Connect:** Installed on CS-SERVER 2026-04-25. Exited staging 2026-05-14 -- actively syncing (last sync confirmed 2026-05-27). OU=Administrative not yet in sync scope; UPN suffix updates for Administrative OU users pending before that OU can be added.
- **Break-glass accounts:** Two planned (`breakglass1-csc@cascadestucson.com`, `breakglass2-csc@cascadestucson.com`). Confirmed not yet created as of 2026-05-27. FIDO2 YubiKeys ordered -- arrival unconfirmed.
- **Admin accounts:**
  - `admin@cascadestucson.com` -- Mike's working admin (cloud-only, Connect-excluded by design)
  - `sysadmin@cascadestucson.com` -- Howard's working admin (cloud-only, Connect-excluded by design). Object id: `471b13dc-3cf8-416b-a132-f5f3bc8d1cc8`. Vaulted at `clients/cascades-tucson/m365-sysadmin.sops.yaml`.
- **ALIS (clinical SaaS):** https://cascadestucson.alisonline.com -- Entra SSO live and working. Install key: `d796539d-356b-4190-9c17-35f0f1129376`. Vault: `clients/cascades-tucson/alis-sso-app-registration.sops.yaml`. ALIS application ID `d5108493-cba8-4f08-90b6-1bb0bc09eb2a`, client secret expires 2028-05-06 (rotation reminder -- expiry breaks ALIS SSO tenant-wide). Per-caregiver: ALIS staff-record Email must match Entra UPN exactly. BAA with Medtelligent not yet verified.
  - **Admin consent (2026-06-03):** Tenant-wide admin consent (`AllPrincipals` `User.Read`) granted on ALIS Entra service principal (`e1cae4ad-5beb-44ca-82d4-434c9bd835ad`). This resolved `AADSTS65001` sign-in failures. CA was NOT the cause.
  - **How to enable ALIS SSO for one user:** (1) Tenant-wide admin consent already done globally. (2) In ALIS admin -> Staff -> user's record, set **Email = exact Entra UPN**. (3) User signs in via "Sign in with Microsoft." (4) Turn off ALIS-native 2FA (Entra is the second factor; native 2FA conflicts and locked out Karen Rossini).
  - **Diagnostic signature:** a user with zero ALIS-app sign-in events in Entra sign-in logs is still on the old direct-login path -- fix is the ALIS Email match, not anything in Entra.
- **Caregiver phones:** 22 Samsung Galaxy A15s enrolled in Intune Shared Device Mode (SDM). Enrollment profile: `CSC - Android Shared Phones (Entra SDM)` (`9a0fcc6d`); 25 devices enrolled per 2026-06-03 Intune pull. Dynamic group: `Cascades - Shared Phones` (`ea96f4b7`). Android enrollment token expires 2027-05-08 -- expiry does NOT unenroll existing devices.
- **Audit retention:** Approved 2026-04-29. Azure Log Analytics (90d) + Storage Account (6yr) in ACG subscription `e507e953-2ce9-4887-ba96-9b654f7d3267`, RG `rg-audit-cascadestucson`. **Not yet built.**
- **Inky:** No Inky deployment exists in this tenant. Confirmed 2026-06-04.
- **EXO MSP app auth note (2026-06-04):** When the MSP app cert is not in the Windows cert store, use client_credentials flow to obtain an EXO-scoped access token and connect via `Connect-ExchangeOnline -AccessToken`. App: ComputerGuru Exchange Operator (`b43e7342-5b4b-492f-890f-bb5a4f7f40e9`). Vault: `msp-tools/computerguru-exchange-operator.sops.yaml`.
- **Shared mailboxes (created 2026-06-12):** `grievances@cascadestucson.com` and `Surveys@cascadestucson.com` -- both SharedMailbox type, cloud-only, no license consumed. Delegated to Meredith Kuhn and Ashley Jensen with FullAccess (auto-mapping) + SendAs on each. All 8 permission grants verified. Ticket #32417.

### Network

- **ISP / WAN:** Dual-WAN Cox. WAN1 igc0 `184.191.143.62/30` (Cox Fiber, primary, gateway `184.191.143.61`) + WAN2 igc3 `72.211.21.217/27` (Cox Coax, secondary, static); `WAN_Group` gateway group; both active full-duplex, no loss events (verified 2026-06-16). Both WAN IPs added as Cascades Named Location in Entra (ID: `061c6b06-b980-40de-bff9-6a50a4071f6f`).
**Measured bandwidth (2026-06-18):** WAN1 fiber **upload ~522 Mbps** (Cloudflare single-stream from pfSense); RRD 3-day peaks ~680 Mbps down / 98 Mbps up (actual usage). WAN2 coax upload **unmeasured** (remote source-route bind to `72.211.21.217` failed -- needs a WAN2-routed host or the Cox bill; assume asymmetric ~20-50 Mbps up). Implication: 30 calls ~= 3 Mbps vs ~522 Mbps fiber headroom -> **the WAN is NOT the everyday voice bottleneck** (RF is); voice QoS is insurance for WAN2 failover + rare WAN1 saturation. See voice QoS design.
- **Firewall:** pfSense Plus **25.07-RELEASE** (Netgate) at `192.168.0.1`, cert CN=pfSense-685f277aa6886. Admin vault: `clients/cascades-tucson/pfsense-firewall`. SSH shell access works (no interactive menu). OpenVPN user Howard: vault `clients/cascades-tucson/pfsense-openvpn-howard` (split-tunnel; `route 192.168.0.0/22`; use OpenVPN GUI or OpenVPN Connect with DCO disabled for stability -- DCO/TAP instability seen 2026-06-16). pfSense-ssh.sh (unifi-wifi skill) provides scripted audit/dhcp/run access. **Logs are PLAIN TEXT on 25.07 -- read with tail/grep, NOT clog (clog returns empty).** pfSense has an **OpenVPN `--inactive` idle timeout (~300s)** configured on the server; it disconnects clients after ~5 min of no tunnel data (keepalive pings do NOT reset this counter). This is a config setting, not a fault -- raise/disable to fix the flapping (fix proposed 2026-06-18, not applied). **[OUTAGE 2026-06-17] pfSense was on UPS surge-only side -- moved to battery-backed outlets by Mike (rectified). On-box auto-backup (12:20 version) restored by Mike; config vaulted `clients/cascades-tucson/pfsense-config-backup-2026-06-17.sops.yaml`. Enable Netgate AutoConfigBackup to prevent off-box backup gap.**
- **[INFO] pfSense health check (2026-06-16):** gateway ruled out as WiFi factor -- DHCP not exhausted (270/~507 active ~53% on the AP/WiFi pool), unbound DNS up, both WANs full-duplex/stable, firewall states 28-31k/790k, load 0.6.
Minor: igc3/WAN2 Intel I225/226 2.5G counter quirk (1707 input-errors+collisions logged, full-duplex active, no loss) -- not a fault, no action needed.
- **LAN / VLAN layout:** Primary staff/AP network `192.168.0.0/22` (pfSense .0.1, cascadesDS .0.120, UniFi APs + most WiFi clients on 192.168.2.x/3.x). DHCP pool 192.168.2.2-192.168.3.254 (~507 cap, ~270 active ~53%). Per-unit /28 VLANs: **199 DHCP subnets** total, mostly `10.x.y.0/28` per apartment (assisted-living L2 isolation) + Staff/Internal VLAN 20 (`10.0.20.0/24`, gw `10.0.20.1`) + Guest VLAN 50 (`10.0.50.0/24`, RFC1918 blocked) + **Voice VLAN 30** (`10.0.30.0/24`, gw `10.0.30.1`). DHCP backend: ISC (Kea config present, dormant). Unbound DNS.
- **Switching:** Full UniFi. **77 U7-Pro APs** + **12 managed switches** (1st Floor USW-48 PoE core; floors 2-4 USW-Pro-24-PoE; MemCare USW-Pro-24-PoE; USW Lite 8 PoE; USW-16-PoE VoIP switch). **[WARN] ~25 switch ports linked at 100 Mbps but gig-capable** (systematic cabling/NIC issue, 1st/2nd/3rd-floor switches; investigate after WiFi Phase A). 3 offline switches: Switch 2nd Floor #2, Switch 4th Floor #2, USW Pro Max 16. PoE budgets healthy. Port p38 (1st Floor USW) 4.0% tx-drop rate. All managed on the shared UOS controller (172.16.3.29, HTTPS 11443; see [[uos-server]]); Cascades site short name `va6iba3v`, site_id `685f39068e65331c46ef6dd2`. **Mesh topology:** 2nd Floor Atrium is wireless-mesh parent for CC Bridge + salon (5 GHz backhaul ch36); 206 U7 Pro carries AP 108. Switch hardware replacement on floors 2/3/4 complete. **Note: Switch 2nd Floor #2 (USL24PB, 192.168.2.193) was reset+re-adopted after the 2026-06-17 power outage -- it had one-way L2 forwarding blocking DHCP offers.**
- **WiFi SSIDs:**
  - **CSCNet -- shared PPSK SSID.** `private_preshared_keys_enabled`; ~230-242 per-key->network mappings (most keys -> per-room resident VLANs 101-631; a few -> Default; one phone key -> Internal/VLAN 20; one voice PPSK -> VOICE/VLAN 30).
~1,190 historical clients (residents' IoT/TVs, staff, phones). **Do NOT repoint the SSID to move a subset of clients** -- move at the PPSK level. wlanconf `685f39078e65331c46ef7ee5`; cred vault `clients/cascades-tucson/wifi-cscnet.sops.yaml`.
  - CSC ENT -- legacy SSID, main LAN (192.168.0.0/22), being deprecated as migration proceeds
  - Guest -- isolated, VLAN 50
- **Wireless RF status (audit 2026-06-15/16 + changes through 2026-06-17 -- ~587 concurrent clients):**
  - **2.4 GHz is the primary pain band:** avg TX-retry ~10%, cu_total 69-94% live, catastrophic neighbor BSSID density (ch6 ~33k BSSIDs, ch1 ~19k, ch11 ~17k). 27 of the 40 worst clients on 2.4 GHz (retry 11-42%), mostly IoT/legacy. Root cause: high radio density running at excessive TX power.
  - **2.4 GHz Phase A status -- OVER-THINNED (as of 2026-06-17):** Floor-4 pilot (2026-06-16) applied 14/15 radios to 6 dBm (retry 13.2->9.5%, no coverage loss). Subsequently overnight 2026-06-17, Phase A was extended: **24 of 76 2.4 radios DISABLED + 42 set to Low (~6 dBm)**; Floors 5/6 + mesh untouched. Results DEGRADED: retry 17->23.4%, satisfaction 39->30 -- over-thinned. **Current recommendation: Low->Medium for the 42 at-Low radios.** Phase 0 (ping-check off, 3AM auto-upgrade disable, pfSense logging) + Phase 1 (combined radio_table PUT: ng power medium [42 radios], na ht 40 [76], na min_rssi -82 [69]) planned, dry-run clean, NOT applied -- pending explicit go-ahead.
  - **5 GHz:** Auto-channel reassignment applied via UniFi 2026-06-17 (Howard) -- made co-channel overlap **WORSE** (25->30 co-channel pairs from 173 strong neighbor pairs). `dfs-check.sh` 2026-06-16: **ZERO real radar events fleet-wide** (DFS empirically low-risk). Plan: **Option B = combined per-AP PUT of 40MHz + non-DFS optimized channel plan + min-RSSI -82 relax.** Width + channel are coupled (width alone fixed only 7/25 pairs; non-DFS needs 40MHz). Dry-run clean; NOT applied.
NOTE: an earlier mid-session claim (2026-06-15 audit) that "DFS was the #1 problem" was an artifact of tooling bugs and was withdrawn -- do not repeat it.
  - **6 GHz:** active on 75 radios; only ~1 client. **Root cause (found 2026-06-18): CSCNet is not broadcasting 6 GHz** (`wlan_bands=[2g,5g]`) -- the band is dark at the SSID level, so nothing can join it. Largest untapped, clean, non-DFS capacity. Opening 6 GHz on CSCNet (+ BSS-transition `bsstm`) is the **relief valve** that must come BEFORE narrowing 5 GHz to 40 MHz (else 5 GHz congestion just relocates). The Poly phones are 5 GHz (not 6E), so 6 GHz helps voice *indirectly* by pulling resident devices off 5 GHz.
  - **AP 103 saturated (5 GHz):** ch149, ~75% airtime, ~25,900 retries, 12 clients. Lauren's voice phone (`.202`) was locked here 2026-06-18 (off the CC Bridge mesh AP) -- so AP 103 MUST be relieved (off ch149 / 80->40 MHz / load-balance) or she trades a mesh problem for a congestion one.
  - **AP-level satisfaction 95-100 fleet-wide.** Pain is in the client tail.
  - **Client distribution by SSID (2026-06-18):** CSCNet 427 + CSC ENT 131 (legacy, not yet retirable) + Guest 13.
  - **Config flags:** 6 APs with 2.4 min-RSSI OFF (615, 608, 505, 517, 622, salon); 4 APs off the 1/6/11 plan (128 disabled, 108 offline, 108U7 Pro auto, salon auto).
  - **Known hardware:** AP 108 (Floor 1) offline pending a new cable run (expected). Stale duplicate controller object ("108" vs "108U7 Pro") to clean up separately.
  - **Creds (vault refs only):** `infrastructure/uos-server-ssh-key` (SSH/Mongo), `infrastructure/uos-server-network-api-rw` (RW controller admin), `clients/cascades-tucson/unifi-ap-ssh` (per-AP device auth via site VPN), `clients/cascades-tucson/pfsense-firewall` (pfSense admin for pfsense-ssh.sh).
- **VoIP (vendor: Vertical -- Richard Turner):** Two phone fleets -- **8 AudioCodes** (OUI `00:90:8f`, WIRED on USW-16-PoE ports 1-8, externally powered / PoE OFF) and **28 Poly** (OUI `48:25:67`, WiFi via CSCNet PPSK).
As of 2026-06-18: all 8 AudioCodes + 22 Poly + the Vertical desktop are on VOICE VLAN 30 (31 devices); 6 Poly stragglers remain on VLAN 20/Default pending re-key. Phones confirmed marking **DSCP EF (46)** for voice (2026-06-18). The **Vertical-Remote management desktop** (`10.0.30.201`, MAC `e4:e7:49:52:3a:06`, WIRED USW-16-PoE port 16, VOICE VLAN 30, **DHCP** -- confirmed not static, LogMeIn remote access, no pfSense OpenVPN) is live on VLAN 30. No on-prem SIP PBX found -> phones appear to register to a **cloud/hosted PBX** (Vertical). - **[2026-06-18 CUTOVER COMPLETE] Voice VLAN (VLAN 30) consolidation:** dedicated isolated **VLAN 30 VOICE (`10.0.30.0/24`, gw `10.0.30.1`, pfSense igc1.30, DHCP `.100-.250`, DNS `8.8.8.8/1.1.1.1`)** holding ALL phones + the Vertical desktop; internet/cloud-PBX egress only, firewalled off VLAN 20 / main LAN / PHI / mgmt (HIPAA). Isolation rules verified via `pfctl -sr` (clone of GUEST VLAN -- the only actually-isolated net). Voice PPSK key on CSCNet -> VOICE: vaulted `clients/cascades-tucson/wifi-voice-ppsk`. **31 devices on VOICE as of 2026-06-18 (live inventory: `docs/network/voice-phone-inventory.md`):** - Vertical-Remote desktop (port 16): DONE -- `10.0.30.201`. Re-VLANing a wired port requires bouncing the link (port disable/enable via controller API using CSRF token); a UniFi client block/unblock is MAC-filter only, not a link bounce. - **22 of 22 migrated Poly WiFi phones: DONE** -- re-keyed to voice PPSK, on `10.0.30.202-.223`. Dial-tone + outbound calls verified. **NOTE: the Poly fleet is actually 28, not 22** -- **6 stragglers remain off VOICE** (5 on VLAN 20 `10.0.20.64/.65/.66/.67/.195`, one on `192.168.1.126`; `.20.66` Dining Room at 35% retry); re-key these to the voice PPSK so all phones are isolated + get voice QoS. - **8 AudioCodes (wired, USW-16-PoE ports 1-8): ALL DONE** -- on `10.0.30.224-.231`. 
**Gotcha: AudioCodes are externally powered (PoE OFF on those ports), so a UniFi PoE power-cycle AND a controller port disable/enable are both no-ops -- they held their old main-LAN DHCP leases. Required a full physical power-off/on** before they re-DHCP'd onto VOICE.
  - **Quality caveat:** the VLAN move gives isolation + enables QoS but does NOT by itself fix call quality -- the dropped-calls/voice-breaks complaints are an **RF problem on the WiFi (Poly) phones** (the wired AudioCodes are clean). See the Wireless / Voice QoS patterns and the 2026-06-18 voice-quality diagnostic.
  - **Full runbook:** `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`. Live inventory: `docs/network/voice-phone-inventory.md`. Voice-quality diagnostic: `reports/2026-06-18-voice-quality-diagnostic.md`. Holistic optimization plan: `docs/network/network-optimization-master-plan.md`; voice QoS design: `docs/network/phase1-voice-qos-design.md`.

### External Vendors & Mail Senders

- **bill.com (BILL):** Sends from `inform.bill.com`, `hq.bill.com`, `hello.bill.com`, `mc.bill.com`. MX via pphosted.com (Proofpoint). Confirmed delivering successfully to meredith.kuhn, ashley.jensen, lauren.hasselman, zachary.nelson as of 2026-06-04. Safe sender: `account-services@inform.bill.com`.
- **BOK Financial:** Sends from `bokfinancial.com`. MX via pphosted.com (Proofpoint). DMARC p=reject. Zero emails to any cascadestucson.com user in 90-day history as of 2026-06-04 (likely wrong recipient address on BOK's side for the accounts in question).

### Business Applications & Reporting Systems

Cascades' line-of-business / reporting SaaS (the systems they pull data OUT of, per Ashley Jensen 2026-06-17). Most are niche senior-living products:

| System | Function | Data-out path |
|---|---|---|
| **ALIS** (Medtelligent) | Clinical EHR (census/clinical) | Vendor reporting/export; API TBD. **HIPAA -- BAA required before PHI leaves it.** Their most important source. SSO live (see Entra section). |
| **QuickBooks** | Accounting | QBO = API + connectors; Desktop = ODBC |
| **Bill.com** | AP/AR | REST API (most automatable) -- see mail-sender note above |
| **Relias** | Training / LMS | Reporting export / API (completion data) |
| **You've Got Leads** | Senior-living CRM | Reporting/export; API varies |
| **TELS** (Direct Supply) | Facilities management | Reporting export; API uncertain |
| **Focus HR** | HR / payroll | Export or vendor API (plan-dependent) |
| **Helpany** (app.safe-living.com) | Caregiver app | Niche -- likely export-only |
| **POS** | Point of sale | Product TBD |

- **[PROPOSED] Unified KPI dashboard (Ashley Jensen request, 2026-06-17):** single dashboard pulling KPIs across the systems above. **Power BI on-prem Gateway is the WRONG frame** (it only bridges Power BI to on-prem sources, never cloud SaaS). Recommended path leans on their existing M365 Business Premium: **Phase 1** scheduled CSV/Excel exports -> SharePoint -> Power BI Pro dashboard on 3-5 KPIs (census/financials); **Phase 2** automate the API-capable systems (Bill.com, QuickBooks Online) via Power Automate. Niche senior-living apps stay on the export method (no ready connectors). Internal scoping: `clients/cascades-tucson/docs/proposals/kpi-dashboard.md`; client one-pager: `.../kpi-dashboard-onepager.md`. Status: parked, awaiting Ashley's day-one KPIs + freshness need + POS/Focus-HR specifics. Check whether ALIS offers a built-in analytics/data feed (could replace plumbing for their top source).
--- ## Access - **CS-SERVER:** Via ScreenConnect or GuruRMM (live agent ID `c39f1de7-d5b6-45ae-b132-e06977ab1713` as of 2026-06-08; re-enrolls -- resolve live by hostname, do not hardcode) - **CS-SERVER iDRAC:** 192.168.2.65 - **pfSense admin (HTTPS):** https://192.168.0.1 -- vault: `clients/cascades-tucson/pfsense-firewall.sops.yaml` - **pfSense SSH:** `ssh admin@192.168.0.1` (system OpenSSH; drops to shell directly, no interactive menu) -- vault admin cred: `clients/cascades-tucson/pfsense-firewall.sops.yaml`; pfsense-ssh.sh (unifi-wifi skill) for scripted access. - **pfSense OpenVPN (Howard):** split-tunnel; vault: `clients/cascades-tucson/pfsense-openvpn-howard.sops.yaml` (user `Howard`; route 192.168.0.0/22). Use OpenVPN GUI or OpenVPN Connect with DCO disabled for stability. Note: Howard-Home is now 10.137.42.0/24 (renumbered 2026-06-16) -- Cascades 192.168.0.x now reachable over the VPN. Server has a configured `--inactive` idle timeout (~300s) that silently drops idle clients -- this is a config setting, not instability. - **pfSense config backup (2026-06-17):** `clients/cascades-tucson/pfsense-config-backup-2026-06-17.sops.yaml` - **Synology DSM:** http://192.168.0.120:5000 -- vault: `clients/cascades-tucson/synology-cascadesds.sops.yaml` (admin). Drive Server port 6690 (SSL). 
**[SECURITY] Synology Cloud Signin Portal credential (`clients/cascades-tucson/synology-signin-portal.sops.yaml`) was committed plaintext at vault commit 1fbc0e1 -- exposed in git history; encrypted go-forward but credential should be rotated.** - **M365 admin:** admin@cascadestucson.com -- vault: `clients/cascades-tucson/m365-admin.sops.yaml` - **M365 sysadmin:** sysadmin@cascadestucson.com -- vault: `clients/cascades-tucson/m365-sysadmin.sops.yaml` - **WiFi CSCNet:** vault: `clients/cascades-tucson/wifi-cscnet.sops.yaml` - **WiFi Voice PPSK (VLAN 30):** vault: `clients/cascades-tucson/wifi-voice-ppsk.sops.yaml` - **MDM service account:** vault: `clients/cascades-tucson/mdm-service-account.sops.yaml` - **svc-scan (scan-to-folder service account):** vault: `clients/cascades-tucson/svc-scan.sops.yaml`. AD account on CS-SERVER for the Accounting Brother's SMB scans. - **ALIS SSO app registration:** vault: `clients/cascades-tucson/alis-sso-app-registration.sops.yaml` - **UOS controller SSH (root):** vault: `infrastructure/uos-server-ssh-key` -- SSH/Mongo access for `unifi-wifi` skill and `uos-mongo.sh`. Vaulted 2026-06-15 by Mike. - **UOS controller RW admin (Network API):** vault: `infrastructure/uos-server-network-api-rw` -- required to apply any radio/config changes. Vaulted 2026-06-15 by Mike. - **UniFi AP device auth (Cascades):** vault: `clients/cascades-tucson/unifi-ap-ssh` -- direct AP SSH via site VPN (needed for `watch-ap.sh` live stream; L3 reach to 192.168.2.x/3.x via split-tunnel VPN). Vaulted 2026-06-15 by Mike. 
- **UOS controller (HTTPS):** https://172.16.3.29:11443 (HTTPS 11443, not 8443) -- site `va6iba3v` / site_id `685f39068e65331c46ef6dd2` - **GuruRMM -- RECEPTIONIST-PC:** agent ID `9c91d324-1073-449c-8cc0-45c5bccfc218` (flaky WebSocket, may lag fleet updates) - **GuruRMM -- ASSISTMAN-PC (Meredith Kuhn):** agent ID `cf86fa5e-96a2-494d-9cb1-8be22a518ad0` - **GuruRMM -- DESKTOP-TRCIEJA (Lupe Sanchez):** agent ID `c9bf1a2d-bfdc-401e-9cc8-f9e90bb19587` (resolve live by hostname; UUIDs change on re-enroll) - **Remediation tool:** Full tiered app suite consented 2026-04-21. All six apps active: Security Investigator, Exchange Operator, User Manager, Tenant Admin, Defender Add-on, Intune Manager. - **ComputerGuru Exchange Operator MSP app:** `b43e7342-5b4b-492f-890f-bb5a4f7f40e9` -- vault: `msp-tools/computerguru-exchange-operator.sops.yaml`. - **Vault root:** `clients/cascades-tucson/` in vault repo --- ## Patterns & Known Issues ### Syncro / Billing - **Never set a contact on any Syncro ticket unless explicitly requested.** At Cascades, Meredith Kuhn is the recurring wrong default that Syncro pre-selects -- she is not the correct contact. Leave `contact_id` blank. Source: `feedback_syncro_blank_contact.md`. - **Billing product for prepaid block draw:** Use a real labor type (Remote, Onsite, etc.) -- NOT "Prepaid project labor" (exempt, won't decrement the block). - **Always live-check hours before billing:** `GET /customers/20149445` in Syncro. Treat all cached hour counts as approximate. ### Exchange Online / Message Tracing - **Get-MessageTrace is hard-deprecated (Sept 2025).** Use `Get-MessageTraceV2` instead. Key parameter change: use `ResultSize` (not `PageSize`). The deprecation error may be silently swallowed by downstream jq filters -- if a trace returns unexpectedly empty, check the raw response for a deprecation error string before assuming no mail. Source: 2026-06-04 Chris Knight investigation. 
- **Sender-side suppression (SendGrid ESP):** If a user never receives mail from a specific sender despite a healthy mailbox, and message trace shows zero records (not even bounces), consider a SendGrid suppression list. Fix requires contacting the sender's support to clear the suppression -- there is no M365 action that can resolve this. Confirmed with bill.com / inform.bill.com.

### Active Directory / User Management

- **Security group assignment is always explicit.** When creating or adding any Cascades user, always ask which security group(s). OU -> group auto-mirror was explicitly declined 2026-05-14. Source: `feedback_cascades_user_security_group.md`.
- **New user mandatory order (folder redirection):**
  1. Create AD user
  2. Run `New-HomeFolder -Username ""` on CS-SERVER (creates root + Desktop/Documents/Downloads/Music/Pictures with correct ACL)
  3. Add to SG-FolderRedirect
  4. THEN first domain logon
  - Skipping step 2 causes fdeploy to cache a failure silently and never retry. Source: `feedback_cascades_folder_redirect.md`.
- **Folder redirect recovery:** If fdeploy cached a failure ("No changes detected"), run `clients/cascades-tucson/scripts/fix-shell-redirect.ps1` via GuruRMM while user is logged in. Must set both GUID-based and legacy-name registry keys. Folders must already exist on server.
- **fdeploy1.ini flags:** Changed from `Flags=1211` (included `Grant Exclusive Rights` bit 0x400, causing WRITE_DAC failures on new subfolders) to `Flags=187`. File at `{512B43A4-F049-4CE5-BFAC-860AD13E92BE}\User\Documents & Settings\fdeploy1.ini` on CS-SERVER.
- **[ROOT CAUSE + FIX 2026-06-08] Native Folder Redirection was DOA on every machine -- the config file was MISNAMED.** Every Cascades machine had needed the manual `fix-shell-redirect.ps1` registry workaround because native FR never worked.
Root cause: the redirect targets in GPO `CSC - Folder Redirection` (`{512B43A4-...}`) were saved in a file named **`fdeploy1.ini`**, but the Windows Folder Redirection client-side extension only ever reads **`fdeploy.ini`**. The file was hand-built by editing `fdeploy1.ini` (the wrong filename). **Fix:** wrote a correct `fdeploy.ini` (5 folders, `Flags=187`, `FullPath=\\CS-SERVER\Homes\%USERNAME%\`) into `{512B43A4-...}\User\Documents & Settings\`, bumped the GPO version 917506->983042 (GPT.INI **and** AD `versionNumber` kept in sync). **Native FR now redirects all 5 folders on first logon -- the registry workaround should no longer be needed for new users.** - **LE GPO also broken:** `CSC - Folder Redirection (LE)` (`{889BE7BE-...}`, linked at OU=Life Enrichment) has a **completely empty `\User` tree**. Sharon Edwards / Susan Hicks have likewise only ever worked via the registry workaround. Follow-up: retire the LE GPO and put LE users into `SG-FolderRedirect`, or apply the same `fdeploy.ini` fix to the LE GPO. Sharon/Susan are NOT currently in `SG-FolderRedirect` -- add them before relying on inheritance. - **Login-screen hide (SpecialAccounts\UserList):** An enabled local admin that does not appear in the Windows sign-in picker is a `SpecialAccounts\UserList` suppression, not a disabled account. Registry path: `HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\SpecialAccounts\UserList`, value `=0`. Fix: delete the DWORD value; account reappears after sign-out/reboot. Confirmed on NURSESTATION-PC 2026-06-05 -- `localadmin=0` removed; account was already enabled and in Administrators. ### File Shares & Scan-to-Folder (Accounting) - **Accounting department folder + scan dropbox (built 2026-06-09):** - `D:\Shares\Accounting` on CS-SERVER -- inheritance broken; SYSTEM / BUILTIN\Administrators = Full; `lauren.hasselman`, `chris.knight`, `zachary.nelson` = Modify (no Everyone). 
Shared as **`\\CS-SERVER\AcctDept`** (Change: those 3 users + `svc-scan`; Full: Admins). - **Share is named `AcctDept`, NOT `Accounting`** -- a *printer* share named `Accounting` (Canon MF455DW, `LocalsplOnly`) already exists. Do not collide with it. - **`svc-scan`** = dedicated AD service account (CN=Users, PasswordNeverExpires, CannotChangePassword) for the Brother's SMB auth. Vault: `clients/cascades-tucson/svc-scan.sops.yaml`. - **REUSE `svc-scan` for EVERY future scanner->network-folder setup at Cascades** (Howard, 2026-06-09) -- do NOT create a per-printer/per-folder scan account. For a new scan destination: grant `CASCADES\svc-scan` Modify on the new scan folder, then enter `cascades\svc-scan` + the vaulted password (NTLMv2) in that scanner's Scan-to-Network profile. - **Brother MFC-L8900CDW "Business Office" printer (10.0.20.220) -- Scan-to-Network profile (working 2026-06-09):** Network Folder Path `\\192.168.2.254\AcctDept\Scans`; **Auth Method NTLMv2** (not Auto/Kerberos -- printer can't KDC across VLAN); Username `cascades\svc-scan`; PDF Multi-Page. - **[NETWORK] CS-SERVER cannot reach the VLAN-20 printers** -- main-LAN `192.168.2.x` -> VLAN 20 `10.0.20.x` is blocked at pfSense. Use a VLAN-20 PC's browser (e.g. ACCT2-PC `10.0.20.209`) or go onsite. The reverse (printer -> CS-SERVER:445) **is** open. - **Persistent drive maps to `\\cs-server\AcctDept`:** Chris (DESKTOP-N5G1ROO) Y:, Zachary (ACCT2-PC) Y:, Lauren (DESKTOP-H6QHRR7) X: (Y: was already in use on hers). ### Synology NAS (cascadesDS) / Shared File Access - **Stale Word owner (lock) files on cascadesDS shares:** Word creates a hidden `~$` owner file when a document is opened; if the user's session ends without cleanly closing Word, the `~$` file is orphaned. **Fix:** delete the `~$` file(s). Confirmed 2026-06-10: five `~$` files dated 2024 on `\\cascadesds\Public\Company Web Docs\Staff Trainings\` caused false lock messages. 
- **Accessing cascadesDS from RMM -- always use a user session, not CS-SERVER SYSTEM.** The domain-joined CS-SERVER machine account cannot authenticate to the Synology `Public` share because cascadesDS uses workgroup "CASCADES" (same short name as the AD domain), causing Kerberos auth failures. Workaround: run the command in the `user_session` context of a machine where the target user is actively logged in (e.g. ASSISTMAN-PC agent `cf86fa5e` for Meredith-accessible shares). - **Synology Drive sync scope (as of 2026-06-18):** The Drive Client on CS-SERVER syncs only the **Sync DSM user's My Drive** (`/volume1/homes/Sync/Drive/`) into `D:\Shares\Main` -- one-way download (mode:1). The real department shared folders (`/volume1/Server`, `/volume1/Management`, `/volume1/Public`, `/volume1/SalesDept`, etc.) are **NOT** in this scope and are NOT currently mirrored to CS-SERVER. These require a separate Team Folder setup. Note: `synopkg status SynologyDrive` falsely returns "stopped" (status 263) even when the service is active -- verify via `systemctl is-active pkgctl-SynologyDrive` and `netstat -tlnp | grep 6690` instead. ### Browser / Edge - **[BUG - FLEET] Edge 149 cannot open Office files via download-list when Downloads is a UNC-redirected folder (Chromium issue 519243472).** A regression introduced in Chromium 149 (feature `LaunchShellExecuteViaExplorer`) prepends `\\?\` to UNC paths without converting to the correct `\\?\UNC\` form, producing a malformed path. **Symptom:** clicking an `.xlsx` or `.docx` in the Edge download panel shows "Windows cannot find '\\?\\\cs-server\...'" Text files and PDFs open fine. The same Office file double-clicked from File Explorer opens normally. **Trigger:** Downloads folder redirected via GPO Folder Redirection to a UNC path with no mapped drive letter -- exactly Cascades' Homes-share redirect configuration. **Affected build:** Edge stable 149.0.4022.52. 
**Fix options (none applied as of 2026-06-08):** (1) Update Edge past the fix; (2) Interim: `--disable-features=LaunchShellExecuteViaExplorer`; (3) Zero-config: use "Show in folder" then double-click from Explorer; (4) Supported 149->148 rollback. Note: pinning to 148 forfeits security fixes; prefer option 1 or 3 for HIPAA machines. ### Conditional Access / Caregiver Policies - **Phased rollout -- never tenant-wide.** CA policies for caregivers now target `SG-Caregivers` (`8b8d9222-5d71-419a-936d-56d895c6c332`) (Entra Connect exited staging 2026-05-14; SG-Caregivers-Pilot superseded). The legacy "Require MFA for all users" policy stays in place. Source: `project_cascades_ca_phased_rollout.md`. - **Enforced caregiver CA policy set (unchanged as of 2026-06-03):** - `CSC - Block caregivers off Cascades network` (`e35614e1-e896-4a13-9407-076963af488f`) -- BLOCK if location not Cascades - `CSC - Block caregivers on non-compliant device` (`ede985e2-ee7e-4521-88b2-34c847c3db20`) -- BLOCK if device non-compliant. **Pending DISABLE** at allow-list cutover. - `CSC - Caregiver sign-in frequency 8h` (`7d491c7a-ad90-4420-9990-40a1e676a76c`) - **Caregiver device allow-list (2026-06-03 -- report-only):** `CSC - Caregivers: allow-listed devices only (REPORT-ONLY)` -- id `1b7fd025-1aad-47c8-9274-c32c3e0b163c`; state `enabledForReportingButNotEnforced`. Device filter (mode `exclude`): `(device.displayName -startsWith "CSC-") -or (device.extensionAttribute1 -eq "CSCCaregiverDevice")`. Includes: NURSESTATION-PC (deviceId `d3bf931f`), Laptop2, LAPTOP-DRQ5L558, LAPTOP-E0STJJE8, LAPTOP-8P7HDSEI, ASSISTNURSE-PC (needs re-join + re-tag after Win11 reinstall). - **GDAP exclusion:** CA policy 3 must exclude "Service provider users" (GDAP foreign principals) + `SG-External-Signin-Allowed` + `SG-Break-Glass`, otherwise ACG partner admins lose access at CA cutover. 
- **Known bug:** `Require MFA for all users` policy (`7e87a1c7...`) excludes `SG-Caregivers-Pilot` instead of the live `SG-Caregivers` (`8b8d9222`). Functionally harmless today (pilot group still exists), but must be corrected. - **Pilot cleanup required when done:** Delete `pilot.test@cascadestucson.com`, clean up `howard.enos@cascadestucson.com`, remove `SG-Caregivers-Pilot` from CA policy targets and delete the group. Source: `project_cascades_pilot_cleanup.md`. ### EXO / Message Trace - **Get-MessageTrace is deprecated.** Use `Get-MessageTraceV2` instead. V2 has a 10-day max window -- loop 9 consecutive windows to cover 90 days. - **EXO access token auth:** When `Connect-ExchangeOnline -Credential` fails and the app cert is not in the Windows cert store, use client_credentials flow to get an EXO-scoped token and pass it via `-AccessToken`. ### Wireless / UniFi RF - **[EXECUTED 2026-06-19 -- autonomous 2 AM window, validated] First production RF optimization applied + kept:** - **2.4 power Low/full -> MEDIUM on 47 radios** (the 42 over-thinned `low` floors 1-4 + named, + the 5 MemCare floors-5/6 `auto`/full radios 505/517/608/615/622). The 24 thinned-disabled radios stayed disabled; 5 mesh-auto APs untouched. Non-regressive (satisfaction held). Undid the over-thinning regression + brought MemCare off full power. **Per-AP targeting required** -- `apply-radio power --zone` re-enables disabled radios (re-confirmed gotcha). - **5 GHz -> clean DFS 40 MHz channels** on 72 non-mesh APs (channels 52/60/100/108/116/124/132/140), 0 co-channel, mesh excluded (2nd Floor Atrium + children CC Bridge/salon/108 left on auto). **Result: 5 GHz retry roughly HALVED -- 8.7 -> 3.8 avg, median 8.2 -> 2.1.** Validated; all 72 APs holding DFS, 0 radar vacates. Voice nudged back to 5 GHz (kick-sta) after the channel-change scatter. - **CSCNet BSS-transition (802.11v) ON.** 6 GHz still BLOCKED (WPA3 -- see below). 
- **[BIG LESSON -- non-DFS decision REVERSED by data]** A blind non-DFS reshuffle was tried first and FAILED (flat retry); the completed channel survey (74/74 APs) proved **DFS channels here are 4-5x cleaner (2-3% busy) than non-DFS (ch149=12%, ch157=28%, ch44=22% -- the property's worst).** Consumer/neighbor gear avoids DFS. Choosing channels from the measured scan (not a non-DFS policy) is what delivered the win. **Always: scan -> `survey-report.py` -> `channel-plan --channels` -> apply -> validate.**
- **Fleet (full audit 2026-06-16):** 77 U7-Pro APs, **12 switches**, ~587 wireless clients. Controller: UOS at 172.16.3.29, HTTPS 11443 (see [[uos-server]]); site short name `va6iba3v`, site_id `685f39068e65331c46ef6dd2`. No UniFi gateway (pfSense is the gateway). pfSense ruled out as WiFi factor 2026-06-16 (DHCP not exhausted, DNS up, WAN stable -- see Network section).
- **Primary pain band is 2.4 GHz.** Avg TX-retry ~10%; cu_total 69-94% live; catastrophic neighbor BSSID density (ch6 ~33k BSSIDs, ch1 ~19k, ch11 ~17k). 27 of the 40 worst clients stuck on 2.4 GHz (retry 11-42%), mostly IoT/legacy hardware (Ring cameras, robotic cleaner, smart plugs, EPSON printer, Poly phone, handheld scanners, smartwatch). Root cause: ~75 2.4 GHz radios running at auto (full) TX power in extreme density. Experience splits by band: 5/6 GHz clients are fine; clients that land or stick on 2.4 GHz suffer.
- **5 GHz -- DFS concern is theoretical; empirically clean.** 76/77 radios on 80 MHz width (should be 40 MHz at this density). 55/77 radios on DFS channels (52-144) near Davis-Monthan AFB + TUS airport radar. `dfs-check.sh` 2026-06-16: **ZERO real radar events fleet-wide** (55 DFS APs, full `dmesg` sweep, precise pattern match) -- DFS is empirically low-risk here. Measured TX-retry DFS (8.4%) ~= non-DFS (9.0%) -- no throughput penalty. The audit-time recommendation to move to non-DFS (UNII-1 36-48 + UNII-3 149-161) for resilience is SUPERSEDED: the 2026-06-19 channel survey reversed it (see the BIG LESSON entry above).
NOTE: an earlier mid-session claim (2026-06-15 audit) that "DFS was the #1 problem" was an artifact of tooling bugs (raw counter + 15-AP head cap) and was corrected before session end -- do not repeat it. - **6 GHz is nearly unused -- root cause: CSCNet not broadcasting 6 GHz** (`wlan_bands=[2g,5g]`, found 2026-06-18). 75 radios active but only ~1 client because the band is dark at the SSID level. Largest untapped, clean, non-DFS capacity -- enabling 6 GHz on CSCNet (`apply-wlan bands all` + `bsstm on`) is the **relief valve** and must precede 5 GHz width-narrowing. The Poly voice phones are 5 GHz (not 6E), so 6 GHz helps voice indirectly by clearing 5 GHz of resident devices. - **AP 103 saturated (5 GHz):** ch149, ~75% airtime, ~25,900 retries, 12 clients. Lauren's voice phone (`.202`) locked here 2026-06-18 (off the CC Bridge mesh AP) -> AP 103 must be relieved (off ch149 / 80->40 MHz / load-balance) before/with that lock or she trades a mesh problem for congestion. - **Switch audit (2026-06-16):** ~25 ports linked at 100 Mbps but gig-capable (systematic cabling/NIC issue, 1st/2nd/3rd-floor switches; investigate after WiFi Phase A). PoE budgets healthy. 3 offline switches: Switch 2nd Floor #2, Switch 4th Floor #2, USW Pro Max 16. Port p38 (1st Floor USW) 4.0% tx-drop rate. - **AP-level satisfaction 95-100 fleet-wide.** Network is healthy on average; pain is in the client tail. - **Remediation status (as of 2026-06-17 -- OVER-THINNED):** - **Phase A (2.4 power-down): EXTENDED + OVER-THINNED.** Floor-4 pilot (2026-06-16): 14/15 radios to 6 dBm, retry 13.2->9.5%, no coverage loss. Subsequently (overnight 2026-06-17): 24 of 76 2.4 radios DISABLED + 42 set to Low (~6 dBm); Floors 5/6 + mesh untouched. Results DEGRADED: retry 17->23.4%, satisfaction 39->30. **Current action needed: Low->Medium for the 42 at-Low radios** (Phase 0 + Phase 1 per the combined radio_table PUT plan, pending go-ahead). 
- **Phase C (disable 9 redundant 2.4 radios): NOT applied.** Data-backed disable list (each has >=2 active-2.4 SNR neighbors): 127->128, 229->128, 248->348, 330->128, 445->347/348/247, 428->128, 622->505/615/608, Kitchen->Memcare TV room, Dining Room->memcare piano. Excludes mesh-protected APs (2nd Floor Atrium, CC Bridge, salon, 206 U7 Pro) and Memcare TV room. APs 445/428 disables held pending further validation. - **5 GHz -- auto-channel made things WORSE.** Auto-channel reassignment applied via UniFi (Howard, 2026-06-17): co-channel pairs 25->30. **Option B** (combined 40MHz + non-DFS channel plan + min-RSSI -82 relax via per-AP radio_table PUT) is the plan. Dry-run clean; NOT applied (pending go-ahead + evening window). - **Deferred levers (separate session):** min-data-rate raise (1->12 Mbps), band-steering (`apply-wlan bandsteer`), 2.4 min-RSSI on the 6 OFF APs (615, 608, 505, 517, 622, salon), 6 GHz band-steering. - **Poly phone drops (2026-06-17) -- CLOSED.** Root cause = intentional pfSense reboot on 2026-06-16 22:38:12 MST (one fleet-wide event; 28/30 phones each dropped once, all floors, including floors 5/6 untouched by radio work). Only a gateway-level event explains all-floors-at-once. Today's data (2026-06-17) back to ~99.77%. NOT a WiFi or DHCP issue. - **DHCP is healthy.** pfSense dhcpd.log: 1241 ACK / 1 NAK / 0 no-free-leases (verified via direct tail/grep -- NOT clog). Per-room /28 HIPAA segmentation is intentional (fullest 12/13); do NOT flatten. `sta_dhcp_failures` metric is client/WiFi-side (frames lost at 100% retry), not pfSense-side. - **Config flags:** 6 APs with 2.4 min-RSSI OFF (615, 608, 505, 517, 622, salon); 4 APs off the 1/6/11 plan (128 disabled, 108 offline, 108U7 Pro auto, salon auto). - **Mesh topology:** 2nd Floor Atrium is wireless-mesh parent for CC Bridge + salon (5 GHz backhaul ch36); 206 U7 Pro carries AP 108. These must NEVER be disabled or powered down via zone command -- coverage-thin auto-excludes them. 
- **Known hardware:** AP 108 (Floor 1) offline pending a new cable run (expected). Stale duplicate controller object ("108" vs "108U7 Pro") to clean up separately. - **AP-hang recovery:** use `device-control.sh cascades poe-cycle "" --apply` (remote PoE port cycle via controller cmd/devmgr). Do NOT use `force-provision` -- it took AP 445 offline during the Floor-4 pilot and was removed from device-control.sh. - **Tooling (`unifi-wifi` skill -- feature-complete as of 2026-06-16):** - Collectors: `audit-site.sh` (config + neighbor density), `live-stats.sh` (live per-AP/client, Plane 2), `model-rank.sh`, `radio-usage.sh` (77-day 2.4 usage history per AP; confirms POWER-DOWN vs disable), `coverage-thin.sh` (mesh-aware 2.4 SNR dominating-set -- drives Phase C), `neighbor-collect.sh` (/proc/ui_neighbor AP-to-AP SNR matrix, non-disruptive, drives optimize-radios disables), `survey-collect.sh` (per-channel busy%/noise -> channel plan), `dfs-check.sh` (precise per-AP radar event history), `switch-audit.sh`, `gw-audit.sh`, `monitor-run.sh` (cron health digest, all sites), `sites.sh` (multi-client site list, ~49 UOS sites). - **`survey-report.py` (NEW 2026-06-19) -- the channel-decision driver:** rolls the `survey-collect` JSON into the fleet per-channel/per-band-group measured busy% table + cleanest/dirtiest ranking + a suggested clean 40MHz palette. Run it BEFORE any channel change; it's what makes the DFS-vs-non-DFS call from facts (the skill previously had a non-DFS bias baked into `survey-collect`'s report AND `channel-plan`'s palette -- both fixed 2026-06-19). 
- Apply (gated + rollback): `apply-radio.sh` (power/width/channel/minrssi/disable/enable, --zone/--ap), `apply-wlan.sh` (minrate/bandsteer/bands/steer/bsstm/dtim/isolation/etc.), `client-control.sh` (block/unblock/kick MAC -- used to nudge sticky phones off 2.4 after a channel change), `device-control.sh` (poe-cycle; adopt/restart/locate/upgrade), **`channel-plan.sh` (now DATA-DRIVEN palette: `--channels ` or `--dfs ok|avoid|only`; default ranks ALL 40MHz primaries by measured busy%; load-balance + local-search -> 0 strong co-channel).** - pfSense: `pfsense-ssh.sh` (audit/dhcp/run -- SSH backend, no RESTAPI package needed; auth from `clients//pfsense-firewall`; system OpenSSH via askpass). ROADMAP: gated control verbs (firewall rules, port forwards) -- deferred to Mike per SS E. - All scripts site-parameterized (work for any of ~49 UOS sites). Per-client AP-side creds via `clients//unifi-ap-ssh`. - **Creds (vault refs only):** `infrastructure/uos-server-ssh-key` (SSH/Mongo), `infrastructure/uos-server-network-api-rw` (RW API), `clients/cascades-tucson/unifi-ap-ssh` (per-AP SSH, needs site VPN for L3 reach to 192.168.2.x/3.x), `clients/cascades-tucson/pfsense-firewall` (pfSense admin for pfsense-ssh.sh). - **Prior diagnostic (2026-05-16):** cloud API only, read-only; identified 2.4 GHz saturation hypothesis. Controller access was blocked at the time. Live controller access gained 2026-06-15 when Mike vaulted the SSH key and RW admin. - **Tooling note:** `live-stats.sh` accuracy bugs fixed 2026-06-15 (removed 15-AP head cap, switched satisfaction to device-level, switched TX-retries to `tx_retries_pct` rate field, sorted worst-client list by satisfaction). These bugs caused a mid-session misdiagnosis that was corrected before session end. 
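
The co-channel figures quoted in this section (for example 25->30 co-channel pairs out of 173 strong neighbor pairs) reduce to one count: how many strong AP-to-AP neighbor pairs share a primary channel. The Python sketch below shows that count only. It is not the logic inside `channel-plan.sh`; the input shapes, the function name `co_channel_pairs`, and the sample AP names are all assumptions made for illustration, and it ignores partial overlap between adjacent 40/80 MHz blocks.

```python
def co_channel_pairs(strong_pairs, channel_of):
    """Return the strong neighbor pairs whose radios share a primary channel.

    strong_pairs: iterable of (ap_a, ap_b) tuples that already passed an SNR
                  threshold (hypothetical input; the real collector output differs).
    channel_of:   dict mapping AP name -> primary 5 GHz channel number.
    """
    hits = []
    seen = set()
    for ap_a, ap_b in strong_pairs:
        key = frozenset((ap_a, ap_b))
        # Skip self-pairs and the reverse direction of a pair already counted.
        if len(key) < 2 or key in seen:
            continue
        seen.add(key)
        ch_a = channel_of.get(ap_a)
        ch_b = channel_of.get(ap_b)
        # APs with no known channel (offline, disabled) never count as a hit.
        if ch_a is not None and ch_a == ch_b:
            hits.append(tuple(sorted(key)))
    return hits


# Made-up sample data, not taken from the Cascades controller.
sample_pairs = [("AP-A", "AP-B"), ("AP-B", "AP-A"), ("AP-B", "AP-C")]
sample_channels = {"AP-A": 52, "AP-B": 52, "AP-C": 100}
sample_hits = co_channel_pairs(sample_pairs, sample_channels)
```

Running the same count before and after a proposed plan gives a quick regression check: a plan that raises the count, as the 2026-06-17 auto-channel run did, should not be applied.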

### VoIP / Network Device Migration

- **Re-VLANing a wired switch port requires a link bounce to force re-DHCP.** Changing the native VLAN on a UniFi switch port does not reset the NIC link; the device holds its old DHCP lease (renewal unicast to the old DHCP server is blocked by the new VLAN's firewall rules). Fix: bounce the port (PoE power-cycle for PoE devices; disable/enable via controller API for non-PoE). A UniFi client block/unblock is a MAC-address filter only -- it does NOT bounce the link. Controller API port-bounce requires the `X-CSRF-Token` from the login response header (`x-updated-csrf-token`). Confirmed on the Vertical-Remote desktop (2026-06-17).
- **Externally-powered devices (AudioCodes desk phones) need a PHYSICAL power-cycle, not a controller bounce.** The 8 AudioCodes sit on USW-16-PoE ports 1-8 but run on **external power bricks (PoE OFF on those ports)** -- so a UniFi PoE power-cycle is a no-op AND a controller port disable/enable did not reset their uptime either. They held their old main-LAN DHCP leases and never re-DHCP'd onto VLAN 30 until **Howard physically powered each off/on** (2026-06-18), after which all 8 pulled VOICE leases `.224-.231`. For any externally-powered wired device, plan an on-site/hands-on power-cycle for a VLAN move.
- **UniFi controller PUT 403 / CSRF:** rapid controller writes can 403 -- read the CSRF token from the `x-updated-csrf-token` response header (TOKEN-cookie JWT as fallback). pfSense SSH and the controller API both rate-limit under many rapid queries; alternate between them.
- **API scratch files must be written OUTSIDE the repo.** Controller-scratch (`.sta.json`, `.fleet*.dev`, etc.) written CWD-relative got swept into commits by `git add -A` and blocked rebases (a stray locked `curl.exe` held them). Use `mktemp -d` outside the repo; `.gitignore` patterns (`.fleet*`, `.ap[0-9]*`, `.vq[0-9]*`, `.q[0-9]*`) added as a backstop.
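
The CSRF handling noted above (prefer the `x-updated-csrf-token` response header, fall back to the TOKEN-cookie JWT) might look roughly like the sketch below. It works on plain dicts so it is independent of any HTTP library. The function name `csrf_from_response` is hypothetical, and the JWT claim name `csrfToken` is an assumption that this document does not confirm.

```python
import base64
import json


def csrf_from_response(headers, cookies):
    """Pick the CSRF token to send as X-CSRF-Token on the next write.

    Prefers the `x-updated-csrf-token` response header. Falls back to the
    payload of the `TOKEN` cookie JWT; the `csrfToken` claim name is an
    ASSUMPTION, not something confirmed by this document.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    token = lowered.get("x-updated-csrf-token")
    if token:
        return token

    jwt = cookies.get("TOKEN")
    if not jwt or jwt.count(".") != 2:
        return None  # no usable cookie; caller should log in again

    # A JWT is header.payload.signature; the payload is base64url JSON.
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("csrfToken")
```

The payload is decoded without verifying the signature, which is acceptable here only because the token is being echoed back to the server that issued it, not trusted for any decision. Re-reading the header after every response, rather than caching the login value, is what avoids the 403 on rapid writes.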

### Voice QoS (VLAN 30) -- design (2026-06-18, NOT yet built)

Full design: `docs/network/phase1-voice-qos-design.md`. Status DESIGN -- nothing applied (Cascades per-change-go rule).

- **The VLAN move's QoS payoff: all voice is one subnet `10.0.30.0/24`,** so QoS matches *all* voice by **source subnet** -- no per-PBX SIP/RTP port guessing. This is the cleanest match criterion and only became possible by isolating voice onto VLAN 30. Phones confirmed marking **DSCP EF (46)** (2026-06-18), so DSCP drives WMM (L2) + switch QoS (L3); subnet match is the safety net (no pfSense set-DSCP rule needed).
- **QoS is INSURANCE, not the everyday fix.** Phones register to a CLOUD PBX (Vertical) over the internet, so the theoretical bottleneck is WAN-upload saturation. But measured WAN1 fiber upload **~522 Mbps** vs ~98 Mbps peak usage = huge headroom -> the WAN is not the day-to-day constraint. QoS earns its place for (1) **WAN2 (coax) failover** (small upload + a big upload = real congestion) and (2) rare WAN1 saturation (backup/large upload). The everyday dropped-calls cause is **RF** -- build QoS (cheap, correct) but set expectations.
- **Layered design:** **L1 pfSense HFSC shaper** on BOTH WANs -- 3 queues `qVoice` (prio 7, realtime ~30%, source `10.0.30.0/24` via floating out rule), `qACK` (~10%), `qDefault` (default ~60%); shape to ~90-95% of actual upload to keep the queue in pfSense. **L2 UniFi WMM** maps DSCP EF -> WiFi Voice AC (protects the Poly phones over the air -- verify WMM on CSCNet). **L3 UniFi switch QoS** queues tagged voice (mostly automatic; confirm USW isn't stripping DSCP). **L4 DSCP marking** (confirmed EF on the phones). Blocker for L1 sizing: the WAN2 coax upload number (remote test failed).
- **Build path:** Firewall -> Traffic Shaper -> Wizard "Multiple Lan/Wan" (prioritize by address `10.0.30.0/24`), or hand-build HFSC + floating rule. Howard drives the pfSense GUI. Rollback = disable/remove the shaper (QoS only orders packets under congestion; removing reverts to FIFO, zero residual). Skill gap: `unifi-wifi` has no QoS verb (pfSense + UniFi config task).

### Network Optimization Master Plan (all-device, 2026-06-18, NOT yet executed)

Full plan: `docs/network/network-optimization-master-plan.md`. Goal: fix the *system* for all ~587 clients, not one device at a time. Floors 1-4 only this round; **Floors 5/6 (MemCare) RF + phones DEFERRED** per Howard.

- **Core principle: open relief valves BEFORE constraining,** or congestion just relocates (the "whack-a-mole" trap). Sequence: **P0** baseline capture (same time-of-day) -> **P1** voice QoS (orthogonal, do first) -> **P2a** enable 6 GHz on CSCNet + BSS-transition (the offload path) + **P2b** correct the 2.4 over-thinning **Low->MEDIUM** (~12-15 dBm, not lower -- Low already starved edge clients) -> **P3** 5 GHz 80->40 MHz + **non-DFS** channel plan (channel choice superseded 2026-06-19 by the clean-DFS decision recorded under "Decisions resolved") + relieve AP 103 -> **P4** fine-tune (2.4 1/6/11, min-RSSI, 802.11k/v roaming) -> **P5** physical cabling (separate visit).
- **Interdependencies:** 6 GHz before 5 GHz-40MHz; 2.4 power Medium not Low; AP 103 must be relieved (Lauren locked there); never stack disables + power-down in one area (that caused the over-thinning); tune one lever per zone; never disable mesh-protected APs (2nd Floor Atrium, CC Bridge, salon, 206 U7 Pro, 108).
- **Data-driven gate rule (Howard):** every change is a hypothesis gated on fleet-wide metrics (avg retry%, cu_total, cu_interf, satisfaction, band split, per-AP coverage holes). KEEP+proceed only if the target improved AND fleet-wide satisfaction didn't fall / retry didn't rise / no AP lost its clients; HOLD if a secondary metric regressed; ROLLBACK on any fleet regression or complaint. Validate ALL devices (CSCNet 427 + CSC ENT 131 + Guest 13), not just the 31 voice phones. Every `apply-radio`/`apply-wlan` writes a rollback JSON.
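
The data-driven gate rule can be expressed as a small decision function. The sketch below is one reading of the rule, not an existing tool. The `FleetSnapshot` fields, the `decide` name, and the split between "fleet regression" (satisfaction, retry, an AP losing its clients) and "secondary metric" (`cu_total`, `cu_interf`) are assumptions made for illustration. Only the 13.2 -> 9.5% retry figures come from the Floor-4 pilot; the other numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class FleetSnapshot:
    """Hypothetical before/after capture; field names are illustrative."""
    target_metric: float       # metric the change aims to LOWER (e.g. zone avg retry %)
    satisfaction: float        # fleet-wide satisfaction, higher is better
    retry_pct: float           # fleet-wide avg retry %, lower is better
    cu_total: float            # channel utilisation %, lower is better
    cu_interf: float           # interference utilisation %, lower is better
    aps_without_clients: int   # APs that serve zero clients


def decide(before, after, complaint=False):
    """Return 'KEEP', 'HOLD' or 'ROLLBACK' for one gated change."""
    fleet_regressed = (
        after.satisfaction < before.satisfaction
        or after.retry_pct > before.retry_pct
        or after.aps_without_clients > before.aps_without_clients
    )
    if complaint or fleet_regressed:
        return "ROLLBACK"

    secondary_regressed = (
        after.cu_total > before.cu_total or after.cu_interf > before.cu_interf
    )
    target_improved = after.target_metric < before.target_metric
    # Assumption: a change that did no harm but did not help is held, not kept.
    if secondary_regressed or not target_improved:
        return "HOLD"
    return "KEEP"


before = FleetSnapshot(13.2, 90.0, 10.5, 80.0, 40.0, 0)
after = FleetSnapshot(9.5, 91.0, 10.0, 78.0, 38.0, 0)
print(decide(before, after))                  # target improved, nothing regressed
print(decide(before, after, complaint=True))  # any complaint forces a rollback
```

The comparisons are strict, so ordinary measurement noise could trip a rollback; a practical version would probably compare against a tolerance and require the before/after captures to be taken at the same time of day, as P0 specifies.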

### Decisions resolved 2026-06-18 (voice/RF)

- **5 GHz: USE THE CLEAN DFS CHANNELS** (REVERSED 2026-06-19 by measured data; the prior "non-DFS only" call was wrong for THIS site). The full channel survey (74/74 APs) showed DFS = 2-3% busy vs non-DFS 10-28% (ch149/157 are the worst on property). The everyday, every-call congestion on non-DFS is real and measured; the radar risk is hypothetical (0 genuine hits observed). So Howard chose the clean DFS channels (52/60/100/108/116/124/132/140) for voice quality. Safety net: UniFi auto-vacates a DFS channel on radar (regulatory -- moves ONE AP, not the fleet); FOLLOW-UP = stand up a recurring `dfs-check.sh` radar monitor. (This supersedes the 2026-06-18 "non-DFS only" decision -- which was made before the per-channel scan existed.)
- **NO dedicated voice SSID** -- voice stays on the shared CSCNet PPSK. The UniFi 3-SSID cap is sound RF hygiene (each SSID = beacon airtime at 77 APs); the only retirement candidate, CSC ENT, still has 131 active clients (staff PCs, printers, DirecTV), so a slot isn't free; and a voice SSID isn't needed (QoS is VLAN/DSCP-based and SSID-independent, band preference is best set phone-side via Vertical, roaming/power-save are phone+AP settings). Revisit only if CSC ENT's clients migrate off.

### pfSense Operations

- **pfSense 25.07 logs are PLAIN TEXT, not binary clog.** Read with `tail`/`grep` directly (e.g., `tail -5000 /var/log/dhcpd.log`). Using `clog` returns empty output and will cause false conclusions. All log files confirmed ASCII text (`file /var/log/*.log`).
- **pfSense OpenVPN `--inactive` idle timeout:** The Cascades OpenVPN server (`ovpns1`) has a configured `--inactive` timeout (~300s). This disconnects idle clients after ~5 min of no tunnel data. Keepalive pings do NOT reset this counter (`--inactive` measures actual tunnel data, not keepalive packets). Symptom: OpenVPN Connect auto-reconnects repeatedly; the pfSense log shows `Inactivity timeout (--inactive), exiting`. This is a config setting, not a fault. Duplicate-CN events (which would indicate a different issue) are absent. Fix: raise or disable the `--inactive` parameter on the OpenVPN server profile. Fix proposed 2026-06-18; not yet applied (requires go-ahead).
- **pfSense dirty-boot / duplicate dhcpd:** After an unclean pfSense shutdown (e.g., power loss on surge-only UPS), ZFS survives but dhcpd may start twice. Symptom: DHCP DISCOVER->OFFER loop with no REQUEST/ACK completion (clients' OFFERs are handled by the wrong daemon instance). Fix: `killall dhcpd && echo "services_dhcpd_configure();" | /usr/local/sbin/pfSsh.php`; verify one instance: `pgrep -f "dhcpd -user" | wc -l` == 1. Note: `pfSsh.php` is slow (~20-40s); use timeout 60s+.
- **Post-outage device stragglers:** Devices that booted during a DHCP-down window cache a disconnected state and do not retry once the network recovers. A DHCP-log scan cannot find them (they stop sending DISCOVER). Realistic plan: reactive power-cycle as reports come in. Cox modem must be rebooted after a pfSense configuration restore (otherwise WAN may not fully re-establish).

### Known Issues / Pending Hygiene (as of 2026-06-18)

- **[BUG] Stale exclude-group on MFA-all-users policy:** The `Require multifactor authentication for all users` policy (`7e87a1c7...`) excludes `SG-Caregivers-Pilot` (`0674f0bc...`) instead of the live `SG-Caregivers` (`8b8d9222...`). Fix: PATCH `excludeGroups` to replace `SG-Caregivers-Pilot` with `SG-Caregivers`.
- **[DESIGN] ALIS-native 2FA is not a perimeter control.** The correct permanent model: force all ALIS logins through Entra SSO (SSO-only, credential fallback disabled). Office/privileged users should be standardized onto ALIS SSO as a separate workstream; ALIS-native 2FA should then be disabled per-user then globally.
- **[INFO] Android enrollment token expiry (2027-05-08) does NOT unenroll devices.** Renewal is needed only before enrolling new devices after that date.
- **[WARN] ~25 switch ports at 100 Mbps but gig-capable.** Physical: re-terminate/replace cable or check NIC. Investigate after WiFi Phase A remediation is stable.
- **[WARN] 3 offline switches** (Switch 2nd Floor #2 -- reset+re-adopted 2026-06-17 after power outage but may still show in some monitors, Switch 4th Floor #2, USW Pro Max 16). Root cause unknown for #2 and #3; investigate onsite.
- **[SECURITY] Synology Cloud Signin Portal credential exposed in vault git history (commit 1fbc0e1).** `clients/cascades-tucson/synology-signin-portal.sops.yaml` was committed plaintext; encrypted go-forward but credential must be rotated. Verify MDM service account + WiFi CSCNet entries from the same commit were never plaintext.
- **[FLEET] Leftover Datto stack (CentraStage + Infocyte/DattoAV) -- not yet cleaned up.** Confirmed on CS-SERVER (thrashing degraded disk) and DESKTOP-TRCIEJA (Lupe Sanchez, causing dual-AV slow Excel). DESKTOP-TRCIEJA will be replaced (no cleanup needed on that box). CS-SERVER cleanup still open.
- **[WARN] DESKTOP-TRCIEJA duplicate computer name on network (Event 2505).** Recurring event on Lupe Sanchez's machine. Moot if machine is replaced; note for the replacement provisioning.

### Security Incidents (historical)

- **Megan Hiatt (2026-04-16):** Active credential-stuffing -- 126 failed sign-ins, bursts from Belfast GB, Hamburg DE. Password reset and SMTP AUTH disable were action items. Mailbox was clean (not breached).
- **John Trozzi (2026-04-16, 2026-04-20):** Investigated twice -- both times NO BREACH. First: credential stuffing flag (clean). Second: inbound phishing email (clean). Reports in `clients/cascades-tucson/reports/`.
- **Crystal Rodriguez (2026-04-19):** Phishing investigation. Report: `clients/cascades-tucson/reports/2026-04-19-crystal-rodriguez-phish-investigation.md`.
- **Canva email delivery (2026-05-20):** Alma Montt not receiving Canva invites. Resolved by adding canva.com domains to AllowedSenderDomains in EOP policies.
- **ALIS AADSTS65001 (2026-06-03):** megan.hiatt, karen.rossini, memcarereceptionist could not sign in to ALIS on non-phone devices. Root cause: missing tenant-wide admin consent on ALIS SP (`e1cae4ad`). Resolved by granting `AllPrincipals` `User.Read` via Graph API.
- **dunedolly21@gmail.com:** External guest invited 2026-04-14 by Lauren Hasselman from mobile. Status unknown -- confirm with Lauren. [unverified]
- **Chris Knight bill.com / BOK email delivery (2026-06-04):** Root cause was SENDER-SIDE: bill.com address on SendGrid suppression list; BOK had wrong recipient email. Resolved externally by Howard. No tenant config changes needed. Ticket #32383, Resolved.

### HIPAA Compliance

- **Primary objective.** Cascades stores PHI on CS-SERVER and uses ALIS for clinical records.
- **Critical open gaps:** No audit logging on D:\Homes (§164.312(b)); Object Access auditing disabled; no SMB encryption on homes share; no file access auditing. Audit retention infra (LAW 90d + Storage 6yr) approved but not yet built.
- **Backup gap closed (2026-06-15):** Mike installed ACG cloud backup (MSP360/CloudBerry -> ACG-backup server) on CS-SERVER. Verify first full backup completes and set retention; confirm image-based / bare-metal + system-state for DC recoverability.
- **Restored 7 deleted mailboxes (2026-04-25)** for HIPAA §164.316(b)(2) 7-year retention.
- **Termination policy established:** Convert to shared mailbox, hide from GAL, retain 7 years.
- **Voice VLAN 30 (HIPAA-isolated):** All voice gear (phones + Vertical desktop) migrated to an isolated network with internet/cloud-PBX egress only; blocked from PHI/LAN/VLAN20/mgmt. **Cutover complete 2026-06-18: 31 devices on VOICE (8 AudioCodes + 22 Poly + desktop);** 6 Poly stragglers still on VLAN 20/Default pending re-key.

---

## Active Work

Syncro live pull 2026-06-18: **0 open tickets.** No hours drawn from the 2026-06-17/18 sessions (all advisory/diagnostic/no billable infra changes).

**Non-Syncro follow-ups open as of 2026-06-18:**

- **[URGENT] Order replacement workstation for Lupe Sanchez (DESKTOP-TRCIEJA).** Decision made 2026-06-18. EOL Gateway ZX6971 / i3-2120 / 8 GB / Win11-unsupported. On new machine: provision GuruRMM + Bitdefender only; do NOT carry over the Datto stack.
- **[URGENT] Rotate exposed Synology Cloud Signin Portal credential.** Vault commit 1fbc0e1 committed it plaintext; encrypted go-forward but credential is exposed in git history. Also verify MDM service account + WiFi CSCNet from that same commit were never plaintext.
- **[DONE 2026-06-18] Voice VLAN (VLAN 30) cutover -- 31 devices on VOICE** (8 AudioCodes `.224-.231` + 22 Poly `.202-.223` + Vertical desktop `.201`). AudioCodes needed a physical power-off/on (externally powered; PoE/controller bounce was a no-op). **Remaining:** re-key the **6 Poly stragglers** still on VLAN 20/Default (`10.0.20.64/.65/.66/.67/.195`, `192.168.1.126`) to the voice PPSK.
- **[PENDING - voice quality] Dropped-calls/voice-breaks are an RF problem on the WiFi (Poly) phones, not the VLAN move.** 14 phones flagged 2026-06-18; worst Lauren `.202` (was 2.4GHz/50% retry -> locked to AP 103) and Shelby `.218` (2.4GHz/53%, MemCare -- deferred). Coverage gaps rooms 515/210/204. Fixes (none applied): voice QoS (#1), force voice phones off 2.4 GHz (#2), coverage/min-RSSI (#3), migrate 6 stragglers (#4), 5 GHz width/channel (#5). Diagnostic: `reports/2026-06-18-voice-quality-diagnostic.md`.
- **[PENDING - build] Voice QoS for VLAN 30** (pfSense HFSC 3-queue on both WANs matching `10.0.30.0/24` + UniFi WMM/switch QoS). Design done, not built (Howard drives pfSense GUI). Blocker for sizing: the WAN2 coax upload number. QoS is insurance (WAN has headroom); RF is the everyday fix. Design: `docs/network/phase1-voice-qos-design.md`.
- **[PENDING - execute] Network optimization master plan (floors 1-4; MemCare deferred).** Sequenced P1 QoS -> P2a enable 6 GHz on CSCNet + P2b 2.4 Low->Medium -> P3 5 GHz 40 MHz + non-DFS (channel choice superseded 2026-06-19 by the clean-DFS decision) + relieve AP 103 -> P4 fine-tune -> P5 physical. Open relief valves before constraining; per-zone, dry-run, gated on fleet metrics. Start = P2b (baseline capture + 2.4 Low->Medium). Pending Howard's go + evening window. Plan: `docs/network/network-optimization-master-plan.md`. (Supersedes the older "Wireless RF Phase 0 + Phase 1" item below -- same work, holistic framing.)
- **[PENDING] Measure WAN2 (coax) upload** -- remote source-route test failed; get from a WAN2-routed host or the Cox bill (sizes the failover voice shaper).
- **[PENDING] Hand Vertical (Richard Turner) the phone-side config list** -- 5 GHz band lock, DSCP-on, 802.11k/v roaming, U-APSD/power-save, firmware.
- **[PLANNED] Network logging / observability (spec written, build later).** Diagnosis 2026-06-18: the UniFi controller retains **ZERO** client events/alarms for Cascades (7-day pull) and pfSense logs roll over in hours -> device drops/kicks/deauths are not captured, so the network is a black box after the fact. Plan: **Synology cascadesDS (DSM Log Center syslog server) as the on-site collector** (NOT CS-SERVER -- fragile EOL DC), with pfSense + UniFi-controller + AP syslog as sources and a 1-2 min `/stat/sta` client snapshotter to fill the controller's history gap. Optional later: Container Manager Graylog/Loki + Discord alerting. Spec: `docs/network/network-logging-plan.md`. Next: confirm Synology model/RAM/DSM.
- **[PENDING] Wireless RF Phase 0 + Phase 1 (pending go-ahead + evening window):**
  - Phase 0 (safe anytime): pfSense ping-check off for 240 DHCP pools, disable 3 AM AP firmware auto-upgrade, enable full pfSense logging (DHCP/DNS/firewall/system/gateway) with rotation.
  - Phase 1 (windowed, per-zone, evening): combined per-AP radio_table PUT -- ng power medium (42 at-Low radios only, not the 24 disabled), na ht 40 (76 radios), na min_rssi -82 (69 at -77). Dry-run clean. Rollback auto-saved. Validate with watch-ap before/after.
  - 5 GHz Option B (same window or separate): 40MHz + non-DFS channel plan + min-RSSI -82; width + channel are coupled (width alone fixed only 7/25 co-channel pairs). Auto-channel assignment made things worse -- do NOT use UniFi auto-channel; use `channel-plan na`.
  - Standing rule: no Cascades prod-infra changes without discussing + explicit per-change go (memory `feedback_cascades.md` rule #4).
- **[PENDING] pfSense OpenVPN `--inactive` timeout fix.** Raise/disable the `--inactive` idle timeout on the Cascades OpenVPN server profile (~300s -> raise or remove). Proposed, not applied. Needs go-ahead.
- **[PENDING] Enable Netgate AutoConfigBackup** on pfSense (no off-box config backup existed before 2026-06-17 manual vault; on-box auto-backup exists but only one version). Also verify UPS covers core infra + PoE switches on battery-backed outlets (pfSense rectified; other gear not confirmed).
- **[PENDING] Synology Drive Team Folder migration (department shares -> CS-SERVER).** Diagnosis complete (2026-06-18): current Drive sync covers only the Sync-user's My Drive, not the real shared folders. Plan: Admin Console Team Folder enable + low versioning; Drive Client Download-only tasks into `D:\Shares\_SynMigration\`; pilot on `/volume1/Server` (1.9 G, 2,486 files) first. Pending: confirm in-scope share list, confirm ALDocs coverage in real shares, get go-ahead to execute. Runbook (optional): `clients/cascades-tucson/docs/migration/synology-team-folder-migration.md`.
- **[PENDING] Watch for post-outage device stragglers.** Devices that booted during the 2026-06-17 DHCP-down window (duplicate dhcpd) may have cached a disconnected state. Kitchen thermal printer resolved by power-cycle. Expect additional IoT/printer/POS reports; fix each by power-cycle.

**Migration phase status (as of 2026-05-26):**

| Machine / User | Status |
|---|---|
| Sharon Edwards (DESKTOP-DLTAGOI) | Domain-joined, folder redirect working via registry workaround |
| Ashley Jensen (DESKTOP-U2DHAP0) | Domain-joined, folder redirect manually fixed |
| Crystal Rodriguez (CRYSTAL-PC) | Domain-joined, folder redirect confirmed working 2026-05-21 |
| RECEPTIONIST-PC (frontdesk) | Domain-joined 2026-05-22; loopback Replace mode, no folder redirect by design |
| NURSESTATION-PC | Domain-joined, folder redirect complete |
| Lauren Hasselman | Domain-joined, folder redirect complete 2026-05-23 |
| Megan Hiatt (Marketing) | COMPLETE 2026-05-27 -- domain joined via ProfWiz, folder redirection live, data on server |
| DESKTOP-KQSL232 (Lois Lane -- CareTakers) | Blocked -- Lois Lane resistant to change; John Trozzi working with her |
| CHEF-PC, SALES4-PC, MDIRECTOR-PC | Not yet started |
| DESKTOP-TRCIEJA (Lupe Sanchez) | **EOL hardware -- replace instead of migrate.** Decision 2026-06-18. |

**Blocking issues / pending:**

- M365 relicensing: 31 Business Standard -> Business Premium (SUSPENDED -- time-critical, 31 SPB seats free)
- Break-glass accounts: not created (confirmed 2026-05-27); YubiKey arrival unconfirmed
- Audit retention infra: approved 2026-04-29, not yet built
- RECEPTIONIST-PC GuruRMM agent (9c91d324): flaky WebSocket, lagging fleet
- Entra Connect: OU=Administrative not yet in sync scope; UPN suffix updates for that OU pending
- NURSESTATION-PC: reboot required to activate `CSC - Caregiver Device Lockdown` GPO (deployed 2026-06-05; verify lock@3min, 90s warning, sign-out@15min, never-sleep)
- #32370 -- eFax/scanner onsite (Howard); verify/likely closed (Syncro live 2026-06-18 shows 0 open)
- Caregiver device allow-list: ASSISTNURSE-PC needs re-join + re-tag after Win11 reinstall; LAPTOP-8P7HDSEI Win11 upgrade + join/tag still pending; then cutover (enable allow-list policy, disable compliance-block)
- ALIS office/privileged standardization: move office/managers/nurses to ALIS SSO-only; disable ALIS-native 2FA per-user then globally
- Fix stale `SG-Caregivers-Pilot` exclude-group on `Require MFA for all users` policy
- LAPTOP-8P7HDSEI: upgrade Win 10 -> Win 11 before PHI use
- Edge UNC download bug (Chromium 149): decide fix path for Ashley Jensen + Lois Lane and fleet; no fix applied as of 2026-06-08
- ALIS app session timeout: lower from 20 to 15 min (Howard, ALIS admin) -- PENDING
- **[CRITICAL] CS-SERVER degraded RAID-1 (2026-06-15):** OS mirror (C:) running on a single 320 GB Hitachi 5400 RPM laptop spindle, no redundancy. Recommended replacement: 2x 480 GB enterprise 2.5" SATA SSD (e.g. Solidigm D3-S4520 or Samsung PM893; no Dell drive lockout; min size 480 GB class; SAS 6/iR controller, 3 Gbps, no TRIM). Hot-swap capable. Plan rebuild-then-swap (image C: first, AFTER backup verifies; re-pull OMSA live before any physical action). DC migration is the real fix.
- **[INFO] CS-SERVER cloud backup (MSP360/CloudBerry, installed 2026-06-15):** verify first full completes + confirm image-based / bare-metal + system-state + retention before any drive work. - **[CLEANUP] CS-SERVER agent sprawl:** remove the previous MSP's leftover Datto RMM (CentraStage) + Datto EDR (Infocyte) stack (thrashing the degraded disk). - **[PROPOSED] Unified KPI dashboard (Ashley Jensen):** scoped 2026-06-17; client one-pager drafted. Parked pending Ashley's day-one KPIs, data-freshness need, and POS/Focus-HR specifics. See Business Applications & Reporting Systems section. Next: deliver one-pager; confirm ALIS analytics/data-feed availability with Medtelligent. --- ## History Highlights | Date | Event | |---|---| | 2026-03-06 | ACG onboarding begins. Initial audit (CS-SERVER Dell R610, pfSense, UniFi, Synology). 19 machines. No backup, no HIPAA compliance. | | 2026-03-09 | AD security hardening: Monica Ramirez removed from Domain Admins, lockout policy fixed, AD Recycle Bin enabled, MachineAccountQuota set to 0. | | 2026-03-31 | Cascades onboarded to remediation tool. Tenant ID documented. 50 users, Secure Score 34%. | | 2026-04-13 | Major onsite: 13 stale AD accounts deleted, OU structure cleaned, UPNs migrated to cascadestucson.com, Homes share created, Folder Redirection GPO deployed (registry workaround), first domain joins. | | 2026-04-14 | Sandra Fish global admin revoked. ALIS SSO confirmed. Business Premium proposal created. | | 2026-04-16 | Breach checks: Megan Hiatt (credential stuffing, not breached; password reset). John Trozzi (clean). Crystal Rodriguez phish. /remediation-tool skill built. | | 2026-04-17 | Howard onsite: folder redirect Sharon Edwards diagnosis. John Trozzi WiFi (TP-Link + UniFi roaming instability). | | 2026-04-25 | Entra Connect installed on CS-SERVER (staging mode). 7 deleted mailboxes restored for HIPAA. Dual-WAN discovered. | | 2026-04-28-29 | CA policy reconciliation. 
Audit retention architecture (ACG-billed, LAW 90d + Storage 6yr). Break-glass design (2 accounts, YubiKeys). Caregiver pilot scope corrected (phased only). | | 2026-04-30 | CA rollout (Report-only mode): 3 caregiver policies created. SDM bootstrap. | | 2026-05-01 | Howard billed 33.5 hrs against prepaid block on Entra project ticket #32214 ($0 invoice). | | 2026-05-07-08 | SDM phone provisioning. SDM token success. ALIS SSO app registration values captured to vault. | | 2026-05-14-16 | Caregiver AD accounts created. Security groups always deliberate (no OU->group automation). Wireless diagnostic (read-only via cloud API; 2.4 GHz saturation hypothesis identified; local controller inaccessible at the time). | | 2026-05-18 | Billing review. 39.5 hrs remaining before session. 7 hrs billed separately. | | 2026-05-20 | Canva email delivery resolved (canva.com domains added to EOP). | | 2026-05-21 | Crystal Rodriguez folder redirect confirmed working. Lauren Hasselman + Crystal Rodriguez domain join attempted -- passwords didn't work initially. | | 2026-05-22 | Ashley Jensen domain-joined. RECEPTIONIST-PC domain-joined. GPO ILT fixes (FrontDesk printer + R: drive). cascadesDS auth failure diagnosed (workgroup collision) and deferred. | | 2026-05-14 | Entra Connect exited staging mode -- actively syncing. CA pilot re-pointed to SG-Caregivers. | | 2026-05-23 | Lauren Hasselman folder redirect complete. Megan Hiatt (Marketing) confirmed in AD, domain join pending. | | 2026-05-26 | Access control vendor meeting onsite (ticket #32324). 0.5h Howard + 0.5h Mike billed against prepaid block. Block at 28.0h. | | 2026-06-03 | ALIS AADSTS65001 diagnosed and resolved: granted tenant-wide admin consent on ALIS SP `e1cae4ad`. Caregiver device allow-list CA policy created in report-only (`CSC - Caregivers: allow-listed devices only (REPORT-ONLY)`, id `1b7fd025`). 
| | 2026-06-04 | Three same-day tickets: #32381 Tamra scanner (0.5h onsite), #32382 Megan file access (1.5h onsite), #32383 Chris Knight bill.com/BOK email delivery (1.5h remote). Root cause sender-side. EXO access token auth method documented. | | 2026-06-05 | NURSESTATION-PC localadmin login-screen issue (`SpecialAccounts\UserList` hide) -- removed via RMM. Vault hygiene: `sysadmin@` GA password vaulted; voice MFA scoped group created; `alternateMobile` updated to +1 520-585-1310 (Howard). Caregiver test rig built. Hybrid Entra Join enabled; NURSESTATION re-domain-joined + hybrid-registered (new deviceId `d3bf931f`). Caregiver access model proven end-to-end: pilot.test + NURSESTATION, ALIS via silent SSO. GPOs deployed: `CSC - Caregiver Workstation` validated; `CSC - Caregiver Device Lockdown` deployed to `OU=Caregiver Devices`. Ticket #32303 billed 7.0h, invoice #67782 ($0.00 prepaid). | | 2026-06-08 | **Chris Knight workstation setup (onsite).** AD account finished (OU=Administrative, home folder, SG-FolderRedirect, mail set). Machine DESKTOP-N5G1ROO domain-joined + GuruRMM-enrolled (`205025ee`), Office installed. **MAJOR: root-caused why folder redirection failed on every machine** -- FR GPO targets were in misnamed `fdeploy1.ini`; Windows reads `fdeploy.ini` (absent) -> empty path -> silent no-op. Fixed by writing correct `fdeploy.ini` to GPO `{512B43A4}` + version bump 917506->983042. Native FR now works for new users. **ASSISTNURSE-PC reinstalled (Win10->Win11).** | | 2026-06-08 | **Edge UNC download bug diagnosed (no fix applied).** Ashley Jensen + Lois Lane on Edge 149.0.4022.52 cannot open Office files from Edge download panel when Downloads is UNC-redirected. Root cause: Chromium 149 regression (issue 519243472) in `LaunchShellExecuteViaExplorer`. Fix path decision left to Howard. 
| | 2026-06-09 | **Accounting scan-to-folder built + billing reconciliation.** Created `D:\Shares\Accounting` + `\Scans` on CS-SERVER; shared as `\\CS-SERVER\AcctDept`; new vaulted AD service account `svc-scan`; Brother MFC-L8900CDW Scan-to-Network profile configured (NTLMv2; test scan confirmed). Found pfSense blocks main-LAN->VLAN-20. Persistent drive maps set for Chris (Y:), Zachary (Y:), Lauren (X:). Reconciled crashed-session billing; live prepay confirmed 57.75h. | | 2026-06-10 | **Meredith Kuhn locked Word doc -- stale owner files on cascadesDS.** Five orphaned `~$` files dated 2024 in `\\cascadesds\Public\Company Web Docs\Staff Trainings\` caused false lock messages. Diagnosed and deleted via RMM in Meredith's `user_session` on ASSISTMAN-PC. Ticket #32403, 0.5h remote, block 56.75->56.25. | | 2026-06-12 | **Created shared mailboxes grievances@ + Surveys@ and delegated to Meredith & Ashley.** Both SharedMailbox type (cloud-only, no license). FullAccess + SendAs granted. Work via ComputerGuru Exchange Operator cert auth (EXO module v3.10.0 installed on Howard-Home). All 8 permission grants verified. Ticket #32417, 0.5h remote, block 56.25->55.75; Invoiced. | | 2026-06-15 | **Wireless RF full audit -- controller access gained.** Mike vaulted `infrastructure/uos-server-ssh-key` + `clients/cascades-tucson/unifi-ap-ssh` + `infrastructure/uos-server-network-api-rw`. `unifi-wifi` skill used end-to-end. Live audit confirmed 77 U7-Pro APs, ~574->587 clients, 2.4 GHz saturation as primary pain band (avg retry ~10-11%, cu_total 69-94%, catastrophic neighbor density). `live-stats.sh` accuracy bugs found and fixed mid-session (15-AP head cap, wrong satisfaction/retry fields). DFS concern corrected: retry DFS 8.4% ~= non-DFS 9.0% -- no throughput penalty; mid-session misdiagnosis withdrawn. 6 GHz (1 client) identified as largest untapped capacity. Tuning plan staged; no live changes applied. 
| | 2026-06-15 | **CS-SERVER slowness root-caused to degraded RAID-1; backup started; pfSense OpenVPN password reset.** Dell OMSA: PD 0:0:3 (320 GB WD SATA) Critical/Removed, Virtual Disk2 (C: mirror) Degraded -> C: on a single 320 GB Hitachi 5400 RPM spindle (root cause of slowness). Mike installed MSP360/CloudBerry cloud backup on CS-SERVER (closes HIPAA backup gap). Reset Howard's lost pfSense OpenVPN password via Diagnostics PHP-exec from CS-SERVER (local_user_set_password() -> AUTHOK); vaulted at `clients/cascades-tucson/pfsense-openvpn-howard`. | | 2026-06-16 | **Voice VLAN plan for Vertical phones (PLANNED, not executed).** Diagnosed split voice gear: Poly phones (22, WiFi/CSCNet/VLAN 20), AudioCodes (8, wired USW-16-PoE/Default LAN), Vertical desktop (wired, static, no ACG login). CSCNet confirmed as shared PPSK SSID (not simple staff/VLAN-20). GuruRMM recon: desktop RDP-only (not a PBX); CS-QB SMB-only/no SIP; phones likely cloud PBX. Designed VLAN 30 VOICE (10.0.30.0/24, isolated, internet-only egress); wrote cutover runbook (`docs/network/voice-vlan-cutover.md`); vendor email sent. Awaiting Richard's confirm + window. | | 2026-06-16 | **pfSense confirmed as pfSense Plus 25.07-RELEASE; health verified; home-LAN shadow resolved.** Howard-Home renumbered from 192.168.0.0/24 to 10.137.42.0/24 (removed collision with Cascades 192.168.0.0/24). pfSense now reachable from Howard-Home over the site VPN. SSH health check: DHCP not exhausted, DNS up, WAN stable, states 28-31k/790k, load 0.6 -- gateway ruled out as WiFi factor. `pfsense-ssh.sh` backend built and validated live (SSH, no RESTAPI package needed). | | 2026-06-16 | **Floor-4 2.4 GHz power-down pilot applied (first production RF change).** 14/15 Floor-4 radios set to 6 dBm (from ~23); avg retry 13.2->9.5% (~28% fewer retransmits); clients retained, no coverage loss. AP 445 lagged (left alone, harmless). 
AP-hang recovery procedure learned: `device-control poe-cycle` (NOT force-provision -- took 445 offline; removed from the tool). `dfs-check.sh` confirmed ZERO real radar events fleet-wide (DFS empirically clean). `unifi-wifi` skill feature-complete (WiFi monitor/tune/apply + switch/gateway/pfSense-SSH + multi-client + channel-plan + cron health). | | 2026-06-17 | **KPI dashboard scoping for Ashley Jensen (advisory; no infra touched).** Reframed her Power BI Gateway question (gateway is on-prem-only, not a SaaS connector). Catalogued the 9 reporting systems (ALIS/QuickBooks/Bill.com/Relias/You've Got Leads/TELS/Focus HR/Helpany/POS). Recommended Phase 1 (exports->SharePoint->Power BI Pro) -> Phase 2 (Power Automate for Bill.com/QBO), leveraging existing M365 Business Premium. Wrote internal scoping note + client-facing one-pager (with cost line) under `docs/proposals/`. Parked pending Ashley's KPIs + freshness + POS/Focus-HR specifics. | | 2026-06-17 | **Voice VLAN 30 built + verified; Vertical desktop + initial Poly phones migrated.** Richard Turner confirmed window; VLAN 30 pfSense interface (igc1.30, 10.0.30.0/24) + isolation rules built (clone of GUEST VLAN, Protocol=Any; verified via `pfctl -sr`). UniFi VOICE network + CSCNet voice PPSK created (vaulted). Vertical desktop migrated (port-16 bounce via controller API with CSRF token; re-DHCP'd to 10.0.30.201). Key learnings: desktop is DHCP (not static), Vertical uses LogMeIn (not pfSense OpenVPN), re-VLAN wired port requires link bounce. | | 2026-06-17 | **Poly phone drops root-caused and closed; whole-network smoothing plan built (dry-run only).** Phone drops = intentional pfSense reboot on 2026-06-16 22:38 MST (transient, one-time, resolved). DHCP server healthy (1241 ACK/1 NAK/0 no-free-leases, read directly from plain-text log). Per-room /28 HIPAA segmentation confirmed intentional + healthy. Produced prioritized smoothing plan (Phase 0 + Phase 1 radio_table PUT). Nothing applied. 
Hard rule established: no Cascades prod-infra changes without discussing + explicit per-change go. |
| 2026-06-17 | **CS-SERVER drive review (advisory).** Confirmed surviving drive: Hitachi HTS545032B9A300 (0:0:2, 320 GB SATA, 5400 RPM). Recommended replacement drives: 2x 480 GB enterprise SATA SSD (e.g. Solidigm D3-S4520 or Samsung PM893), hot-swap on R610 backplane. No server-side commands run; gated on backup verification. |
| 2026-06-17 | **Power outage -- full site down + recovery.** All 77 APs + 12 switches disconnected. Root cause: pfSense on UPS surge-only side (no battery) -> unclean shutdown -> ZFS OK, but duplicate dhcpd + 2nd-floor switch one-way L2 forwarding. Howard: killed duplicate dhcpd + clean restart remotely. Mike: moved pfSense to battery outlets, restored config from on-box auto-backup (VLAN30 intact), reset+re-adopted Switch 2nd Floor #2 (USL24PB), rebooted Cox modem. Network fully restored. Separately: 5 GHz auto-channel applied (co-channel 25->30, worse). pfSense config vaulted. Pre-existing plaintext Synology signin credential found in vault history (commit 1fbc0e1) -- encrypted go-forward; needs rotation. Incident report: `clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md`. |
| 2026-06-18 | **Power outage follow-ups: OpenVPN flapping root-caused; kitchen printer casualty resolved.** OpenVPN disconnect/reconnect cycle = configured `--inactive` idle timeout (~300s) on the pfSense server, not a fault. Fix proposed (raise/disable); not applied. Kitchen thermal printer (iPad POS) would not print post-outage -- booted during DHCP-down window, cached disconnected state; fixed by power-cycle. DHCP straggler sweep: 13/13 active senders completing, 0 stuck. |
| 2026-06-18 | **Synology Drive sync architecture diagnosed; Team Folder migration plan produced.** Current Drive sync is Sync-user My Drive only (not the real shared folders). Real NAS shares (Server 1.9 G, Management 5.5 G, Public ~50 G, SalesDept ~23 G) are not mirrored.
Plan: Team Folder Download-only tasks into `D:\Shares\_SynMigration\` staging; pilot on `/volume1/Server`. No changes made. |
| 2026-06-18 | **DESKTOP-TRCIEJA (Lupe Sanchez) performance diagnosed; replace-not-remediate decision.** Root causes: (a) EOL hardware -- Gateway ZX6971 AIO, Intel i3-2120 (2011, 2C/4T), 8 GB RAM, Win11 unsupported; (b) dual real-time AV -- ACG Bitdefender (keep) + leftover Datto stack (Datto RMM/CentraStage + Datto EDR/Infocyte + bundled DattoAV) both scanning every file on a 2-core CPU under memory pressure. OneDrive ruled out (desktop is local). Howard decided: no remediation; order replacement. Another instance of the fleet-wide leftover-Datto-stack cleanup. |
| 2026-06-18 | **Voice VLAN 30: all 22 Poly phones migrated; network-logging spec written.** Completed the Poly cutover live -- all 22 WiFi phones re-keyed to the voice PPSK onto `10.0.30.202-.223` (per-phone location inventory in `docs/network/voice-phone-inventory.md`); first phone (Lauren Hasselman) dial-tone + outbound call verified. Vertical desktop fixed via port-16 bounce (controller API + CSRF) -> `10.0.30.201`. AudioCodes (8, wired) still pending (flip + PoE power-cycle). Separately, found the UniFi controller retains **ZERO** client events for Cascades (drop/kick history not captured) -> wrote a network-logging spec (`docs/network/network-logging-plan.md`): Synology Log Center on-site collector, pfSense+UniFi syslog sources, client snapshotter. Plan only -- build later. |
| 2026-06-18 | **Voice VLAN 30 cutover COMPLETE (8 AudioCodes added); voice-quality diagnosed; holistic all-device optimization master plan built.** AudioCodes finished -- they wouldn't re-DHCP via PoE/controller bounce (externally powered, PoE off); Howard physically power-cycled all 8 -> VOICE leases `.224-.231` (31 devices total on VLAN 30). Diagnosed the dropped-calls complaints: **the VLAN move does NOT fix call quality -- it's RF on the Poly WiFi phones** (wired AudioCodes clean).
14 Poly flagged; worst Lauren `.202` (2.4 GHz/50% retry -> locked to AP 103) + Shelby `.218` (2.4 GHz/53%, MemCare/deferred); coverage gaps rooms 515/210/204; found 6 unmigrated Poly stragglers (fleet is 28, not 22). Built `network-optimization-master-plan.md` (open-relief-valves-before-constraining sequence: QoS -> 6 GHz on CSCNet + 2.4 Low->Medium -> 5 GHz 40 MHz/non-DFS/relieve AP 103 -> fine-tune -> physical) with interdependency map + data-driven gate framework, floors 1-4 only. Designed Phase 1 voice QoS (`phase1-voice-qos-design.md`: pfSense HFSC + UniFi WMM, match `10.0.30.0/24`, phones mark DSCP EF; measured WAN1 up ~522 Mbps -> QoS is insurance, RF is the substance). Rigorous DFS re-verification (0 genuine radar/~1-day window) -> **decision: NON-DFS only**. **Decision: no dedicated voice SSID** (3-SSID cap, CSC ENT still 131 clients, QoS is SSID-independent). 6 GHz root-caused dark: CSCNet not broadcasting 6g. NO live network changes applied (per-change-go rule). |
| 2026-06-19 | **FIRST PRODUCTION RF OPTIMIZATION applied (autonomous 2 AM window) -- 2.4 power fix + data-driven 5 GHz DFS plan; 5 GHz retry HALVED.** Howard pre-authorized an autonomous 2 AM run. Applied + validated + KEPT: (1) **2.4 power Low/full -> MEDIUM on 47 radios** (over-thinning fix floors 1-4 + MemCare 5/6 off full power; 24 disabled stayed disabled; per-AP targeting since `--zone` re-enables disabled), non-regressive. (2) **CSCNet BSS-transition ON.** 6 GHz attempted but **BLOCKED -- `Wpa3MandatoryFor6GHzBand`** (CSCNet is WPA2/PPSK; converting the 427-client SSID is a supervised decision, deferred to Howard). A first blind non-DFS 5 GHz reshuffle (3a/3b) was tried, did NOT validate (flat retry, voice scattered to 2.4), and was ROLLED BACK. **Howard's correction: scan FIRST, decide from data.** Completed the full channel survey (74/74) -> proved **DFS channels here are 4-5x cleaner (2-3% busy) than non-DFS (ch149=12%, ch157=28%)**; the non-DFS-only decision was reversed.
Built a **data-driven clean-DFS plan** (8 clean DFS 40 MHz channels, per-AP cleanest + neighbor graph-color + local-search -> 0 co-channel), applied to 72 non-mesh APs (mesh excluded), nudged voice back to 5 GHz. **Result: 5 GHz retry 8.7 -> 3.8 avg (median 8.2 -> 2.1), satisfaction median 99, voice 31/31, all 72 APs holding DFS, 0 radar vacates.** Also disabled the 3 AM AP auto-upgrade (left OFF). **Skill hardened:** added `survey-report.py` (fleet channel-congestion analysis) + made `channel-plan.sh` palette data-driven (`--channels`/`--dfs`, load-balance + local-search) -- killed the non-DFS bias that caused the first failed attempt. |

---

## Compilation Notes

**Session logs read:** all prior sessions + 2026-06-17/18 logs: voice VLAN 30 build + Poly cutover, Poly phone-drop root cause + wireless smoothing plan, power-outage recovery + 5 GHz option analysis, CS-SERVER drive review, KPI dashboard scoping, power-outage follow-up (OpenVPN + printer), Synology Drive sync diagnosis, DESKTOP-TRCIEJA (Lupe Sanchez) perf diagnosis. Date range: 2026-03-06 through 2026-06-18.

**Full recompile addendum (2026-06-18 -- recovered-docs fold-in):** folded in the 4 docs restored after the repo-rewrite (master plan, voice QoS design, voice-quality diagnostic, RF/voice-optimization session log). Key corrections + additions: **Voice VLAN 30 cutover is COMPLETE** (8 AudioCodes `.224-.231` added -- prior compile had them 0/8 pending); AudioCodes physical-power-cycle gotcha; Poly fleet is 28 (6 stragglers off VOICE); voice quality is an RF problem (per-phone diagnostic); 6 GHz dark because CSCNet isn't broadcasting 6g; AP 103 5 GHz saturation; measured WAN1 upload ~522 Mbps (QoS = insurance); new Patterns subsections (Voice QoS design, Network Optimization Master Plan, Decisions-resolved-2026-06-18: non-DFS-only + no-voice-SSID); Active Work + History + HIPAA reconciled to the complete cutover.
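
The VLAN 30 lease ranges recorded above (desktop `.201`, Poly `.202-.223`, AudioCodes `.224-.231`) can be sanity-checked against the stated 31-device total. The following is a minimal sketch using only the ranges quoted in this page; the group labels and helper names are illustrative, not taken from any inventory file, and the 6 unmigrated Poly stragglers are deliberately not included because they hold no VOICE lease yet.

```python
import ipaddress

# VOICE network as documented for VLAN 30.
VOICE_NET = ipaddress.ip_network("10.0.30.0/24")


def host_range(first_octet: int, last_octet: int) -> list:
    """Return the addresses 10.0.30.<first>..<last>, inclusive."""
    base = int(VOICE_NET.network_address)
    return [ipaddress.ip_address(base + n) for n in range(first_octet, last_octet + 1)]


# Group labels are hypothetical; the ranges are the ones quoted in this page.
LEASES = {
    "vertical_desktop": host_range(201, 201),
    "poly_wifi": host_range(202, 223),
    "audiocodes_wired": host_range(224, 231),
}


def summarize(leases: dict) -> dict:
    """Count devices per group and confirm the ranges are in-subnet and disjoint."""
    counts = {name: len(addrs) for name, addrs in leases.items()}
    all_addrs = [addr for addrs in leases.values() for addr in addrs]
    if not all(addr in VOICE_NET for addr in all_addrs):
        raise ValueError("address outside the VOICE subnet")
    if len(set(all_addrs)) != len(all_addrs):
        raise ValueError("overlapping lease ranges")
    counts["total"] = len(all_addrs)
    return counts


if __name__ == "__main__":
    print(summarize(LEASES))
```

Under these assumptions the tally is 1 + 22 + 8 = 31, matching the cutover figure; migrating the 6 stragglers would bring the Poly group to the documented fleet size of 28.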

**Prior compile (2026-06-18, refresh + initial):**

- Voice VLAN 30: status updated to 22/22 Poly + desktop DONE; AudioCodes were 0/8 pending at that point (now complete -- see addendum). PPSK vaulted. Wired-port link-bounce pattern documented.
- Power outage (2026-06-17): full incident documented. pfSense UPS placement rectified. Duplicate dhcpd, 2nd-floor switch L2 failure, Cox modem reboot step. Post-outage straggler pattern (power-cycle) documented. pfSense config vaulted. Synology signin credential exposure flagged (vault commit 1fbc0e1).
- Wireless: Phase A extended overnight 2026-06-17 and over-thinned (retry 17->23.4%, satisfaction 39->30). 5 GHz auto-channel made co-channel overlap worse. Both corrective plans staged (Low->Medium, Option B) but not applied. Phone drop mystery closed (intentional pfSense reboot).
- DESKTOP-TRCIEJA (Lupe Sanchez): added to key contacts and migration table; EOL hardware + dual-AV root cause; replace decision.
- pfSense patterns: plain-text logs (not clog), `--inactive` OpenVPN timeout, dirty-boot/duplicate-dhcpd recovery, post-outage stragglers.
- Synology Drive sync architecture documented (current scope is Sync-user My Drive only; Team Folder migration plan for department shares).
- Active Work: updated with all new non-Syncro follow-ups (Lupe replacement, Synology credential rotation, AudioCodes cutover, Phase 0+1 wireless, OpenVPN fix, AutoConfigBackup, Team Folder migration, outage stragglers).
- Sources: 7 new session-log paths appended (2026-06-17-poly-phone-drops, 2026-06-17-power-outage, 2026-06-17-cs-server-drive-review, 2026-06-17-voice-vlan30-build, 2026-06-18-outage-followup-openvpn-printer, 2026-06-18-synology-drive-sync, 2026-06-18-lupesanchez-perf-diag); voice-phone-inventory.md added.
- Billing: hours 55.75 (unchanged; no draws from 2026-06-17/18 sessions); date updated to 2026-06-18.

**Client folder:** `clients/cascades-tucson/` (NOT `clients/cascades/` -- that directory does not exist).
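
The retry figures recorded in this page (Floor-4 pilot 13.2 -> 9.5%, Phase A over-thinning 17 -> 23.4%, and the 2026-06-19 5 GHz plan at 8.7 -> 3.8 avg, 8.2 -> 2.1 median) are easier to compare as relative changes. A small sketch of that arithmetic follows; the function name and labels are illustrative, and the inputs are only the before/after values quoted here.

```python
def relative_change(before: float, after: float) -> float:
    """Percent change from before to after; negative means fewer retries."""
    return (after - before) / before * 100.0


# (before, after) retry percentages as quoted in this page; labels are illustrative.
RETRY_SAMPLES = {
    "floor4_2g_pilot_avg": (13.2, 9.5),
    "phase_a_overthin_avg": (17.0, 23.4),
    "clean_dfs_5g_avg": (8.7, 3.8),
    "clean_dfs_5g_median": (8.2, 2.1),
}

if __name__ == "__main__":
    for label, (before, after) in RETRY_SAMPLES.items():
        print(f"{label}: {relative_change(before, after):+.1f}%")
```

On these inputs the Floor-4 pilot works out to roughly a 28% reduction, consistent with the "~28% fewer retransmits" note, and the 5 GHz average drops by a little over half, consistent with "HALVED".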

**Open items flagged as unverified:**

- Break-glass accounts + YubiKeys -- confirmed not created as of 2026-05-27; YubiKey arrival unconfirmed
- Audit retention infra -- approved 2026-04-29, not yet built
- dunedolly21@gmail.com guest invite -- confirm with Lauren
- Windows MDM auto-enroll scope -- confirm in portal (Entra -> Devices -> Mobility -> Microsoft Intune -> MDM user scope)
- #32370 -- verify/likely closed; Syncro live 2026-06-18 shows 0 open tickets
- Edge UNC download bug fix path -- no fix applied as of 2026-06-08; decision pending Howard
- ALIS BAA with Medtelligent -- not yet verified; confirm with Meredith (also: does ALIS offer a built-in analytics / data feed? relevant to the KPI dashboard)
- KPI dashboard (Ashley Jensen) -- parked; need day-one KPIs, data-freshness need, POS product + Focus HR plan before scoping a build
- JD Martin (jd.martin@cascadestucson.com) -- confirmed Syncro contact; role not yet documented
- CS-SERVER cloud backup: verify first full completes, confirm image-based / bare-metal + system-state, set retention; only then proceed with RAID remediation
- NURSESTATION-PC: verify `CSC - Caregiver Device Lockdown` GPO activated (requires reboot; verify lock@3min, 90s warning, sign-out@15min, never-sleep)
- Wireless RF: Phase 0 + Phase 1 (Low->Medium + Option B) pending scope go-ahead from Howard; windowed evening session needed
- Christine (room 515, Poly phone 10.0.30.220) -- last name noted as "~Nyuda -- VERIFY"
- UPS coverage: confirm all core infra + PoE switches are battery-backed (pfSense rectified; others not confirmed)
- Netgate AutoConfigBackup: not yet enabled
- Synology signin credential rotation: exposed in vault history commit 1fbc0e1; encrypted go-forward but must rotate

**Resolved since last compile (2026-06-17 -> 2026-06-18):**

- Poly phone drops: closed (intentional 2026-06-16 pfSense reboot; transient)
- Voice VLAN 30: cutover COMPLETE -- 8 AudioCodes (`.224-.231`) + 22 Poly + Vertical desktop = 31 devices on
VOICE (6 Poly stragglers remain off VOICE)
- Voice-quality root cause: identified as RF on the WiFi Poly phones (not the VLAN move); per-phone diagnostic produced
- 6 GHz dark: root-caused (CSCNet `wlan_bands=[2g,5g]` -- not broadcasting 6g)
- 5 GHz DFS question: RESOLVED as of 2026-06-18 -- non-DFS only (resilience near Davis-Monthan/TUS); superseded 2026-06-19, when the full channel survey showed DFS channels were cleaner and the decision was reversed (see the 2026-06-19 history entry)
- Dedicated voice SSID question: RESOLVED -- no (shared CSCNet; QoS is SSID-independent)
- pfSense 25.07 log format: documented (plain text, not clog)
- pfSense config backup: vaulted post-restore (2026-06-17)
- pfSense on battery-backed UPS: rectified (Mike, 2026-06-17)
- Kitchen printer / post-outage straggler: resolved by power-cycle (2026-06-18)
- Synology Drive sync architecture: diagnosed (2026-06-18; Team Folder plan produced)
- DESKTOP-TRCIEJA root cause: identified (dual AV + EOL hardware); decision made (replace)

## Backlinks

- [[projects/gururmm]] -- RECEPTIONIST-PC enrolled (site CascadesTucson); CS-SERVER enrolled
- [[wiki/systems/uos-server]] -- shared UOS controller hosts the Cascades UniFi site (site_id `685f39068e65331c46ef6dd2`); SSH/Mongo access via `infrastructure/uos-server-ssh-key`