diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
index 7268a27..1728ac8 100644
--- a/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
@@ -685,3 +685,39 @@ Scope for tonight: recommended = power-down-all + pilot ONE floor's disables, va
 ### Note: mesh data is from .claude/tmp/cascades-nbr.json (fresh tonight, gitignored) — regenerate via neighbor-collect before execution if stale.
+
+## Update: 19:28 PT (2026-06-16) — home LAN renumber fixes VPN shadow; pfSense ruled out as WiFi cause; SSH backend
+
+### Home-LAN shadow RESOLVED (unblocked Cascades pfSense access)
+Diagnosed why the Cascades VPN couldn't reach 192.168.0.x: Howard-Home was 192.168.0.0/24 with a UniFi
+gateway at 192.168.0.1 (cert unifi.local — NOT pfSense), colliding with Cascades' own 192.168.0.0/24
+(pfSense .0.1). OS prefers the directly-connected local /24, so 192.168.0.x never crossed the tunnel
+(APs on 192.168.2.x/3.x worked). A /32 route can't fix it (.0.1 was Howard's own gateway). **Howard
+renumbered Howard-Home to 10.137.42.0/24** (gw 10.137.42.1). Verified: ping 192.168.0.1 now replies over
+the tunnel (TTL 64, ~15ms); cert = CN=pfSense-685f277aa6886 / "Netgate pfSense Plus" (the REAL Cascades
+pfSense); NAS .120 reachable; controller-side (172.16.3.29) + AP reach (192.168.2.79) intact. Memory:
+[[howard-home-lan-shadow]] updated to RESOLVED.
+
+### pfSense investigated — NOT a WiFi factor (gateway ruled out)
+SSH into Cascades pfSense (admin, shell works directly — no menu gotcha; pfSense Plus 25.07-RELEASE).
+Findings: DHCP NOT exhausted (0 "no free leases"; WiFi/AP pool 192.168.0.0/22 range .2.2-.3.254 cap ~507
+at 270 active ~53%; 199 subnets total, mostly per-unit /28s; active backend ISC, Kea dormant); unbound DNS
+up; dual-WAN (WAN1 184.191.143.62/30, WAN2 72.211.21.217/27) both full-duplex, no gateway loss/down events;
+PF states 28-31k/790k; load 0.6; 10-day uptime. Minor: igc3/WAN2 1707 Ierrs+Coll = Intel I225/226 2.5G
+counter quirk (2.5G full-duplex active, no loss), not a fault. CONCLUSION: gateway/DHCP/DNS/WAN are not
+bottlenecking WiFi — the 2.4 GHz RF remediation is the sole fix. Added "pfSense health check (2026-06-16)"
+section to clients/cascades-tucson/reports/2026-06-16-unifi-full-audit.md.
+
+### pfSense compat layer: pivot to SSH backend, OFF HOLD (Mike's call)
+Mike (relayed by Howard): "we don't need the RESTAPI — VPN + SSH gets the same data and makes changes."
+The earlier "pfSense too old for RESTAPI" premise was WRONG (it's 25.07, current). So the upgrade/package
+blocker is moot; layer off hold. Built scripts/pfsense-ssh.sh (audit | dhcp | run ""); cred from
+clients//pfsense-firewall; system OpenSSH via askpass; validated live on Cascades (the health data
+above came from it). ROADMAP §E + SKILL.md updated to the SSH decision; REST pfsense-backend.sh left
+dormant/optional. Closed obsolete coord todo afbe6a22 (upgrade-for-RESTAPI). Remaining = named gated
+CONTROL verbs over SSH (easyrule block-ips, pf/fw toggles) — Howard: "leave that up to Mike" (his §E).
+Commits: 58ecc5a (pfSense-ssh + health + off-hold), e42ad8f (memory).
+
+### Still open (the actual WiFi win)
+Tonight's 2.4 remediation per the runbook (reports/2026-06-16-2.4ghz-remediation-runbook.md): scope
+(power-down-all + pilot one floor vs all 25) + whether Claude runs validation gates live during execution.
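
Reviewer sketch (not part of the commit): the Home-LAN shadow described in the session log above can be modelled as longest-prefix route selection. The route tables here are assumptions for illustration only. In particular, the `192.168.0.0/22` tunnel route and the `home-lan` / `vpn` labels are hypothetical and were not captured from the real host.

```python
import ipaddress


def pick_route(dest, routes):
    """Return the (network, via) pair with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, via) for net, via in routes if addr in net]
    return max(matches, key=lambda route: route[0].prefixlen, default=None)


# Before the renumber: the directly connected home /24 overlaps the Cascades LAN.
# ASSUMPTION: the tunnel route is modelled as a single 192.168.0.0/22 entry.
before = [
    (ipaddress.ip_network("192.168.0.0/24"), "home-lan"),
    (ipaddress.ip_network("192.168.0.0/22"), "vpn"),
]

# After the renumber: home LAN moved to 10.137.42.0/24, so nothing shadows the tunnel.
after = [
    (ipaddress.ip_network("10.137.42.0/24"), "home-lan"),
    (ipaddress.ip_network("192.168.0.0/22"), "vpn"),
]

for label, table in (("before", before), ("after", after)):
    for dest in ("192.168.0.1", "192.168.2.79"):
        route = pick_route(dest, table)
        print(label, dest, "->", route[1] if route else "no route")
```

Under these assumptions the model reproduces the observed symptom: 192.168.0.1 stays on the home LAN before the renumber while 192.168.2.79 already crosses the tunnel, and both cross it afterwards.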
diff --git a/wiki/clients/cascades-tucson.md b/wiki/clients/cascades-tucson.md
index 20ff0b1..6ceb124 100644
--- a/wiki/clients/cascades-tucson.md
+++ b/wiki/clients/cascades-tucson.md
@@ -2,7 +2,7 @@
 type: client
 name: cascades-tucson
 display_name: Cascades of Tucson
-last_compiled: 2026-06-15
+last_compiled: 2026-06-16
 compiled_by: HOWARD-HOME/claude-main
 sources:
   - session-logs/2026-03-24-session.md
@@ -43,6 +43,11 @@ sources:
   - clients/cascades-tucson/session-logs/2026-06/2026-06-12-howard-shared-mailboxes-grievances-surveys.md
   - clients/cascades-tucson/session-logs/2026-05-16-howard-wireless-diagnostic.md
   - clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cascades-wifi-rf-audit.md
+  - clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cs-server-raid-vpn-reset.md
+  - clients/cascades-tucson/session-logs/2026-06/2026-06-16-howard-vertical-voice-vlan-plan.md
+  - clients/cascades-tucson/docs/network/voice-vlan-cutover.md
+  - clients/cascades-tucson/reports/2026-06-16-unifi-full-audit.md
+  - clients/cascades-tucson/reports/2026-06-16-2.4ghz-remediation-runbook.md
   - clients/cascades-tucson/docs/overview.md
   - clients/cascades-tucson/docs/network/topology.md
   - clients/cascades-tucson/docs/network/vlans.md
@@ -142,15 +147,17 @@ Because per-user **Intune** never provisioned tenant-wide (`INTUNE_A = PendingIn
 
 | Host | IP | Role | OS | Notes |
 |---|---|---|---|---|
-| CS-SERVER | 192.168.2.254 | DC, DNS, DHCP (no scopes), File Server, Hyper-V host, Print Server | Windows Server 2019 Standard | Dell PowerEdge R610 (~2009 hardware, 16+ years old). **Single DC — CRITICAL risk. No backup.** GuruRMM agent ID: `c39f1de7-d5b6-45ae-b132-e06977ab1713` (re-enrolled; the older `6766e973-...` is stale — **always resolve the agent live by hostname**, never hardcode the UUID) |
+| CS-SERVER | 192.168.2.254 | DC, DNS, DHCP (no scopes), File Server, Hyper-V host, Print Server | Windows Server 2019 Standard | Dell PowerEdge R610 (~2009 hardware, 16+ years old). **Single DC — CRITICAL risk. No backup.** GuruRMM agent ID: `c39f1de7-d5b6-45ae-b132-e06977ab1713` (re-enrolled; the older `6766e973-...` is stale — **always resolve the agent live by hostname**, never hardcode the UUID). **OS RAID-1 mirror DEGRADED (2026-06-15) — see hardware warning below.** |
 | CS-SERVER iDRAC | 192.168.2.65 | Out-of-band management | — | Dell OOB interface |
-| CS-QB (Hyper-V VM on CS-SERVER) | 192.168.2.228 | VoIP server | — | [REVIEW — transitioning away from traditional landlines to wireless phones; revisit this entry] |
+| CS-QB (Hyper-V VM on CS-SERVER) | 192.168.2.228 | (label "VoIP server" — STALE) | — | **2026-06-16 recon: SMB/445 only, no SIP response — NOT a live SIP PBX.** Phones appear cloud-registered (Vertical). Label predates the wireless-phone transition; revisit/retire. |
 | cascadesDS (Synology NAS) | 192.168.0.120 | NAS / legacy file storage | DSM | Port 5000 HTTP. Workgroup name is "CASCADES" — same as AD short name, causing Kerberos auth failures from domain-joined machines. Slated to become backup-only. |
 | pfSense Firewall | 192.168.0.1 | Perimeter firewall, inter-VLAN routing | pfSense 24.0 | Dual-WAN. All DHCP served here (CS-SERVER DHCP role has no scopes). MAC: 00:f1:f5:34:b3:4a |
 
-**[WARNING] CS-SERVER hardware:** Dell R610 with mixed SATA laptop drives (OS array, no hot spare) and enterprise SAS drives from 2015-2016. No backup exists. No second DC. Hardware will fail — DC migration is urgent.
+**[CRITICAL] CS-SERVER hardware — RAID degraded (2026-06-15):** Dell R610, basic SAS 6/iR controller (3 Gbps, no cache). The **OS RAID-1 mirror (Virtual Disk2 = C:, holds OS / AD / SQL / page file) is DEGRADED** — Physical Disk 0:0:3 (320 GB WD SATA laptop drive) is Critical/Removed, leaving C: on a single surviving 320 GB Hitachi 5400 RPM spindle with ZERO redundancy. A 1.2 TB SAS disk (1:0:4) sits "Ready" but is the wrong size/type to rebuild the 320 GB mirror, so no auto-rebuild fired. D: is a separate healthy RAID-1 (2x 1.2 TB SAS). The degraded mirror on a slow laptop spindle is the root cause of the "CS-SERVER slow" reports (random-I/O bound). With the single-DC, EOL (16+ yr) posture this is a data-loss emergency — SSD rebuild-then-swap is a valid band-aid (image C: first; enterprise SATA SSD >= 320 GB; no TRIM through this controller) but the DC migration remains the real fix.
 
-**[WARNING] HIPAA violation:** No backup for CS-SERVER (§164.308(a)(7)). Synology Active Backup for Business is blocked (ext4 filesystem, not Btrfs).
+**[INFO] Backup — gap now being closed (2026-06-15):** Mike installed ACG cloud backup (MSP360/CloudBerry -> ACG-backup server) on CS-SERVER and started a backup, addressing the longstanding §164.308(a)(7) "no backup" HIPAA gap. (Synology Active Backup for Business remains blocked — ext4, not Btrfs.) Verify the first full completes and set retention.
+
+**[WARNING] CS-SERVER endpoint-agent sprawl:** CS-SERVER is NOT in the ACG Bitdefender/GravityZone tenant; Defender is replaced by a Syncro-managed "Endpoint Protection Service". The previous MSP's **Datto RMM/CentraStage + Datto EDR/Infocyte** are still installed on top of Syncro + GuruRMM + ScreenConnect + KPAX — overlapping agents thrashing the degraded spindle. Clean up the Datto stack. (Infection sweep 2026-06-15: clean.)
 
 ### Email & Identity
 
@@ -186,7 +193,7 @@ Because per-user **Intune** never provisioned tenant-wide (`INTUNE_A = PendingIn
 - **Firewall:** pfSense 24.0 at 192.168.0.1. All DHCP. Inter-VLAN routing. 236 resident room VLANs (per-room /28, `10.[floor].[room].0/28`). Staff/infra VLAN 20 (`10.0.20.0/24`, gateway `10.0.20.1`). Guest VLAN 50 (`10.0.50.0/24`, RFC1918 blocked).
 - **Switching:** Full UniFi. **77 U7-Pro APs** + ~9 managed switches (1st Floor USW-48 PoE core; floors 2-4 USW-Pro-24-PoE; MemCare USW-Pro-24-PoE; USW Lite 8 PoE; USW-16-PoE VoIP switch). All managed on the shared UOS controller (172.16.3.29; see [[uos-server]]); Cascades site_id `685f39068e65331c46ef6dd2`. Switch hardware replacement on floors 2/3/4 complete.
 - **WiFi SSIDs:**
-  - CSCNet — staff, VLAN 20
+  - **CSCNet — shared PPSK SSID (corrected 2026-06-16; NOT a simple staff/VLAN-20 SSID).** `private_preshared_keys_enabled`; ~230 per-key->network mappings (most keys -> per-room resident VLANs 101-631; a few -> Default; one phone key -> Internal/VLAN 20). ~1,190 historical clients (residents' IoT/TVs, staff, phones). **Do NOT repoint the SSID to move a subset of clients** — move at the PPSK level (add a dedicated key for the target network). wlanconf `685f39078e65331c46ef7ee5`; cred vault `clients/cascades-tucson/wifi-cscnet.sops.yaml`.
   - CSC ENT — legacy SSID, main LAN (192.168.0.0/22), being deprecated as migration proceeds
   - Guest — isolated, VLAN 50
 - **Wireless RF status (live audit 2026-06-15 — ~574 concurrent clients):**
@@ -197,7 +204,8 @@ Because per-user **Intune** never provisioned tenant-wide (`INTUNE_A = PendingIn
   - **Config flags (remediation pending):** 6 APs have 2.4 min-RSSI OFF (615, 608, 505, 517, 622, salon); 4 APs off the 1/6/11 channel plan on auto (128, 108, 108U7 Pro, salon); 2.4 TX power auto on ~75 radios.
   - **Known hardware:** AP 108 (Floor 1) offline pending a new cable run (expected); stale duplicate controller object ("108" vs "108U7 Pro") to clean up.
   - **Creds (vault refs only):** `infrastructure/uos-server-ssh-key` (SSH/Mongo access), `infrastructure/uos-server-network-api-rw` (RW controller admin), `clients/cascades-tucson/unifi-ap-ssh` (per-AP device auth via site VPN).
-- **VoIP:** AudioCodes phones (8 units) on USW-16-PoE. CS-QB VM at 192.168.2.228. Not MSP-managed but infra must stay static.
+- **VoIP (vendor: Vertical — Richard Turner ):** Two phone fleets — **8 AudioCodes** (OUI `00:90:8f`, WIRED on USW-16-PoE ports 1-8, Default/main LAN) and **22 Poly** (OUI `48:25:67`, WiFi via CSCNet PPSK -> VLAN 20 Internal). The **Vertical-Remote management desktop** (`192.168.2.180`, MAC `e4:e7:49:52:3a:06`, WIRED USW-16-PoE port 16, Default LAN, **static IP, no ACG login**) is RDP-only (recon 2026-06-16 — not a PBX). No on-prem SIP PBX found -> phones appear to register to a **cloud/hosted PBX** (Vertical). Infra must stay static.
+- **[PLANNED] Voice VLAN (VLAN 30) consolidation for the phones:** Segmentation left voice gear split (Poly on VLAN 20; AudioCodes + Vertical desktop on the main LAN), and main-LAN -> VLAN 20 is blocked at pfSense — so the desktop can't reach the wireless phones and phone IPs drift. Fix: a dedicated isolated **VLAN 30 VOICE (`10.0.30.0/24`, gw `10.0.30.1`, pfSense igc1.30)** holding ALL phones + the Vertical desktop; internet egress allowed, firewalled off VLAN 20 / main LAN / PHI (HIPAA); Vertical's pfSense OpenVPN scoped to `10.0.30.0/24` via a Client-Specific-Override. Desktop is static + no ACG login -> Vertical sets it to DHCP (or grants temp access) at cutover; reserve `10.0.30.10`. Status: PLANNED — vendor email sent 2026-06-16, awaiting Richard's confirm (cloud-PBX, desktop static, VPN cert CN) + a window. **Full runbook + recon: `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`.**
 
 ### External Vendors & Mail Senders
 
@@ -409,6 +417,9 @@ Primary active project as of 2026-05-24: dept-by-dept domain migration (Syncro #
 - Edge UNC download bug (Chromium 149): decide fix path for Ashley Jensen + Lois Lane and fleet (see Patterns -> Browser / Edge); no fix applied as of 2026-06-08
 - ALIS app session timeout: lower from 20 to 15 min (Howard, ALIS admin) — PENDING
 - **Wireless RF tuning (staged, no changes applied as of 2026-06-15):** 2.4 GHz TX power → Low per-zone (Floor 4 pilot first; live cu_total + retry% before/after validation); 6 GHz band-steering for capable clients; 5 GHz 80→40 MHz + non-DFS channel plan (UNII-1+UNII-3); min data rates; min-RSSI + channel-plan fixes on 6 flagged APs. Gated: build AP-to-AP RF-neighbor table before any AP disables; pull radar-detection event history to confirm DFS avoidance need; site VPN `.ovpn` needed for `watch-ap.sh` live stream (pfSense OpenVPN Client Export).
+- **[CRITICAL] CS-SERVER degraded RAID-1 (2026-06-15):** OS mirror (C:) running on a single 320 GB laptop spindle, no redundancy — root cause of "server slow". Plan SSD rebuild-then-swap (image C: first); DC migration is the real fix. Cloud backup now installed/started — verify first full completes + set retention.
+- **[CLEANUP] CS-SERVER agent sprawl:** remove the previous MSP's leftover Datto RMM + Datto EDR/Infocyte stack (thrashing the degraded disk atop Syncro/GuruRMM/ScreenConnect/KPAX).
+- **[PLANNED] Voice VLAN (VLAN 30) for Vertical phones + remote desktop:** vendor email sent 2026-06-16; awaiting Richard's confirm (cloud-PBX, desktop static, VPN cert CN) + maintenance window, then execute. Runbook: `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`.
 
 ---
 
@@ -447,6 +458,8 @@ Primary active project as of 2026-05-24: dept-by-dept domain migration (Syncro #
 | 2026-06-10 | **Meredith Kuhn locked Word doc — stale owner files on cascadesDS.** Five orphaned Word `~$` owner files dated 2024 in `\\cascadesds\Public\Company Web Docs\Staff Trainings\` caused false "locked for editing" messages on training documents with no active session. Diagnosed and deleted all 5 via RMM in Meredith's `user_session` on ASSISTMAN-PC (agent `cf86fa5e`) — CS-SERVER SYSTEM cannot authenticate to cascadesDS (workgroup/Kerberos mismatch). Howard's post-reboot check on the Synology confirmed no live handles. Ticket #32403 (id 112502876), 0.5h remote, invoice $0.00 prepaid, block 56.75→56.25. |
 | 2026-06-12 | **Created shared mailboxes grievances@ + Surveys@ and delegated to Meredith & Ashley.** `grievances@cascadestucson.com` and `Surveys@cascadestucson.com` created as SharedMailbox (cloud-only, no license consumed), each delegated to Meredith Kuhn and Ashley Jensen with FullAccess (auto-mapping) + SendAs. Work done via ComputerGuru Exchange Operator MSP app cert auth (EXO module v3.10.0 installed on Howard-Home for this session). All 8 permission grants verified post-creation. Ticket #32417 (id 112597225), 0.5h remote, invoice #1650665832 $0.00 prepaid, block 56.25→55.75; ticket Invoiced. |
 | 2026-06-15 | **Wireless RF full audit — controller access gained.** Mike vaulted `infrastructure/uos-server-ssh-key` + `clients/cascades-tucson/unifi-ap-ssh`; `unifi-wifi` skill used end-to-end. Live audit via UOS Mongo (Plane 1) confirmed 77 U7-Pro APs, 574 clients, 2.4 GHz saturation as primary pain band (avg retry 11.2%, cu_total 69–94%, catastrophic neighbor density). Accuracy bugs in `live-stats.sh` found and fixed mid-session (15-AP head cap, wrong satisfaction/retry fields) — corrected the data and corrected a mid-session misdiagnosis that DFS was the #1 problem (withdrawn; DFS retry rate 8.4% ≈ non-DFS 9.0%). Mike also vaulted `infrastructure/uos-server-network-api-rw` (RW controller admin) same day; Plane 2 (Network API) re-audited and confirmed findings. DFS designated a resilience concern (near Davis-Monthan AFB + TUS radar), not a throughput concern. 6 GHz (1 client of 574) identified as largest untapped capacity. Tuning plan staged (see Patterns -> Wireless / UniFi RF); no changes applied. |
+| 2026-06-15 | **CS-SERVER slowness root-caused to a degraded RAID-1; backup started; OpenVPN password reset.** "CS-SERVER slow / check for infections" -> not RAM/CPU/disk (48 GB RAM ~72% free, 10-day uptime, clean infection sweep). Dell OMSA: PD 0:0:3 (320 GB WD SATA) Critical/Removed, Virtual Disk2 (C: mirror) Non-Critical/Degraded -> C: on a single 320 GB Hitachi 5400 spindle, no redundancy (root cause of slowness); 1.2 TB SAS "Ready" disk is the wrong size to rebuild. Found leftover Datto RMM + Datto EDR/Infocyte; CS-SERVER not in Bitdefender. Mike installed MSP360/CloudBerry cloud backup and started it (closes the no-backup HIPAA gap). Reset Howard's lost pfSense OpenVPN password (local-DB user `Howard`, userid 0) via `local_user_set_password()` PHP-exec driven from CS-SERVER over RMM (CS-SERVER reaches 192.168.0.1:443/22); verified AUTHOK and vaulted. |
+| 2026-06-16 | **Voice VLAN plan for Vertical phones (PLANNED, not executed).** Vertical's tech (Richard Turner) couldn't reach phones from the remote desktop (192.168.2.180) and phone IPs drift. UOS controller diagnosis: Poly phones (22, `48:25:67`) on WiFi/CSCNet PPSK -> VLAN 20; AudioCodes (8, `00:90:8f`) wired USW-16-PoE ports 1-8 on Default LAN; Vertical desktop wired port 16 on Default, static, no ACG login. CSCNet found to be a shared PPSK SSID (corrected the old "staff/VLAN 20" note). GuruRMM recon from CS-SERVER: desktop = RDP-only (not a PBX); CS-QB (192.168.2.228) = SMB-only, no SIP -> phones likely cloud PBX. Designed dedicated VLAN 30 VOICE (10.0.30.0/24) for all phones + the desktop (internet-only egress, isolated from VLAN 20/main LAN/PHI, OpenVPN scoped via CSO); wrote the cutover runbook (`docs/network/voice-vlan-cutover.md`); Howard sent the vendor email. Awaiting confirm + window. |
 
 ---
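
Reviewer sketch (not part of the commit): a quick sanity check of the planned VLAN 30 addressing against the networks this page names explicitly. The `existing` table and the `find_overlaps` helper are illustrative assumptions. The 236 per-room resident /28s are deliberately left out because their exact ranges are not listed in this diff.

```python
import ipaddress

# Networks named on the wiki page; the dictionary itself is an illustrative assumption.
existing = {
    "main LAN": ipaddress.ip_network("192.168.0.0/22"),
    "VLAN 20 staff/infra": ipaddress.ip_network("10.0.20.0/24"),
    "VLAN 50 guest": ipaddress.ip_network("10.0.50.0/24"),
}

# Planned voice network, gateway, and desktop reservation from the VLAN 30 plan.
voice = ipaddress.ip_network("10.0.30.0/24")
gateway = ipaddress.ip_address("10.0.30.1")
desktop = ipaddress.ip_address("10.0.30.10")


def find_overlaps(candidate, networks):
    """Return the names of networks whose address space overlaps the candidate."""
    return [name for name, net in networks.items() if candidate.overlaps(net)]


conflicts = find_overlaps(voice, existing)
print("overlaps:", conflicts or "none")
print("gateway inside voice subnet:", gateway in voice)
print("desktop reservation inside voice subnet:", desktop in voice)
```

A check like this only covers address-space collisions; the isolation from VLAN 20, the main LAN, and PHI still depends on the pfSense rules described in the runbook.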