wiki: compile cascades-tucson (full) — RF optimization night (2.4 power + data-driven 5GHz DFS, retry halved)
@@ -2,7 +2,7 @@
type: client
name: cascades-tucson
display_name: Cascades of Tucson
last_compiled: 2026-06-18
last_compiled: 2026-06-19
compiled_by: HOWARD-HOME/claude-main
sources:
- session-logs/2026-03-24-session.md
@@ -79,6 +79,11 @@ sources:
- clients/cascades-tucson/docs/network/network-optimization-master-plan.md
- clients/cascades-tucson/docs/network/phase1-voice-qos-design.md
- clients/cascades-tucson/reports/2026-06-18-voice-quality-diagnostic.md
- clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-memcare-baseline-and-change-window.md
- clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-2am-rf-run-phase2b-applied.md
- clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-5ghz-attempt-and-rollback.md
- clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-5ghz-dfs-datadriven-applied.md
- clients/cascades-tucson/session-logs/2026-06/2026-06-19-howard-cascades-rf-night-capstone.md
backlinks:
- projects/gururmm
- wiki/systems/uos-server
@@ -356,6 +361,22 @@ Cascades' line-of-business / reporting SaaS (the systems they pull data OUT of,

### Wireless / UniFi RF

- **[EXECUTED 2026-06-19 -- autonomous 2 AM window, validated] First production RF optimization applied + kept:**
  - **2.4 power Low/full -> MEDIUM on 47 radios** (the 42 over-thinned `low` floors 1-4 + named, + the 5
    MemCare floors-5/6 `auto`/full radios 505/517/608/615/622). The 24 thinned-disabled radios stayed
    disabled; 5 mesh-auto APs untouched. Non-regressive (satisfaction held). Undid the over-thinning
    regression + brought MemCare off full power. **Per-AP targeting required** -- `apply-radio power --zone`
    re-enables disabled radios (re-confirmed gotcha).
  - **5 GHz -> clean DFS 40 MHz channels** on 72 non-mesh APs (channels 52/60/100/108/116/124/132/140),
    0 co-channel, mesh excluded (2nd Floor Atrium + children CC Bridge/salon/108 left on auto). **Result:
    5 GHz retry roughly HALVED -- 8.7 -> 3.8 avg, median 8.2 -> 2.1.** Validated; all 72 APs holding DFS,
    0 radar vacates. Voice nudged back to 5 GHz (kick-sta) after the channel-change scatter.
  - **CSCNet BSS-transition (802.11v) ON.** 6 GHz still BLOCKED (WPA3 -- see below).
- **[BIG LESSON -- non-DFS decision REVERSED by data]** A blind non-DFS reshuffle was tried first and
  FAILED (flat retry); the completed channel survey (74/74 APs) proved **DFS channels here are 4-5x
  cleaner (2-3% busy) than non-DFS (ch149=12%, ch157=28%, ch44=22% -- the property's worst).** Consumer/
  neighbor gear avoids DFS. Choosing channels from the measured scan (not a non-DFS policy) is what
  delivered the win. **Always: scan -> `survey-report.py` -> `channel-plan --channels` -> apply -> validate.**
- **Fleet (full audit 2026-06-16):** 77 U7-Pro APs, **12 switches**, ~587 wireless clients. Controller: UOS at 172.16.3.29, HTTPS 11443 (see [[uos-server]]); site short name `va6iba3v`, site_id `685f39068e65331c46ef6dd2`. No UniFi gateway (pfSense is the gateway). pfSense ruled out as WiFi factor 2026-06-16 (DHCP not exhausted, DNS up, WAN stable -- see Network section).
- **Primary pain band is 2.4 GHz.** Avg TX-retry ~10%; cu_total 69-94% live; catastrophic neighbor BSSID density (ch6 ~33k BSSIDs, ch1 ~19k, ch11 ~17k). 27 of the 40 worst clients stuck on 2.4 GHz (retry 11-42%), mostly IoT/legacy hardware (Ring cameras, robotic cleaner, smart plugs, EPSON printer, Poly phone, handheld scanners, smartwatch). Root cause: ~75 2.4 GHz radios running at auto (full) TX power in extreme density. Experience splits by band: 5/6 GHz clients are fine; clients that land or stick on 2.4 GHz suffer.
- **5 GHz -- DFS concern is theoretical; empirically clean.** 76/77 radios on 80 MHz width (should be 40 MHz at this density). 55/77 radios on DFS channels (52-144) near Davis-Monthan AFB + TUS airport radar. `dfs-check.sh` 2026-06-16: **ZERO real radar events fleet-wide** (55 DFS APs, full `dmesg` sweep, precise pattern match) -- DFS is empirically low-risk here. Measured TX-retry DFS (8.4%) ~= non-DFS (9.0%) -- no throughput penalty. Still recommended to move to non-DFS (UNII-1 36-48 + UNII-3 149-161) for resilience. NOTE: an earlier mid-session claim (2026-06-15 audit) that "DFS was the #1 problem" was an artifact of tooling bugs (raw counter + 15-AP head cap) and was corrected before session end -- do not repeat it.
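The per-AP targeting gotcha from the 2.4 power item above (a zone-wide apply re-enables disabled radios) can be sketched as a filter that builds an explicit target list first. This is a minimal illustration only: the inventory rows, field names, and AP names below are hypothetical and are not the controller's real schema or the real fleet.

```python
# Hypothetical radio inventory; field names and AP names are assumptions
# made up for this sketch, not real controller output.
RADIOS = [
    {"ap": "ap-a", "band": "2.4", "power": "low", "disabled": False, "mesh": False},
    {"ap": "ap-b", "band": "2.4", "power": "low", "disabled": True, "mesh": False},
    {"ap": "ap-c", "band": "2.4", "power": "auto", "disabled": False, "mesh": False},
    {"ap": "mesh-child", "band": "2.4", "power": "auto", "disabled": False, "mesh": True},
    {"ap": "ap-a", "band": "5", "power": "auto", "disabled": False, "mesh": False},
]


def power_targets(radios, band="2.4", from_modes=("low", "auto")):
    """Return AP names whose enabled, non-mesh radios on `band` should change power.

    Disabled radios and mesh APs are skipped so that applying per AP
    never touches (and never re-enables) a radio that was thinned out.
    """
    return sorted(
        r["ap"]
        for r in radios
        if r["band"] == band
        and r["power"] in from_modes
        and not r["disabled"]
        and not r["mesh"]
    )


targets = power_targets(RADIOS)
print(targets)
```

Each name in the resulting list would then be applied one AP at a time rather than by zone, which is the point of the gotcha: the disabled radio and the mesh AP never appear in the list.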
@@ -376,7 +397,8 @@ Cascades' line-of-business / reporting SaaS (the systems they pull data OUT of,
- **AP-hang recovery:** use `device-control.sh cascades poe-cycle "<AP name>" --apply` (remote PoE port cycle via controller cmd/devmgr). Do NOT use `force-provision` -- it took AP 445 offline during the Floor-4 pilot and was removed from device-control.sh.
- **Tooling (`unifi-wifi` skill -- feature-complete as of 2026-06-16):**
- Collectors: `audit-site.sh` (config + neighbor density), `live-stats.sh` (live per-AP/client, Plane 2), `model-rank.sh`, `radio-usage.sh` (77-day 2.4 usage history per AP; confirms POWER-DOWN vs disable), `coverage-thin.sh` (mesh-aware 2.4 SNR dominating-set -- drives Phase C), `neighbor-collect.sh` (/proc/ui_neighbor AP-to-AP SNR matrix, non-disruptive, drives optimize-radios disables), `survey-collect.sh` (per-channel busy%/noise -> channel plan), `dfs-check.sh` (precise per-AP radar event history), `switch-audit.sh`, `gw-audit.sh`, `monitor-run.sh` (cron health digest, all sites), `sites.sh` (multi-client site list, ~49 UOS sites).
- Apply (gated + rollback): `apply-radio.sh` (power/width/channel/minrssi/disable/enable, --zone/--ap), `apply-wlan.sh` (minrate/bandsteer/bands/steer/bsstm/dtim/isolation/etc.), `client-control.sh` (block/unblock/kick MAC), `device-control.sh` (poe-cycle; adopt/restart/locate/upgrade), `channel-plan.sh` (data-driven 2.4/5 GHz channel plan via neighbor + survey data).
- **`survey-report.py` (NEW 2026-06-19) -- the channel-decision driver:** rolls the `survey-collect` JSON into the fleet per-channel/per-band-group measured busy% table + cleanest/dirtiest ranking + a suggested clean 40MHz palette. Run it BEFORE any channel change; it's what makes the DFS-vs-non-DFS call from facts (the skill previously had a non-DFS bias baked into `survey-collect`'s report AND `channel-plan`'s palette -- both fixed 2026-06-19).
- Apply (gated + rollback): `apply-radio.sh` (power/width/channel/minrssi/disable/enable, --zone/--ap), `apply-wlan.sh` (minrate/bandsteer/bands/steer/bsstm/dtim/isolation/etc.), `client-control.sh` (block/unblock/kick MAC -- used to nudge sticky phones off 2.4 after a channel change), `device-control.sh` (poe-cycle; adopt/restart/locate/upgrade), **`channel-plan.sh` (now DATA-DRIVEN palette: `--channels <list>` or `--dfs ok|avoid|only`; default ranks ALL 40MHz primaries by measured busy%; load-balance + local-search -> 0 strong co-channel).**
- pfSense: `pfsense-ssh.sh` (audit/dhcp/run -- SSH backend, no RESTAPI package needed; auth from `clients/<slug>/pfsense-firewall`; system OpenSSH via askpass). ROADMAP: gated control verbs (firewall rules, port forwards) -- deferred to Mike per SS E.
- All scripts site-parameterized (work for any of ~49 UOS sites). Per-client AP-side creds via `clients/<slug>/unifi-ap-ssh`.
- **Creds (vault refs only):** `infrastructure/uos-server-ssh-key` (SSH/Mongo), `infrastructure/uos-server-network-api-rw` (RW API), `clients/cascades-tucson/unifi-ap-ssh` (per-AP SSH, needs site VPN for L3 reach to 192.168.2.x/3.x), `clients/cascades-tucson/pfsense-firewall` (pfSense admin for pfsense-ssh.sh).
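The "load-balance + local-search -> 0 strong co-channel" idea behind `channel-plan.sh` can be illustrated with a much simpler greedy graph coloring. This is a sketch under stated assumptions, not the script's actual algorithm: the neighbor graph and AP names are hypothetical, the local-search step is omitted, and only the 8-channel palette comes from the notes above.

```python
# Clean DFS 40 MHz palette named in this article.
PALETTE = [52, 60, 100, 108, 116, 124, 132, 140]

# Hypothetical "strong neighbor" graph (symmetric): AP -> APs it hears strongly.
NEIGHBORS = {
    "ap-a": {"ap-b", "ap-c"},
    "ap-b": {"ap-a", "ap-c"},
    "ap-c": {"ap-a", "ap-b", "ap-d"},
    "ap-d": {"ap-c"},
}


def assign_channels(neighbors, palette):
    """Greedy coloring: each AP takes the least-used palette channel
    that none of its already-assigned strong neighbors is using."""
    assignment = {}
    usage = {ch: 0 for ch in palette}
    # Visit the most-constrained APs (most strong neighbors) first.
    for ap in sorted(neighbors, key=lambda a: (-len(neighbors[a]), a)):
        taken = {assignment[n] for n in neighbors[ap] if n in assignment}
        free = [ch for ch in palette if ch not in taken]
        # If every channel is taken nearby, fall back to the whole palette.
        candidates = free or list(palette)
        choice = min(candidates, key=lambda ch: (usage[ch], palette.index(ch)))
        assignment[ap] = choice
        usage[choice] += 1
    return assignment


def co_channel_pairs(neighbors, assignment):
    """List strong-neighbor pairs that ended up on the same channel."""
    return sorted(
        {tuple(sorted((a, n))) for a, ns in neighbors.items() for n in ns
         if assignment[a] == assignment[n]}
    )


plan = assign_channels(NEIGHBORS, PALETTE)
print(plan, co_channel_pairs(NEIGHBORS, plan))
```

Preferring the least-used channel is what spreads APs across the palette (the load-balance part). A real plan would also weight each AP's own measured busy% per channel, which this sketch leaves out.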
@@ -409,7 +431,7 @@ Full plan: `docs/network/network-optimization-master-plan.md`. Goal: fix the *sy

### Decisions resolved 2026-06-18 (voice/RF)

- **5 GHz: NON-DFS ONLY** (UNII-1 36-48 + UNII-3 149-165). A precise radar sweep found ZERO genuine hits across all 53 DFS APs -- but only over a ~21-23h window (APs rebooted in the 6/17 outage), and near Davis-Monthan AFB + TUS (~10 mi) a single sporadic military-radar hit forces a 30-min channel vacate = **dropped calls**. Resilience > channel diversity for a voice-critical net; 6 GHz (Phase 2a) covers the lost capacity. Add periodic `dfs-check.sh` monitoring. (Supersedes the earlier "move to non-DFS for resilience" as now a firm decision.)
- **5 GHz: USE THE CLEAN DFS CHANNELS** (REVERSED 2026-06-19 by measured data; the prior "non-DFS only" call was wrong for THIS site). The full channel survey (74/74 APs) showed DFS = 2-3% busy vs non-DFS 10-28% (ch149/157 are the worst on property). The everyday, every-call congestion on non-DFS is real and measured; the radar risk is hypothetical (0 genuine hits observed). So Howard chose the clean DFS channels (52/60/100/108/116/124/132/140) for voice quality. Safety net: UniFi auto-vacates a DFS channel on radar (regulatory -- moves ONE AP, not the fleet); FOLLOW-UP = stand up a recurring `dfs-check.sh` radar monitor. (This supersedes the 2026-06-18 "non-DFS only" decision -- which was made before the per-channel scan existed.)
- **NO dedicated voice SSID** -- voice stays on the shared CSCNet PPSK. UniFi 3-SSID cap is sound RF hygiene (each SSID = beacon airtime at 77 APs); the only retirement candidate CSC ENT still has 131 active clients (staff PCs, printers, DirecTV) so a slot isn't free; and a voice SSID isn't needed (QoS is VLAN/DSCP-based and SSID-independent, band preference is best set phone-side via Vertical, roaming/power-save are phone+AP settings). Revisit only if CSC ENT's clients migrate off.
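The 5 GHz reversal above rests on ranking channels by measured busy% instead of by a DFS/non-DFS policy. A minimal sketch of that ranking follows. The row layout is an assumption (the real `survey-collect` JSON is not shown here), the numbers are illustrative values that only loosely echo the figures quoted above, and the 5% cutoff is arbitrary.

```python
from statistics import mean

# Hypothetical survey rows: (ap_name, channel, busy_percent).
# Illustrative values only, not the real 74-AP scan.
SURVEY = [
    ("ap-a", 44, 22.0), ("ap-a", 52, 2.0), ("ap-a", 149, 12.0),
    ("ap-b", 44, 21.0), ("ap-b", 52, 3.0), ("ap-b", 157, 28.0),
    ("ap-c", 100, 2.5), ("ap-c", 149, 13.0), ("ap-c", 157, 27.0),
]


def is_dfs(channel: int) -> bool:
    """5 GHz channels 52-144 are the DFS range referenced in this article."""
    return 52 <= channel <= 144


def rank_channels(rows):
    """Return [(channel, mean_busy_percent, is_dfs)] sorted cleanest first."""
    by_channel = {}
    for _ap, channel, busy in rows:
        by_channel.setdefault(channel, []).append(busy)
    ranked = [(ch, round(mean(vals), 1), is_dfs(ch)) for ch, vals in by_channel.items()]
    return sorted(ranked, key=lambda r: (r[1], r[0]))


def clean_palette(rows, max_busy=5.0):
    """Channels whose fleet-average busy% is at or under the cutoff."""
    return [ch for ch, busy, _dfs in rank_channels(rows) if busy <= max_busy]


for channel, busy, dfs in rank_channels(SURVEY):
    print(f"ch{channel:<4} busy={busy:>5}%  {'DFS' if dfs else 'non-DFS'}")
print("palette:", clean_palette(SURVEY))
```

The DFS flag is reported but never used as a filter, so the palette falls out of the measurements alone. That is the design point of the corrected decision.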

### pfSense Operations
@@ -559,6 +581,7 @@ Syncro live pull 2026-06-18: **0 open tickets.** No hours drawn from the 2026-06
| 2026-06-18 | **DESKTOP-TRCIEJA (Lupe Sanchez) performance diagnosed; replace-not-remediate decision.** Root causes: (a) EOL hardware -- Gateway ZX6971 AIO, Intel i3-2120 (2011, 2C/4T), 8 GB RAM, Win11 unsupported; (b) dual real-time AV -- ACG Bitdefender (keep) + leftover Datto stack (Datto RMM/CentraStage + Datto EDR/Infocyte + bundled DattoAV) both scanning every file on a 2-core CPU under memory pressure. OneDrive ruled out (desktop is local). Howard decided: no remediation; order replacement. Another instance of the fleet-wide leftover-Datto-stack cleanup. |
| 2026-06-18 | **Voice VLAN 30: all 22 Poly phones migrated; network-logging spec written.** Completed the Poly cutover live -- all 22 WiFi phones re-keyed to the voice PPSK onto `10.0.30.202-.223` (per-phone location inventory in `docs/network/voice-phone-inventory.md`); first phone (Lauren Hasselman) dial-tone + outbound call verified. Vertical desktop fixed via port-16 bounce (controller API + CSRF) -> `10.0.30.201`. AudioCodes (8, wired) still pending (flip + PoE power-cycle). Separately, found the UniFi controller retains **ZERO** client events for Cascades (drop/kick history not captured) -> wrote a network-logging spec (`docs/network/network-logging-plan.md`): Synology Log Center on-site collector, pfSense+UniFi syslog sources, client snapshotter. Plan only -- build later. |
| 2026-06-18 | **Voice VLAN 30 cutover COMPLETE (8 AudioCodes added); voice-quality diagnosed; holistic all-device optimization master plan built.** AudioCodes finished -- they wouldn't re-DHCP via PoE/controller bounce (externally powered, PoE off); Howard physically power-cycled all 8 -> VOICE leases `.224-.231` (31 devices total on VLAN 30). Diagnosed the dropped-calls complaints: **the VLAN move does NOT fix call quality -- it's RF on the Poly WiFi phones** (wired AudioCodes clean). 14 Poly flagged; worst Lauren `.202` (2.4GHz/50% retry -> locked to AP 103) + Shelby `.218` (2.4GHz/53%, MemCare/deferred); coverage gaps rooms 515/210/204; found 6 unmigrated Poly stragglers (fleet is 28, not 22). Built `network-optimization-master-plan.md` (open-relief-valves-before-constraining sequence: QoS -> 6 GHz on CSCNet + 2.4 Low->Medium -> 5 GHz 40 MHz/non-DFS/relieve AP 103 -> fine-tune -> physical) with interdependency map + data-driven gate framework, floors 1-4 only. Designed Phase 1 voice QoS (`phase1-voice-qos-design.md`: pfSense HFSC + UniFi WMM, match `10.0.30.0/24`, phones mark DSCP EF; measured WAN1 up ~522 Mbps -> QoS is insurance, RF is the substance). Rigorous DFS re-verification (0 genuine radar/~1-day window) -> **decision: NON-DFS only**. **Decision: no dedicated voice SSID** (3-SSID cap, CSC ENT still 131 clients, QoS is SSID-independent). 6 GHz root-caused dark: CSCNet not broadcasting 6g. NO live network changes applied (per-change-go rule). |
| 2026-06-19 | **FIRST PRODUCTION RF OPTIMIZATION applied (autonomous 2 AM window) -- 2.4 power fix + data-driven 5 GHz DFS plan; 5 GHz retry HALVED.** Howard pre-authorized an autonomous 2 AM run. Applied + validated + KEPT: (1) **2.4 power Low/full -> MEDIUM on 47 radios** (over-thinning fix floors 1-4 + MemCare 5/6 off full power; 24 disabled stayed disabled; per-AP targeting since `--zone` re-enables disabled), non-regressive. (2) **CSCNet BSS-transition ON.** 6 GHz attempted but **BLOCKED -- `Wpa3MandatoryFor6GHzBand`** (CSCNet is WPA2/PPSK; converting the 427-client SSID is a supervised decision, deferred to Howard). A first blind non-DFS 5 GHz reshuffle (3a/3b) was tried, did NOT validate (flat retry, voice scattered to 2.4), and was ROLLED BACK. **Howard's correction: scan FIRST, decide from data.** Completed the full channel survey (74/74) -> proved **DFS channels here are 4-5x cleaner (2-3% busy) than non-DFS (ch149=12%, ch157=28%)**; the non-DFS-only decision was reversed. Built a **data-driven clean-DFS plan** (8 clean DFS 40MHz channels, per-AP cleanest + neighbor graph-color + local-search -> 0 co-channel), applied to 72 non-mesh APs (mesh excluded), nudged voice back to 5 GHz. **Result: 5 GHz retry 8.7 -> 3.8 avg (median 8.2 -> 2.1), satisfaction median 99, voice 31/31, all 72 APs holding DFS, 0 radar vacates.** Also disabled the 3 AM AP auto-upgrade (left OFF). **Skill hardened:** added `survey-report.py` (fleet channel-congestion analysis) + made `channel-plan.sh` palette data-driven (`--channels`/`--dfs`, load-balance + local-search) -- killed the non-DFS bias that caused the first failed attempt. |

---

@@ -18,7 +18,7 @@ Run `/wiki-lint` to check for stale entries and broken backlinks.

| Article | Summary | Last Compiled |
|---|---|---|
| [Cascades of Tucson](clients/cascades-tucson.md) | Prepaid block $175/hr, **55.75 hrs remaining** (live 2026-06-18); senior living; active domain migration + HIPAA compliance project; single DC on aging R610 hardware; caregiver restricted-access model PROVEN 2026-06-05: Hybrid Entra Join + CA allow-list + ALIS SSO validated on NURSESTATION-PC/pilot.test; GPO `CSC - Caregiver Workstation` (shortcuts + printers) built + validated; GPO `CSC - Caregiver Device Lockdown` deployed (HIPAA auto-logoff, activates on reboot); INTUNE_A PendingInput tenant-wide (MS case open; GPO path used instead); folder-redirection root cause fixed 2026-06-08 (fdeploy.ini); shared mailboxes grievances@/Surveys@ created + delegated 2026-06-12 (#32417); Monday cutover to real caregivers pending; #32383 (bill.com/BOK chris.knight) Resolved; UniFi wifi RF (77 U7-Pro APs/~587 clients via UOS controller): 2.4GHz over-coverage = primary pain; pfSense ruled out as cause; Floor-4 power-down pilot applied 2026-06-16 (retry 13.2->9.5%); coverage-thin disable plan + 2.4 remediation runbook staged; DFS empirically clean; 6GHz untapped; CS-SERVER OS RAID-1 degraded 2026-06-15 (data-loss risk; cloud backup now started); Voice VLAN (VLAN 30) consolidation planned 2026-06-16 for Vertical phones + remote desktop (CSCNet confirmed a shared PPSK SSID); KPI dashboard for Ashley Jensen scoped 2026-06-17 (Power BI + SharePoint phased plan, parked); Voice VLAN 30 built + 22/22 Poly cut over 2026-06-17 (AudioCodes 0/8 pending); building power outage 2026-06-17 (pfSense on UPS surge-only side) full site down + recovered; DESKTOP-TRCIEJA (Lupe Sanchez) slow Excel diagnosed 2026-06-18 = EOL i3-2120 hardware + dual real-time AV (leftover Datto stack) -> replace machine; network-logging spec written 2026-06-18 (on-site Synology Log Center; UniFi retains 0 client events -- drop/kick history not captured); Syncro 0 open tickets | 2026-06-18 |
| [Cascades of Tucson](clients/cascades-tucson.md) | Prepaid block $175/hr, **55.75 hrs remaining** (live 2026-06-18); senior living; active domain migration + HIPAA compliance project; single DC on aging R610 hardware; caregiver restricted-access model PROVEN 2026-06-05: Hybrid Entra Join + CA allow-list + ALIS SSO validated on NURSESTATION-PC/pilot.test; GPO `CSC - Caregiver Workstation` (shortcuts + printers) built + validated; GPO `CSC - Caregiver Device Lockdown` deployed (HIPAA auto-logoff, activates on reboot); INTUNE_A PendingInput tenant-wide (MS case open; GPO path used instead); folder-redirection root cause fixed 2026-06-08 (fdeploy.ini); shared mailboxes grievances@/Surveys@ created + delegated 2026-06-12 (#32417); Monday cutover to real caregivers pending; #32383 (bill.com/BOK chris.knight) Resolved; UniFi wifi RF (77 U7-Pro APs/~587 clients via UOS controller): 2.4GHz over-coverage = primary pain; pfSense ruled out as cause; Floor-4 power-down pilot applied 2026-06-16 (retry 13.2->9.5%); coverage-thin disable plan + 2.4 remediation runbook staged; DFS empirically clean; 6GHz untapped; CS-SERVER OS RAID-1 degraded 2026-06-15 (data-loss risk; cloud backup now started); Voice VLAN (VLAN 30) consolidation planned 2026-06-16 for Vertical phones + remote desktop (CSCNet confirmed a shared PPSK SSID); KPI dashboard for Ashley Jensen scoped 2026-06-17 (Power BI + SharePoint phased plan, parked); Voice VLAN 30 built + 22/22 Poly cut over 2026-06-17 (AudioCodes 0/8 pending); building power outage 2026-06-17 (pfSense on UPS surge-only side) full site down + recovered; DESKTOP-TRCIEJA (Lupe Sanchez) slow Excel diagnosed 2026-06-18 = EOL i3-2120 hardware + dual real-time AV (leftover Datto stack) -> replace machine; network-logging spec written 2026-06-18 (on-site Synology Log Center; UniFi retains 0 client events -- drop/kick history not captured); **RF optimized 2026-06-19** (2.4 power Low/full->Medium + 5GHz moved to clean DFS channels via data-driven scan -> 5GHz retry halved; 6GHz blocked by WPA3); Syncro 0 open tickets | 2026-06-19 |
| [Dataforth Corporation](clients/dataforth.md) | Prepaid block ~$2,099/mo, 34.5 hrs remaining; signal conditioning manufacturer; 64 DOS test stations; 2025 crypto attack recovery + incomplete restore (files dropped across shares — migration-gap audit in progress); 2026-03-27 phishing incident + MFA rollout; active test datasheet pipeline project; Neptune Exchange colocated at D2; 2026-06-04 SP1366 file recovery (19/20 PDFs restored from HGHAUBNER pre-attack backup); GuruRMM fleet 13→45 agents; 2026-06-02 Syncro asset reconciliation (78→20 keep/21 flag/28 remove/9 verify); fleet-wide Syncro agent break ~2025-10-06; Bitdefender phase-off in progress | 2026-06-04 |
| [Instrumental Music Center](clients/instrumental-music-center.md) | Prepaid block $175/hr, 12.5 hrs remaining; music retail/repair; AIMsi POS on SQL Server 2019; phantom DC causing slow logons; GuruRMM enrolled (IMC1) | 2026-05-24 |
| [Valley Wide Plastering](clients/valleywide.md) | Prepaid block, 10 hrs remaining; plastering/stucco contractor; HP DL360 Gen10 + XenServer; VB6 app modernization project; RDWeb brute-force incident; 11 Yealink phones pending | 2026-06-14 |