cascades: recover 4 docs dropped by the history-rewrite/repo-split
The 2026-06-18 repo restructure (history rewrite + project->submodule split) dropped these 4 Cascades files from the new clone. Copied byte-identical from the pre-cutover claudetools.old clone (md5-verified):
- docs/network/network-optimization-master-plan.md
- docs/network/phase1-voice-qos-design.md
- reports/2026-06-18-voice-quality-diagnostic.md
- session-logs/2026-06/2026-06-18-howard-cascades-rf-voice-optimization-plan.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,197 @@
# Cascades — Network Optimization Master Plan (all devices, holistic)
- **Created:** 2026-06-18 (Howard-Home / claude-main)
- **Status:** PLAN — for execution tonight (floors 1–4) per Howard. Floors 5 & 6 (MemCare) EXCLUDED this round.
- **Goal:** Fix the *system*, not one device at a time. Improve quality for **every** client (~587), not just
the 31 voice devices, by sequencing AP + WLAN + QoS + firewall changes so we don't trade one problem for another.
- **Builds on:** `reports/2026-06-16-unifi-full-audit.md`, `reports/2026-06-16-2.4ghz-remediation-runbook.md`
(RF mechanics + gated apply commands), `reports/2026-06-18-voice-quality-diagnostic.md`, and the live
2026-06-18 fleet sample. All RF changes use the gated `unifi-wifi` scripts (per-zone, dry-run, rollback JSON).
---
## 1. Current state (what's actually true right now)
| Layer | State | Verdict |
|---|---|---|
| **2.4 GHz** | **OVER-THINNED.** Overnight 6/17: 24 radios disabled + 42 set Low (~6 dBm). Interference dropped (cu_interf 64→32–48%) BUT **retry rose 17→23.4%, satisfaction fell 39→30** (time-of-day-controlled). Edge clients now reach farther/weaker APs. Mesh + Floors 5/6 untouched (full 23 dBm). | **Regressed — must correct power floor** |
| **5 GHz** | 80 MHz width on ~76/77 (too wide for the density). 55/77 on DFS (empirically clean — 0 radar). Channels biased to busy upper (149/157). **AP 103 saturated: ch149, 75% airtime, ~25,900 retries, 12 clients** (and Lauren's phone now locked there). Dining/Rec Room high retry (810/1083). | **Constrain width + spread channels + relieve hotspots** |
| **6 GHz** | 75 radios live, **~1 client.** Root cause: **CSCNet not broadcasting 6 GHz** (`wlan_bands=[2g,5g]`). Cleanest untapped capacity. | **Open it — the relief valve** |
| **QoS** | **NONE.** Voice now isolated on VLAN 30 but not prioritized — voice packets compete with data under load → jitter/breaks. | **Add — guaranteed win, now possible** |
| **pfSense/WAN/DHCP/DNS** | Healthy; ruled out as a WiFi factor (2026-06-16). Dual-WAN stable, DHCP 53% pool, unbound up. | **Fine — add voice QoS shaping only** |
| **Switching / physical** | ~25 ports linked 100 M but gig-capable (caps some AP uplinks); 3 offline switches; AP 108 cable pending; p38 4% tx-drop. | **Physical work — not tonight, but tracked** |
---
## 2. Root-cause model (why "some devices" are bad)
Three compounding RF causes, plus a missing QoS layer:
1. **2.4 GHz contention** — extreme neighbor density (ch6 ~33k BSSIDs). Any client that lands/sticks on 2.4 GHz
suffers. Made *worse* by the over-thinning (weaker signal → more retransmits).
2. **5 GHz over-width + hotspots** — 80 MHz halves the usable channel count → co-channel overlap → retries;
a few APs (103) are simply overloaded.
3. **6 GHz unused** — the clean band that should absorb modern clients is dark, so everything piles onto 5 GHz.
4. **No voice prioritization** — even with perfect RF, voice breaks under data bursts without QoS.
**The trap we must avoid (the "whack-a-mole"):** narrowing 5 GHz to 40 MHz *without* first opening 6 GHz pushes
more clients onto fewer 5 GHz channels → congestion moves, not improves. And dropping 2.4 power further (it's
already too low) starves edge clients. **Sequence matters.**
---
## 3. The holistic sequence (open relief valves BEFORE constraining)
> Principle: **(A) add capacity/priority that can't hurt → (B) fix the regression → (C) then constrain/optimize
> → (D) fine-tune → validate at every gate.** Each step is reversible; gate on live metrics before the next.
### PHASE 0 — Pre-flight + baseline (always)
- VPN up; `live-stats.sh cascades | head -3` (expect 77 APs).
- Baseline (compare after, same time-of-day): `live-stats.sh cascades > .claude/tmp/opt-pre.txt`;
`radio-usage.sh cascades ng 77 > .claude/tmp/usage-pre.txt`.
- Pick a watch AP per floor (`watch-ap.sh <ip>`).
### PHASE 1 — QoS for voice (orthogonal, lowest risk — but INSURANCE, not the everyday fix)
Voice VLAN 30 is isolated → mark + prioritize it end-to-end so calls beat data under load.
> **Reframe (measured 2026-06-18):** WAN1 fiber upload is **~522 Mbps** vs ~98 Mbps peak usage — huge
> headroom, so the WAN is **not** the day-to-day voice bottleneck (that's RF, Phases 2–4). QoS still earns its
> place as insurance for **WAN2 (coax) failover** and **rare WAN1 saturation** (you hit 680 Mbps down). Build
> it (cheap, correct), but don't expect it to fix the complaints — the RF work does. Full design:
> `docs/network/phase1-voice-qos-design.md`. **Phones confirmed marking DSCP EF** → rely on DSCP; subnet match is the net.
- **UniFi (WLAN/switch):** ensure WMM/QoS on; the AudioCodes/Poly tag voice DSCP — trust/honor it. On the
USW, voice VLAN traffic should hit the high-priority queue.
- **pfSense:** add a traffic-shaper/limiter or floating QoS rule that puts `VOICE net (10.0.30.0/24)` DSCP EF
(46) / RTP UDP into a priority queue on the WAN(s). Low risk — additive, voice-only.
- **Validate:** place test calls during a data-heavy moment; confirm no breakup. (No RF change here.)
- *Skill gap:* the `unifi-wifi` skill has no QoS verb — this is a pfSense + UniFi config task; consider a small
`voice-qos` helper later.
### PHASE 2 — Open the relief valves (capacity + correct the regression)
**2a. Enable 6 GHz on CSCNet + steering** (creates the offload path BEFORE we narrow 5 GHz):
```
apply-wlan.sh cascades bands all --wlan CSCNet --apply # -> [2g,5g,6g]
apply-wlan.sh cascades bsstm on --wlan CSCNet --apply # 802.11v BSS-transition (assists up-band + roam)
```
Band-steering (`no2ghz_oui`) already ON. 6E/7 clients gravitate to clean 6 GHz, offloading 5 GHz. Validate:
client mix shifts toward 6g; no SSID-visibility loss for legacy (2.4/5 stay on).
**2b. Correct the 2.4 over-thinning — Low → MEDIUM on kept radios** (restores edge signal; keeps cells smaller
than full power). Per floor, dry-run then apply; regenerate the kept-radio list live:
```
for z in "Floor 1" "Floor 2" "Floor 3" "Floor 4"; do \
apply-radio.sh cascades ng power medium --zone "$z" --apply; done # ~12–15 dBm
```
Do NOT expand disables. If a specific area shows a dead zone/complaint, re-enable that one radio
(`ng enable --ap "<name>"`). **Gate:** re-measure retry%/satisfaction same time-of-day vs `opt-pre.txt` —
expect retry back down from ~23% and satisfaction recovering.
### PHASE 3 — Constrain + optimize 5 GHz (now that 6 GHz absorbs load)
**3a. Width 80 → 40 MHz** (doubles non-overlapping channels → spatial reuse):
```
for z in "Floor 3" "Floor 1" "Floor 2" "Floor 4"; do \
apply-radio.sh cascades na width 40 --zone "$z" --apply; done # rollback: na width 80
```
**3b. Channel plan — NON-DFS ONLY (decided 2026-06-18 after rigorous DFS verification).**
Use **UNII-1 (36–48) + UNII-3 (149–165) only**; do NOT use DFS channels (52–144) on this voice-critical
network. A precise radar-detection sweep (real `radar found`/`NOL` signatures, CAC/control housekeeping
excluded) found **ZERO genuine hits across all 53 DFS APs** — BUT the window was only ~21–23h (APs rebooted
~23h ago, the 6/17 outage). Near Davis-Monthan AFB + TUS (~10 mi), military radar is sporadic and a single
hit forces a 30-min channel vacate = **dropped calls** — unacceptable for voice. **Resilience > diversity.**
The lost 5 GHz channel count is covered by **6 GHz (Phase 2a) absorbing capacity** — this is WHY 6 GHz comes first.
```
SURVEY=.claude/tmp/cascades-survey.json; SURVEY_JSON=$SURVEY survey-collect.sh cascades
SURVEY_JSON=$SURVEY channel-plan.sh cascades na # dry-run; CONSTRAIN to non-DFS (36-48,149-165); review; apply per zone
```
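As a quick check on the width and non-DFS choices, the sketch below counts how many distinct 5 GHz channels remain at each width when only UNII-1 (36-48) and UNII-3 (149-165) are allowed, which works out to 9, 4, and 2. It assumes the classic US channel plan without UNII-4; `non_dfs_channels` and `count_channels` are illustrative helper names, not part of the `unifi-wifi` scripts.

```sh
#!/bin/sh
# List usable non-DFS 5 GHz channels at a given width (20, 40, or 80 MHz).
# ASSUMPTION: classic US plan, UNII-1 = 36-48 and UNII-3 = 149-165, no UNII-4.
non_dfs_channels() {
  case "$1" in
    20) echo "36 40 44 48 149 153 157 161 165" ;;
    40) echo "36+40 44+48 149+153 157+161" ;;   # 165 has no non-DFS bonding partner
    80) echo "36-48 149-161" ;;
    *)  echo "unsupported width: $1" >&2; return 1 ;;
  esac
}

# Count the entries in the channel list for the requested width.
count_channels() {
  non_dfs_channels "$1" | wc -w | tr -d ' '
}

for w in 20 40 80; do
  printf '%s MHz: %s channels (%s)\n' "$w" "$(count_channels "$w")" "$(non_dfs_channels "$w")"
done
```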
**Periodic DFS monitoring:** the ~1-day window isn't conclusive, so add a recurring precise `dfs-check.sh`
(fold into the network-logging plan). Staying on non-DFS means a future hit can't affect us; the monitor just
confirms the choice stays right.
**3c. Relieve AP 103 specifically** (it now carries Lauren + 11 others on a 75%-busy ch149): move it off 149 to
a clean channel from the plan, 40 MHz. Verify Lauren `.202` retry drops after.
**Gate:** 5 GHz retry down on the busy APs; AP 103 cu_total well under 50%; no client stranded.
### PHASE 4 — Fine-tune (after 1–3 settle)
- **2.4 channel plan 1/6/11** (graph-color; co-channel pairs 92→35) + **pin the 4 off-plan APs** (128/108/108U7/salon)
to 1/6/11.
- **2.4 min-RSSI ON** for the 6 APs where it's OFF (615/608/505/517/622/salon) — *note 505/517/615/608/622 are
Floors 5/6 → DEFER with the rest of 5/6*; do `salon` only this round.
- **Roaming for voice continuity:** confirm 802.11k/v on CSCNet (r optional — test; some phones dislike 802.11r).
Keeps calls alive when staff walk between APs.
- **min-RSSI tuning:** only tighten where sticky-client far-AP behavior is proven; too aggressive blocks association.
### PHASE 5 — Physical (separate visit, not tonight — but it caps results)
- Re-terminate/replace the ~25 cables on ports stuck at 100 M (limits those APs' uplink throughput).
- Chase the 3 offline switches (2nd Floor #2, 4th Floor #2, USW Pro Max 16); finish AP 108 cable run.
- Recheck the p38 (1st Floor USW) 4% tx-drop after the above.
---
## 4. Interdependency map (read before changing anything)
- **6 GHz BEFORE 5 GHz 40 MHz** — else 5 GHz congestion just relocates. (Phase 2a before 3a.)
- **2.4 power MEDIUM not LOW** — Low already over-thinned; going lower starves edge clients. (Phase 2b.)
- **AP-lock needs AP capacity** — Lauren locked to 103 ⇒ 103 must be relieved (Phase 3c) or she trades mesh for congestion.
- **QoS is independent** — do it first; it can't hurt RF and guarantees a voice win even before RF settles. (Phase 1.)
- **Disables + power-down compound** — never do both aggressively in the same area; we already saw the satisfaction hit.
- **min-RSSI + power interact** — raising min-RSSI while lowering power can orphan clients; tune one lever at a time.
- **Mesh-protected APs** (`2nd Floor Atrium, CC Bridge, salon, 206 U7 Pro, 108`) — never disable; power changes only with watch.
## 5. Data-driven decision framework — improve quality for ALL devices (measure → decide → adjust)
**Principle (Howard 2026-06-18): every choice is made FROM measured network data, not assumptions.** Each
change is a hypothesis; we gate it on fleet-wide metrics before keeping it or moving on. The goal is *all*
devices (CSCNet 427 + CSC ENT 131 + Guest 13), not just the 31 voice phones.
### 5.1 How each change affects the OTHER (non-voice) devices
Almost every change targets the **shared RF environment**, so it helps everyone — voice is just the most
sensitive canary:
| Change | Effect on non-voice devices |
|---|---|
| QoS (voice VLAN priority) | **Negligible** — voice is ~3 Mbps of 522; normally zero effect; ACK queue can make others snappier under load |
| Enable 6 GHz on CSCNet | **Positive** — 6E devices move to clean 6 GHz → faster for them + clears 5 GHz for everyone left |
| 2.4 Low→Medium power | **Positive for ALL 2.4 devices** — undoes the over-thinning regression (IoT/printers/2.4 DirecTV get signal back) |
| 5 GHz 80→40 MHz | **Net positive (reliability), small peak-speed cost** — density win; lone heavy transfer sees lower peak |
| AP 103 relief | **Positive for all 12 clients on 103**, not just Lauren |
| 2.4 1/6/11 channel plan | **Positive for all 2.4 devices** (less co-channel) |
| Phone-side (Vertical) | Phones only — **no effect on others** |
### 5.2 Trade-offs to WATCH in the data (don't help voice, hurt others)
1. **Non-DFS 5 GHz + 5 GHz-only devices** — the DirecTV fleet + older laptops **can't use 6 GHz**, so they
stay on the fewer non-DFS channels. 6 GHz offloading the newer devices is what keeps this OK; **watch
non-DFS 5 GHz cu_total** — if it climbs, that's the signal to rebalance.
2. **min-RSSI** affects every client on the AP — too aggressive orphans weak IoT/resident devices. Tune gently.
3. **40 MHz** trades single-user peak for fleet reliability — right in density, but it is a trade.
### 5.3 Fleet-wide metrics we pull at every gate (the data we decide on)
Same time-of-day comparison (load varies hourly). Capture before each change and ~15 min after:
- `live-stats.sh cascades` → per-band **avg retry%, cu_total, cu_interf, satisfaction (min/median), client counts**
- `radio-usage.sh cascades <band>` → per-AP outliers (saturated/high-retry APs)
- `/stat/sta` band split (2.4 / 5 / 6 distribution) + count of clients retry>15% (by band) + satisfaction<70 count
- Per-AP: any AP whose client count drops toward ~0 in a covered area (= coverage hole)
### 5.4 The GATE decision rule (per change, per zone)
- **KEEP + proceed** only if: the target metric improved **AND** fleet-wide **satisfaction did not fall**,
**retry% did not rise**, band split moved the intended way, and **no AP lost its clients** (no hole).
- **HOLD** (stop, don't expand) if: target improved but a secondary metric regressed → investigate before more.
- **ROLLBACK that step** if: fleet-wide retry up / satisfaction down / a coverage hole / user complaint.
- Do **one lever per zone at a time** so cause/effect is attributable (the over-thinning happened because power-down
+ disables were stacked).
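
One possible reading of the gate rule as a decision function is sketched below. The name `gate_decision`, its argument order, and the choice to test the ROLLBACK conditions before KEEP are assumptions for illustration; under this reading HOLD covers the remaining cases (for example, the band split did not move the intended way). Retry is passed in tenths of a percent so the comparison stays in integer arithmetic.

```sh
#!/bin/sh
# Hypothetical helper: classify one change in one zone as KEEP, HOLD, or ROLLBACK.
# Args: target_improved(0/1) retry_pre retry_post sat_pre sat_post band_ok(0/1) hole(0/1) complaint(0/1)
# retry_* are fleet-wide retry in tenths of a percent (234 = 23.4%).
gate_decision() {
  target_improved=$1; retry_pre=$2; retry_post=$3
  sat_pre=$4; sat_post=$5; band_ok=$6; hole=$7; complaint=$8

  # Any fleet-wide regression, coverage hole, or complaint rolls the step back.
  if [ "$retry_post" -gt "$retry_pre" ] || [ "$sat_post" -lt "$sat_pre" ] ||
     [ "$hole" -eq 1 ] || [ "$complaint" -eq 1 ]; then
    echo ROLLBACK
  # Everything moved the intended way: keep the change and proceed.
  elif [ "$target_improved" -eq 1 ] && [ "$band_ok" -eq 1 ]; then
    echo KEEP
  # Anything else is ambiguous: stop expanding and investigate.
  else
    echo HOLD
  fi
}

# The 6/17 over-thinning: interference improved, but retry 17 -> 23.4% and satisfaction 39 -> 30.
gate_decision 1 170 234 39 30 1 0 0
```

Run against the numbers from the overnight 6/17 change, this reading returns ROLLBACK, which matches the plan's decision to correct the 2.4 GHz power floor.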
### 5.5 Rollback
Every `apply-radio`/`apply-wlan` writes a rollback JSON to `.claude/tmp/`; `device-control poe-cycle` for a hung
AP (NOT force-provision). Power-up / width-80 / re-enable / channel-revert are all safe reversals.
## 6. Out of scope tonight (explicit)
- **Floors 5 & 6 (MemCare)** — all RF + the MemCare voice phones (`.217/.218/.219/.220`) DEFERRED per Howard.
- Physical cabling / offline switches (Phase 5 — separate visit).
- The 6 straggler phones — Howard re-keying separately; they'll benefit from the RF work regardless.
## 7. Open decisions for Howard
1. ~~**5 GHz channel plan:** clean-DFS vs non-DFS-only~~ — **RESOLVED 2026-06-18: NON-DFS ONLY** (UNII-1 36–48 + UNII-3 149–165). DFS sweep was clean but only a ~1-day window near Davis-Monthan/TUS; a radar vacate = dropped calls, so resilience wins. 6 GHz covers the capacity gap. (See Phase 3b.)
2. **QoS depth:** UniFi WMM + DSCP-honor only, or also a pfSense WAN priority queue/limiter for RTP? Recommendation: both (additive).
3. **802.11r** on CSCNet: enable for seamless voice roaming, or k/v only (safer for mixed phones)? Recommendation: k/v now, test r on one phone first.
4. Tonight's stopping point: Phases 1–2 alone are a legitimate, lower-risk night; 3–4 can be a second night.
5. ~~**Dedicated voice SSID?**~~ — **RESOLVED 2026-06-18: NO — voice stays on the shared CSCNet PPSK.** UniFi
3-SSID cap (sound RF hygiene — each SSID = beacon airtime overhead at 77 APs). The only retirement candidate,
CSC ENT, still has **131 active clients** (staff PCs, printers, DirecTV fleet) → not retireable soon. And it's
not needed: **QoS is VLAN/DSCP-based (SSID-independent)**, band preference is best done **phone-side** (Vertical),
and roaming/power-save are phone+AP settings — all work on the shared SSID. A dedicated voice SSID would only
add voice-specific WiFi *policy* (per-SSID DTIM/min-RSSI/airtime), a marginal gain not worth a slot. Revisit only
if/when CSC ENT's 131 clients migrate off it.
111
clients/cascades-tucson/docs/network/phase1-voice-qos-design.md
Normal file
@@ -0,0 +1,111 @@
# Cascades — Phase 1: Voice QoS Design (VLAN 30)
- **Created:** 2026-06-18 (Howard-Home / claude-main). Part of `network-optimization-master-plan.md` Phase 1.
- **Status:** DESIGN — for review, then build (Howard drives pfSense GUI). Nothing applied.
- **Risk:** LOW — additive, voice-only prioritization; rollback = disable the shaper. Main caution: size the
shaper bandwidth correctly (a wrong value can throttle throughput) → test before/after.
## Objective
Guarantee voice quality under load by prioritizing VLAN 30 traffic end-to-end. **The phones register to a
CLOUD PBX (Vertical) over the internet**, so the bottleneck that breaks calls is **WAN upload saturation**
(someone uploading / cloud backup / OneDrive sync fills the uplink → voice RTP queues → jitter, dropped
audio). QoS keeps voice ahead of bulk data on the WAN.
## The big advantage of the VLAN move
**All voice is now one subnet: `10.0.30.0/24`.** So QoS can match *all* voice by **source subnet** — no
need to guess SIP/RTP port ranges per PBX. This is the cleanest, most robust match criterion and it only
became possible because we isolated voice onto VLAN 30.
## Current state (verified 2026-06-18)
- **No traffic shaper / limiter configured** on pfSense (clean build).
- **Dual-WAN:** WAN1 `igc0` (Cox Fiber, primary, 1G link), WAN2 `igc3` (Cox Coax, 2.5G link); `WAN_Group`
failover (`downlosslatency`). Shaping must be applied on **both** WAN interfaces.
- pfSense Plus 25.07 (ALTQ shaper + dummynet limiters available).
- **Phones mark DSCP EF — CONFIRMED (Howard 2026-06-18).** So we can rely on DSCP for WMM (Layer 2) + switch
QoS (Layer 3); the `10.0.30.0/24` subnet match (Layer 1) is the safety net. **No pfSense set-DSCP rule needed.**
## Measured WAN bandwidth (2026-06-18) — REFRAMES QoS priority
- **WAN1 (fiber, primary): upload ~522 Mbps** (Cloudflare single-stream from pfSense). RRD 3-day peaks:
**680 Mbps down / 98 Mbps up** (actual usage).
- **WAN2 (coax): not measurable remotely** (source-route bind to `72.211.21.217` failed; needs a WAN2-routed
host or the Cox bill). Coax is typically asymmetric ~20–50 Mbps up — **size its shaper conservatively**.
- **Implication:** 30 calls ≈ ~3 Mbps. WAN1 upload (~522 Mbps) vs peak usage (98 Mbps) = **huge headroom →
the WAN is NOT the everyday voice bottleneck.** Everyday dropped-calls = **RF** (Phases 2–4 of the master
plan). **QoS here is INSURANCE, not the day-to-day fix** — it earns its keep in two cases: (1) **WAN2
failover** (small coax upload + a big upload → real congestion), (2) **rare WAN1 saturation** (backup /
large upload; you do hit 680 Mbps down). Build it (cheap, correct), but set expectations: RF is the substance.
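
The ~3 Mbps figure can be sanity-checked with a back-of-envelope calculation. The sketch below assumes G.711 with 20 ms packetization (160-byte payload, 50 packets per second) and counts RTP, UDP, and IPv4 headers only. The codec is an assumption, since this document does not state which one Vertical provisions, and `call_kbps` / `voice_kbps` are illustrative names.

```sh
#!/bin/sh
# Per-call bandwidth in kbps at the IP layer, one direction.
# $1 = codec payload bytes per packet, $2 = packets per second
call_kbps() {
  overhead=40   # RTP 12 + UDP 8 + IPv4 20 bytes
  echo $(( ($1 + overhead) * 8 * $2 / 1000 ))
}

# Aggregate for N concurrent calls.
# ASSUMPTION: G.711 at 20 ms packetization (160 B payload, 50 pps).
voice_kbps() {
  per_call=$(call_kbps 160 50)
  echo $(( $1 * per_call ))
}

echo "per call: $(call_kbps 160 50) kbps"
echo "30 calls: $(voice_kbps 30) kbps"
```

Under these assumptions a call costs 80 kbps and 30 calls cost 2,400 kbps per direction, in line with the ~3 Mbps estimate once Layer 2 framing is added.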
## Four layers (priority order; Layer 1 = insurance, see reframe above)
### Layer 1 — pfSense WAN shaper (PRIMARY — this is where calls break)
**Type: HFSC** (hierarchical, lets us guarantee voice a floor while letting it borrow idle bandwidth).
Per WAN interface, three queues:
| Queue | Role | HFSC settings (starting point) |
|---|---|---|
| `qVoice` | voice (VLAN 30 / DSCP EF) | **priority 7**, realtime ~30% of WAN-up, link-share 30%, NOT default |
| `qACK` | TCP ACKs (keeps downloads snappy) | priority 6, ~10% |
| `qDefault` | everything else | **default**, link-share ~60% |
**Match rule (floating, WAN, direction out):** source `10.0.30.0/24` → `qVoice`. (Optionally also match
DSCP EF if phones mark it — see Layer 4.) One floating rule per WAN, or interface = WAN_Group.
**Download side:** RTP from the PBX *to* the phones is shaped on the **LAN-side** queues. The wizard builds
both directions; if hand-building, mirror a `qVoice` on the internal interfaces too. Upload is the more
critical direction for cloud-PBX voice, but do both.
**Build path (GUI — Howard drives):**
- Easiest: **Firewall → Traffic Shaper → Wizard → "Multiple Lan/Wan"** — set #WAN=2, #LAN as needed,
enter each WAN's bandwidth (below), on the VoIP page choose **"prioritize by address" = `10.0.30.0/24`**
with a guaranteed %; the wizard generates HFSC queues + the float rules. Then tune.
- Or manual: Firewall → Traffic Shaper → By Interface → add HFSC on WAN1 + WAN2, create the 3 queues,
then Firewall → Rules → Floating → match `10.0.30.0/24` out → Ackqueue/Queue = qACK/qVoice.
> **Sizing inputs:** WAN1 upload **~522 Mbps (measured 2026-06-18)** → shape `qVoice`'s parent to ~480–500
> Mbps. **WAN2 (coax) upload still UNKNOWN** (remote source-route test failed) — get from the Cox bill or a
> speedtest from a host routed via WAN2; size conservatively (assume ~35 Mbps up until measured). Shaping to
> ~90–95% of actual upload keeps the queue in pfSense (where we control priority), not at the ISP. WAN2 is the
> one that actually constrains voice (on failover), so its number matters most.
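
To make the sizing rule concrete, the sketch below computes the 90-95% shaping band for each WAN and the starting queue splits from the Layer 1 table (30% / 10% / 60%). WAN1 uses the measured ~522 Mbps; the WAN2 value of 35 Mbps is only the placeholder assumed above until it is measured, and `pct_of` / `size_wan` are illustrative helper names.

```sh
#!/bin/sh
# Integer percentage of a rate in kbps: pct_of <kbps> <percent>
pct_of() {
  echo $(( $1 * $2 / 100 ))
}

# Print the shaping band and starting queue splits for one WAN.
# $1 = label, $2 = upload in kbps
size_wan() {
  low=$(pct_of "$2" 90)
  high=$(pct_of "$2" 95)
  echo "$1: shape parent to $low-$high kbps"
  # Queue splits are taken at the conservative (90%) end of the band.
  echo "  qVoice   30% = $(pct_of "$low" 30) kbps"
  echo "  qACK     10% = $(pct_of "$low" 10) kbps"
  echo "  qDefault 60% = $(pct_of "$low" 60) kbps"
}

size_wan WAN1 522000   # measured 2026-06-18
size_wan WAN2 35000    # ASSUMED placeholder until WAN2 upload is measured
```

Even on the assumed 35 Mbps coax figure, the 30% `qVoice` share is about 9.4 Mbps, roughly three times the ~3 Mbps that 30 calls need.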
### Layer 2 — UniFi WMM (the WiFi phones — Poly)
Over the air, **WMM** maps DSCP → WiFi access categories; voice (DSCP EF/46) → **WMM Voice AC** (gets TXOP
priority over data). WMM is ON by default on UniFi — **verify it's enabled on CSCNet** and that the U7 APs
honor DSCP→WMM. This is what protects the 22 Poly phones over the air during WiFi congestion. (Ties into the
RF work — a clean 5/6 GHz + WMM = good wireless voice.)
### Layer 3 — UniFi switch QoS (the wired AudioCodes)
UniFi switches honor 802.1p/DSCP and queue tagged voice to a high-priority egress queue — mostly automatic
once the phones mark DSCP. LAN links are gig and rarely congested, so this is the least critical layer, but
confirm the USW isn't stripping DSCP and that voice VLAN 30 frames get the priority queue.
### Layer 4 — DSCP marking (make the above reliable)
- **Verify the phones mark voice:** AudioCodes + Poly typically tag RTP **EF (46)** and signaling **CS3 (24)**
by default, often set via the PBX/provisioning. Confirm with Vertical (Richard) or capture a packet.
- **If they DON'T mark (or inconsistently):** add a pfSense floating rule that **SETS DSCP EF** on
`10.0.30.0/24` traffic (Advanced → "Match/Set DSCP"). Then Layer 1/2/3 can all match on EF too.
- **Match-by-subnet (Layer 1) works regardless of DSCP** — it's the safety net. DSCP makes WMM (Layer 2)
and switch QoS (Layer 3) automatic.
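
When confirming the marking in a packet capture, note that capture tools often show the whole ToS / Traffic Class byte rather than the DSCP value. The sketch below converts between the two: DSCP occupies the upper six bits, so the byte is the DSCP value shifted left by two. `dscp_to_tos` is an illustrative helper name.

```sh
#!/bin/sh
# Convert a DSCP value (0-63) to the ToS byte shown in many packet captures.
# Assumes the two ECN bits are zero.
dscp_to_tos() {
  printf '0x%02x\n' $(( $1 << 2 ))
}

echo "EF  (46) -> $(dscp_to_tos 46)"   # expedited forwarding, RTP media
echo "CS3 (24) -> $(dscp_to_tos 24)"   # signaling
```

EF shows up as `0xb8` and CS3 as `0x60` when the ECN bits are zero.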
## Implementation order
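1. Get the WAN2 (coax) upload number (blocker for sizing the WAN2 shaper; WAN1 is already measured at ~522 Mbps up).
2. Phone DSCP EF marking is confirmed (Howard 2026-06-18), so no pfSense set-DSCP rule is needed; revisit Layer 4 only if a capture shows inconsistent marking.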
3. Build Layer 1 (pfSense HFSC + float rule) — dry-run mindset: set it, then validate.
4. Verify Layer 2 (WMM on CSCNet) + Layer 3 (switch honoring DSCP).
5. Validate (below). Tune `qVoice` % if needed.
## Validation (prove it works)
- **Baseline:** from a LAN host, saturate the WAN upload (big upload / `iperf3 -u` / speedtest) WHILE on a
call from a voice phone — note the breakup *without* QoS.
- **After:** repeat the same saturation; call stays clean. Check Firewall → Traffic Shaper → Queues: `qVoice`
carrying voice with ~0 drops while `qDefault` absorbs the saturation + drops.
- Confirm both WANs (test on primary; fail to WAN2 and re-test).
## Rollback
Firewall → Traffic Shaper → disable/remove the shaper; delete the floating rule. Zero residual effect
(QoS only orders packets under congestion; removing it reverts to FIFO). The set-DSCP rule (if added) can stay
or go independently.
## Notes / interplay with the rest of the plan
- QoS is **independent of the RF work** — it helps wired + WiFi voice immediately and can be built tonight
regardless of the 2.4/5/6 GHz changes.
- It does NOT fix RF problems (a phone on a 50%-retry 2.4 GHz radio still suffers) — QoS handles *congestion/
contention for bandwidth*, RF tuning handles *the air*. Both are needed; they're complementary.