cascades: recover 4 docs dropped by the history-rewrite/repo-split

The 2026-06-18 repo restructure (history rewrite + project->submodule split)
dropped these 4 Cascades files from the new clone. Copied byte-identical from
the pre-cutover claudetools.old clone (md5-verified):
- docs/network/network-optimization-master-plan.md
- docs/network/phase1-voice-qos-design.md
- reports/2026-06-18-voice-quality-diagnostic.md
- session-logs/2026-06/2026-06-18-howard-cascades-rf-voice-optimization-plan.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 20:21:27 -07:00
parent b66b80a95b
commit c2e5f4faeb
4 changed files with 514 additions and 0 deletions

@@ -0,0 +1,135 @@
# Cascades — voice-quality diagnostic + holistic RF/QoS optimization master plan
## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech
> NOTE: written in the OLD clone after the 2026-06-19 claudetools restructure coord message arrived.
> NOT synced from here. Recover into the fresh re-clone (see Pending tasks).
## Session Summary
Continuation of the VLAN 30 voice cutover. First, completed the AudioCodes migration: the 8 wired
AudioCodes would not pick up VLAN 30 addresses via port re-VLAN + UniFi PoE power-cycle (PoE is OFF on
those ports — they run on external power bricks, so a UniFi power-cycle is a no-op; a UI port disable/enable
didn't reset their uptime either). Root cause confirmed: they held their old main-LAN DHCP leases and never
re-DHCP'd. Howard fully powered them off/on, after which all 8 pulled VOICE leases (10.0.30.224-231). Final
state: 31 devices on VOICE (8 AudioCodes + 22 Poly + Vertical desktop).
Second, diagnosed voice quality (dropped calls / voice breaks). Wired AudioCodes: all 8 ports clean (100M
full-duplex, zero errors). The problem is RF on the WiFi Poly phones: 14 flagged, worst = Lauren/.202 (2.4
GHz, 50% retry, on the CC Bridge wireless-MESH AP) and Shelby/.218 (2.4 GHz, 53% retry, MemCare). Coverage
gaps in rooms 515/210/204 (RSSI -72 to -82). AP 103 5 GHz saturated (75% airtime, ~25,900 retries). Also
found 6 Poly phones NOT migrated (still on VLAN 20/Default) — fleet is 28 Poly, not 22; verified they are
distinct active MACs, not ghosts. Howard locked Lauren's phone to AP 103 (off the mesh AP).
Third, built a holistic, all-device network optimization master plan grounded in the existing 2026-06-16
audit + 2.4 GHz runbook + the over-thinning re-check. Key current-state fact: the network is OVER-THINNED on
2.4 GHz (overnight 6/17: 24 radios disabled + 42 at Low/6 dBm -> interference down but retry 17->23%,
satisfaction 39->30). The plan's central principle: open relief valves (6 GHz + correct 2.4 power) BEFORE
constraining (5 GHz 40 MHz), to avoid relocating congestion. Sequenced phases: (1) QoS, (2a) enable 6 GHz on
CSCNet + (2b) correct 2.4 Low->Medium, (3) 5 GHz 80->40 MHz + non-DFS channel plan + relieve AP 103, (4)
fine-tune, (5) physical. The plan includes an interdependency map and per-phase gates.
Fourth, verified DFS rigorously (Howard's concern re: Davis-Monthan AFB + TUS ~10 mi). The skill's dfs-check
flagged 3 APs, but on inspection all were benign (CAC timers + DFS-control toggles on a non-DFS channel, not
radar). A precise radar-detection-only sweep found ZERO genuine hits across all 53 DFS APs — but only over a
~21-23h window (APs rebooted in the 6/17 outage). DECISION: go NON-DFS only (UNII-1 36-48 + UNII-3 149-165) —
a radar vacate = dropped calls; resilience > diversity; 6 GHz covers the capacity gap.
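The non-DFS decision above can be checked per channel with a small lookup. This is a minimal sketch assuming US 20 MHz primary channel numbering; the function name `classify_5ghz_channel` and the output labels are illustrative, not part of any UniFi tooling.

```sh
# Classify a 5 GHz primary channel (20 MHz numbering, US plan assumed).
# UNII-1 (36-48) and UNII-3 (149-165) are the non-DFS ranges chosen above;
# UNII-2A (52-64) and UNII-2C (100-144) require DFS.
classify_5ghz_channel() {
    case "$1" in
        36|40|44|48)          echo "non-dfs unii-1" ;;
        149|153|157|161|165)  echo "non-dfs unii-3" ;;
        52|56|60|64)          echo "dfs unii-2a" ;;
        100|104|108|112|116|120|124|128|132|136|140|144)
                              echo "dfs unii-2c" ;;
        *)                    echo "unknown" ;;
    esac
}

# Example: print the class of a few channels.
for ch in 36 52 100 149; do
    printf '%s: %s\n' "$ch" "$(classify_5ghz_channel "$ch")"
done
```

Feeding each AP's current channel through a check like this gives a quick list of radios that still need to move off DFS.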
Fifth, designed Phase 1 QoS (pfSense + UniFi). Measured WAN: WAN1 fiber upload ~522 Mbps (vs ~98 Mbps peak
usage) -> the WAN is NOT the everyday voice bottleneck, so QoS is INSURANCE (WAN2 coax failover + rare
saturation), not the everyday fix — RF is the substance. Match voice by source subnet 10.0.30.0/24 (the VLAN
move's payoff). Phones confirmed to support DSCP EF. Determined a dedicated voice SSID is NOT viable (UniFi
3-SSID cap; CSC ENT still has 131 clients, not retireable) and NOT needed (QoS is VLAN/DSCP-based,
SSID-independent; band preference is phone-side). Added an all-devices impact + data-driven decision
framework: every change gated on fleet-wide metrics (measure -> decide -> adjust), with the trade-offs to
watch (non-DFS + 5GHz-only DirecTV fleet; min-RSSI orphaning; 40 MHz peak).
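Two pieces of the QoS reasoning above reduce to simple checks: how much of the measured WAN1 upload the observed peak consumes, and whether an address falls in the VOICE subnet that the shaper would match on. The sketch below is illustrative only; `peak_util_pct` and `in_voice_subnet` are hypothetical helper names, and the real match would live in pfSense rules rather than a script.

```sh
# Peak use as a whole-number percentage of measured capacity (integer math, truncates).
peak_util_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Succeed when an IPv4 address is inside VOICE 10.0.30.0/24.
# For a /24, comparing the first three octets is sufficient.
in_voice_subnet() {
    case "$1" in
        10.0.30.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# WAN1: ~98 Mbps peak upload against ~522 Mbps measured.
echo "WAN1 upload peak utilization: $(peak_util_pct 98 522)%"

# One migrated AudioCodes address and two straggler Poly addresses from this log.
for ip in 10.0.30.224 10.0.20.64 192.168.1.126; do
    if in_voice_subnet "$ip"; then
        echo "$ip voice"
    else
        echo "$ip not-voice"
    fi
done
```

With the measured figures the utilization comes out at 18%, which is why QoS is treated here as insurance. The stragglers fail the subnet match, so they would get no voice priority until they are re-keyed onto VLAN 30.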
## Key Decisions
- **AudioCodes need a full power-cycle (off/on), not a UniFi PoE cycle** — they're externally powered (PoE off
on the ports); a UI port bounce doesn't reset them.
- **5 GHz: NON-DFS ONLY.** DFS sweep clean but only ~1-day window near a military base/airport; a radar vacate
drops calls. Resilience over channel diversity; lean on 6 GHz for capacity.
- **QoS reframed to INSURANCE, not the everyday fix** — WAN1 fiber has ~522 Mbps up vs 98 Mbps peak use; the
everyday dropped-calls cause is RF. QoS matters on WAN2 (coax) failover + rare WAN1 saturation.
- **No dedicated voice SSID** — 3-SSID cap is sound RF hygiene; CSC ENT (the only retirement candidate) still
has 131 clients; and voice doesn't need a dedicated SSID (QoS is SSID-independent, band-pref is phone-side).
- **Open relief valves before constraining** — 6 GHz + 2.4 Low->Medium BEFORE 5 GHz 40 MHz, or congestion just
relocates.
- **2.4 power Low->Medium, not lower** — Low already over-thinned (retry up, satisfaction down).
- **Data-driven gates** (Howard) — base every choice on measured fleet-wide metrics; one lever per zone;
keep/hold/rollback per the gate rule; validation measures ALL devices, not just voice.
- **Phones are 5 GHz (not 6E)** — 6 GHz helps voice indirectly by clearing 5 GHz of resident devices.
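The keep/hold/rollback gate can be pictured as a before/after comparison of fleet-wide metrics. The rule below is a hypothetical stand-in, since this log does not spell out the plan's actual gate thresholds; the function name and the three-way logic are assumptions for illustration.

```sh
# HYPOTHETICAL gate rule (logic is an assumption, not the plan's actual rule):
#   keep     - retry % did not rise and satisfaction did not fall
#   rollback - retry % rose and satisfaction fell
#   hold     - mixed result; keep measuring before pulling the next lever
# Arguments: retry_before retry_after satisfaction_before satisfaction_after (whole numbers)
gate_decision() {
    retry_before=$1; retry_after=$2; sat_before=$3; sat_after=$4
    if [ "$retry_after" -le "$retry_before" ] && [ "$sat_after" -ge "$sat_before" ]; then
        echo keep
    elif [ "$retry_after" -gt "$retry_before" ] && [ "$sat_after" -lt "$sat_before" ]; then
        echo rollback
    else
        echo hold
    fi
}

# Figures from the 6/17 over-thinning re-check: retry 17 -> 23 %, satisfaction 39 -> 30.
gate_decision 17 23 39 30
```

Under this assumed rule the 6/17 over-thinning change would have been flagged for rollback, which matches the decision to move 2.4 GHz power from Low back to Medium.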
## Problems Encountered
- **AudioCodes wouldn't move to VLAN 30** — root-caused to held DHCP leases + PoE-off ports (power-cycle no-op);
resolved by full power-off/on.
- **UniFi controller PUT 403s** — CSRF token extraction flaky; fixed by reading `x-updated-csrf-token` (with a
TOKEN-cookie JWT fallback).
- **pfSense SSH rate-limiting + controller throttling** after many rapid queries — switched between controller
API and pfSense SSH as needed; one fleet pull hung in the background.
- **Temp-file/sync friction (RECURRED 3x)** — controller-scratch files (.sta.json, .fleet325.dev) written
CWD-relative got swept into commits by `git add -A` and blocked rebases (stray locked curl.exe held them).
Fixed: killed the procs, untracked, broadened .gitignore (.fleet*, .ap[0-9]*, .vq[0-9]*, .q[0-9]*). Real fix:
write API scratch OUTSIDE the repo (used mktemp -d for the DFS sweep).
- **Cloudflare __down / WAN2-bound speedtests returned 0.0** — only WAN1 upload (522 Mbps) measured cleanly;
WAN2 (coax) upload still unknown (needs a WAN2-routed host or Cox bill).
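The scratch-file fix noted above (write API output outside the repo) can be sketched as follows. The file name `sta.json` is a placeholder for whatever a controller query would produce; nothing here calls the real API.

```sh
# Create a scratch directory outside the repo for controller/API dumps and
# remove it on exit, so nothing lands in the working tree for `git add -A` to sweep up.
scratch_dir=$(mktemp -d)
trap 'rm -rf "$scratch_dir"' EXIT

# Placeholder file: a real run would redirect the API response here instead of the CWD.
sta_dump="$scratch_dir/sta.json"
printf '{}\n' > "$sta_dump"
echo "scratch file: $sta_dump"
```

Because the directory lives under the system temp location, the .gitignore patterns become a second line of defense instead of the only one.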
## Configuration Changes (all in clients/cascades-tucson/, committed up through 2b2d094 BEFORE the restructure)
- **Created** `docs/network/network-optimization-master-plan.md` — holistic all-device plan (sequencing,
interdependency map, data-driven decision framework, DFS non-DFS decision, SSID decision).
- **Created** `docs/network/phase1-voice-qos-design.md` — pfSense HFSC + UniFi WMM/switch QoS design.
- **Created** `reports/2026-06-18-voice-quality-diagnostic.md` — per-phone RF findings + fixes.
- **Updated** the voice-quality report with the Lauren->103 note (not a separate `reports/2026-06-16-voice-quality-diagnostic.md`).
- **No live network changes applied** (Cascades rule: explicit per-change go). UniFi port bounces were
temporary (restored). DFS/WAN tests were read-only/bounded.
## Credentials & Secrets
- No new credentials. Used existing: `infrastructure/uos-server-network-api-rw` (controller),
`clients/cascades-tucson/unifi-ap-ssh` (AP SSH for DFS sweep), `clients/cascades-tucson/pfsense-firewall`,
`clients/cascades-tucson/wifi-voice-ppsk` (key `V0!c38863171`).
## Infrastructure & Servers
- VOICE VLAN 30 `10.0.30.0/24`: 8 AudioCodes `.224-.231`, 22 Poly `.202-.223`, desktop `.201`.
- WAN1 fiber igc0 (522 Mbps up measured; RRD peaks 680 down/98 up). WAN2 coax igc3 (72.211.21.217, upload
unmeasured). pfSense Plus 25.07 at `192.168.0.1`, no existing shaper.
- UniFi UOS `172.16.3.29:11443` site `va6iba3v`. USW-16-PoE mac `d8:b3:70:21:94:5f` dev_id `685f39078e65331c46ef7e90`.
- SSIDs: CSCNet (427 clients, PPSK, 2g+5g), CSC ENT (131 clients, legacy, 2g+5g), Guest (13, 2g+5g+6g).
- DFS: 53 APs on DFS, 0 genuine radar over ~21-23h.
## Pending / Incomplete Tasks
- **RE-CLONE claudetools** (coord message 2026-06-19): old clone incompatible after history rewrite. Steps below.
- **Verify** this session's Cascades docs (master plan, QoS design, voice-quality + diagnostic reports, voice
inventory, logging plan) survived the rewrite into the new repo; if missing, recover from this .old working tree.
- **Recover this session log** into the new clone (it's uncommitted here).
- WAN2 (coax) upload number — measure from a WAN2-routed host / Cox bill (sizes the failover shaper).
- 6 straggler Poly phones (10.0.20.64/65/66/67/195, 192.168.1.126) — re-key to voice PPSK.
- Floors 5/6 (MemCare) RF + phones — deferred.
- Execute the optimization plan (start Phase 2b 2.4 Low->Medium with baseline capture) — pending Howard's go.
- Hand Vertical the phone-side config list (band 5GHz lock, DSCP-on, k/v roaming, U-APSD, firmware).
## Re-clone steps (Windows / C:\claudetools)
```
# from C:\
mv claudetools claudetools.old # or rename in Explorer
git clone https://git.azcomputerguru.com/azcomputerguru/claudetools.git claudetools
cp claudetools.old/.claude/identity.json claudetools/.claude/
cd claudetools && git submodule update --init --recursive
# then: diff clients/cascades-tucson against claudetools.old; cp any missing files (esp. this session's docs)
# recover this session log from claudetools.old/clients/cascades-tucson/session-logs/2026-06/
# verify, then delete claudetools.old
```
See RECLONE.md in the new repo. Pre-split backup bundle: Jupiter share Backups/Gitea-Storage.
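The "diff against claudetools.old, copy any missing files" step can be made repeatable with a small helper. This is a sketch only: `report_dropped` is a hypothetical function name, and it compares file contents with `cmp` rather than md5 sums.

```sh
# List files present under OLD that are missing from, or differ in, NEW.
# Usage: report_dropped OLD_DIR NEW_DIR
report_dropped() {
    old=$1; new=$2
    # Walk OLD in a subshell so the caller's working directory is unchanged.
    (cd "$old" && find . -type f ! -path './.git/*') | sort |
    while IFS= read -r rel; do
        rel=${rel#./}
        if [ ! -e "$new/$rel" ]; then
            echo "MISSING $rel"
        elif ! cmp -s "$old/$rel" "$new/$rel"; then
            echo "DIFFERS $rel"
        fi
    done
}

# Example (paths as in the steps above, run from C:\ in a POSIX-style shell):
# report_dropped claudetools.old/clients/cascades-tucson claudetools/clients/cascades-tucson
```

Each `MISSING` line is a candidate for `cp` from the old tree. `DIFFERS` lines deserve a manual look before overwriting anything in the new clone.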
## Reference Information
- Last pushed commit (old history): `2b2d094` (2026-06-18 19:16). Restructure force-push: ~2026-06-19 02:41 UTC.
- Master plan: `clients/cascades-tucson/docs/network/network-optimization-master-plan.md`
- QoS design: `clients/cascades-tucson/docs/network/phase1-voice-qos-design.md`
- Voice-quality diagnostic: `clients/cascades-tucson/reports/2026-06-18-voice-quality-diagnostic.md`
- Existing RF audit + 2.4 runbook: `reports/2026-06-16-unifi-full-audit.md`, `reports/2026-06-16-2.4ghz-remediation-runbook.md`