cascades: recover 4 docs dropped by the history-rewrite/repo-split

The 2026-06-18 repo restructure (history rewrite + project->submodule split) dropped these 4 Cascades
files from the new clone. Copied byte-identical from the pre-cutover claudetools.old clone (md5-verified):

- docs/network/network-optimization-master-plan.md
- docs/network/phase1-voice-qos-design.md
- reports/2026-06-18-voice-quality-diagnostic.md
- session-logs/2026-06/2026-06-18-howard-cascades-rf-voice-optimization-plan.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,135 @@
# Cascades: voice-quality diagnostic + holistic RF/QoS optimization master plan

## User

- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech

> NOTE: written in the OLD clone after the 2026-06-19 claudetools restructure coord message arrived.
> NOT synced from here. Recover into the fresh re-clone (see Pending tasks).

## Session Summary

Continuation of the VLAN 30 voice cutover. First, completed the AudioCodes migration: the 8 wired
AudioCodes would not pick up VLAN 30 addresses via port re-VLAN + UniFi PoE power-cycle (PoE is OFF on
those ports: they run on external power bricks, so a UniFi power-cycle is a no-op; a UI port disable/enable
didn't reset their uptime either). Root cause confirmed: they held their old main-LAN DHCP leases and never
re-DHCP'd. Howard fully powered them off/on, after which all 8 pulled VOICE leases (10.0.30.224-231). Final
state: 31 devices on VOICE (8 AudioCodes + 22 Poly + Vertical desktop).

Second, diagnosed voice quality (dropped calls / voice breaks). Wired AudioCodes: all 8 ports clean (100M
full-duplex, zero errors). The problem is RF on the WiFi Poly phones: 14 flagged, worst = Lauren/.202 (2.4
GHz, 50% retry, on the CC Bridge wireless-MESH AP) and Shelby/.218 (2.4 GHz, 53% retry, MemCare). Coverage
gaps in rooms 515/210/204 (RSSI -72 to -82 dBm). AP 103 5 GHz saturated (75% airtime, ~25,900 retries). Also
found 6 Poly phones NOT migrated (still on VLAN 20/Default), so the fleet is 28 Poly, not 22; verified they
are distinct active MACs, not ghosts. Howard locked Lauren's phone to AP 103 (off the mesh AP).
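
The per-phone triage described above can be sketched as a small flagging function. This is only an illustration: the two thresholds are hypothetical (the log does not state the cut-offs used to flag the 14 phones), and `PhoneSample` and `flag_reasons` are names invented here. Only the Lauren and Shelby figures come from the log.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the session log does not
# record the cut-offs that were actually used.
RETRY_PCT_LIMIT = 20.0
RSSI_DBM_FLOOR = -70


@dataclass(frozen=True)
class PhoneSample:
    name: str
    band_ghz: float
    retry_pct: float
    rssi_dbm: int | None = None  # None when no RSSI reading is available


def flag_reasons(sample: PhoneSample) -> list[str]:
    """Return the reasons a phone would be flagged; an empty list means healthy."""
    reasons = []
    if sample.band_ghz < 5:
        reasons.append("on 2.4 GHz")
    if sample.retry_pct >= RETRY_PCT_LIMIT:
        reasons.append(f"retry {sample.retry_pct:.0f}%")
    if sample.rssi_dbm is not None and sample.rssi_dbm <= RSSI_DBM_FLOOR:
        reasons.append(f"RSSI {sample.rssi_dbm} dBm")
    return reasons


samples = [
    PhoneSample("Lauren/.202", 2.4, 50.0),       # figures from the log
    PhoneSample("Shelby/.218", 2.4, 53.0),       # figures from the log
    PhoneSample("example-healthy", 5.0, 5.0, -60),  # made-up healthy phone
]
for s in samples:
    print(s.name, flag_reasons(s) or "ok")
```

Returning the list of reasons rather than a boolean keeps the output useful for a per-phone report, which is the shape the diagnostic takes.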

Third, built a holistic, all-device network optimization master plan grounded in the existing 2026-06-16
audit + 2.4 GHz runbook + the over-thinning re-check. Key current-state fact: the network is OVER-THINNED on
2.4 GHz (overnight 6/17: 24 radios disabled + 42 at Low/6 dBm -> interference down but retry 17->23%,
satisfaction 39->30). The plan's central principle: open relief valves (6 GHz + correct 2.4 power) BEFORE
constraining (5 GHz 40 MHz), to avoid relocating congestion. Sequenced phases: (1) QoS, (2a) enable 6 GHz on
CSCNet + (2b) correct 2.4 Low->Medium, (3) 5 GHz 80->40 MHz + non-DFS channel plan + relieve AP 103, (4)
fine-tune, (5) physical. The plan includes an interdependency map and per-phase gates.

Fourth, verified DFS rigorously (Howard's concern re: Davis-Monthan AFB + TUS ~10 mi). The skill's dfs-check
flagged 3 APs, but on inspection all were benign (CAC timers + DFS-control toggles on a non-DFS channel, not
radar). A precise radar-detection-only sweep found ZERO genuine hits across all 53 DFS APs, but only over a
~21-23h window (APs rebooted in the 6/17 outage). DECISION: go NON-DFS only (UNII-1 36-48 + UNII-3 149-165):
a radar vacate = dropped calls; resilience > diversity; 6 GHz covers the capacity gap.
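
To make the capacity cost of the non-DFS decision concrete, the sketch below enumerates the 20 MHz channels in the two ranges named above and pairs them into 40 MHz blocks. The helper names are invented for illustration, and the pairing only considers channels inside UNII-1 36-48 and UNII-3 149-165.

```python
# 20 MHz channel numbers in the two non-DFS ranges named in the decision.
UNII_1 = [36, 40, 44, 48]
UNII_3 = [149, 153, 157, 161, 165]
NON_DFS_5GHZ = UNII_1 + UNII_3


def bonded_40mhz_pairs(band):
    """Pair adjacent 20 MHz channels into 40 MHz blocks.

    A trailing unpaired channel (165 in UNII-3) is left out because it has
    no partner inside the band.
    """
    return [(band[i], band[i + 1]) for i in range(0, len(band) - 1, 2)]


def is_non_dfs(channel: int) -> bool:
    """True if the channel is in the non-DFS plan."""
    return channel in NON_DFS_5GHZ


blocks = bonded_40mhz_pairs(UNII_1) + bonded_40mhz_pairs(UNII_3)
print(blocks)       # the 40 MHz blocks available to the channel plan
print(len(blocks))  # how many non-overlapping 40 MHz blocks that leaves
```

With only four non-overlapping 40 MHz blocks to reuse across the site, co-channel overlap between neighbouring APs is hard to avoid, which is consistent with the log's reliance on 6 GHz to cover the capacity gap.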

Fifth, designed Phase 1 QoS (pfSense + UniFi). Measured WAN: WAN1 fiber upload ~522 Mbps (vs ~98 Mbps peak
usage) -> the WAN is NOT the everyday voice bottleneck, so QoS is INSURANCE (WAN2 coax failover + rare
saturation), not the everyday fix; RF is the substance. Match voice by source subnet 10.0.30.0/24 (the VLAN
move's payoff). Phones confirmed to support DSCP EF. Determined a dedicated voice SSID is NOT viable (UniFi
3-SSID cap; CSC ENT still has 131 clients, not retirable) and NOT needed (QoS is VLAN/DSCP-based,
SSID-independent; band preference is phone-side). Added an all-devices impact + data-driven decision
framework: every change gated on fleet-wide metrics (measure -> decide -> adjust), with the trade-offs to
watch (non-DFS + 5 GHz-only DirecTV fleet; min-RSSI orphaning; 40 MHz peak).
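
Two of the claims above are easy to check numerically: the WAN1 headroom and the source-subnet match. The sketch below is an illustration in plain Python, not pfSense configuration; `is_voice_source` and `utilization_pct` are names invented here, and the sample addresses are taken from elsewhere in this log.

```python
import ipaddress

VOICE_SUBNET = ipaddress.ip_network("10.0.30.0/24")


def is_voice_source(addr: str) -> bool:
    """True if a packet from this source address would match the voice rule."""
    return ipaddress.ip_address(addr) in VOICE_SUBNET


def utilization_pct(peak_mbps: float, capacity_mbps: float) -> float:
    """Peak usage as a percentage of measured capacity."""
    return 100.0 * peak_mbps / capacity_mbps


# WAN1 upload: ~98 Mbps peak against ~522 Mbps measured.
print(f"WAN1 upload peak utilization: {utilization_pct(98, 522):.1f}%")

# A migrated AudioCodes matches; unmigrated straggler phones do not.
for addr in ("10.0.30.224", "10.0.20.64", "192.168.1.126"):
    print(addr, "voice" if is_voice_source(addr) else "NOT matched")
```

The second loop shows the flip side of matching by subnet: any phone still on its old VLAN falls outside the rule until it is re-keyed.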

## Key Decisions

- **AudioCodes need a full power-cycle (off/on), not a UniFi PoE cycle:** they're externally powered (PoE off
  on the ports); a UI port bounce doesn't reset them.
- **5 GHz: NON-DFS ONLY.** DFS sweep clean but only ~1-day window near a military base/airport; a radar vacate
  drops calls. Resilience over channel diversity; lean on 6 GHz for capacity.
- **QoS reframed to INSURANCE, not the everyday fix:** WAN1 fiber has ~522 Mbps up vs ~98 Mbps peak use; the
  everyday dropped-calls cause is RF. QoS matters on WAN2 (coax) failover + rare WAN1 saturation.
- **No dedicated voice SSID:** 3-SSID cap is sound RF hygiene; CSC ENT (the only retirement candidate) still
  has 131 clients; and voice doesn't need a dedicated SSID (QoS is SSID-independent, band-pref is phone-side).
- **Open relief valves before constraining:** 6 GHz + 2.4 Low->Medium BEFORE 5 GHz 40 MHz, or congestion just
  relocates.
- **2.4 power Low->Medium, not lower:** Low already over-thinned (retry up, satisfaction down).
- **Data-driven gates** (Howard): base every choice on measured fleet-wide metrics; one lever per zone;
  keep/hold/rollback per the gate rule; validation measures ALL devices, not just voice.
- **Phones are 5 GHz (not 6E):** 6 GHz helps voice indirectly by clearing 5 GHz of resident devices.
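
The keep/hold/rollback gate can be sketched as a comparison of fleet-wide metrics before and after a change. The actual gate rule lives in the master plan and is not reproduced in this log, so the rule and the `tolerance` value below are assumptions made purely for illustration; only the 17->23% retry and 39->30 satisfaction figures come from the log.

```python
from typing import NamedTuple


class FleetMetrics(NamedTuple):
    retry_pct: float     # fleet-wide retry rate, percent
    satisfaction: float  # fleet-wide satisfaction score


def gate_decision(before: FleetMetrics, after: FleetMetrics,
                  tolerance: float = 1.0) -> str:
    """Hypothetical gate: any metric clearly worse -> rollback,
    otherwise any metric clearly better -> keep, otherwise hold."""
    retry_delta = after.retry_pct - before.retry_pct
    sat_delta = after.satisfaction - before.satisfaction
    worse = retry_delta > tolerance or sat_delta < -tolerance
    better = retry_delta < -tolerance or sat_delta > tolerance
    if worse:
        return "rollback"
    if better:
        return "keep"
    return "hold"


# The overnight 6/17 thinning: retry 17 -> 23 %, satisfaction 39 -> 30.
print(gate_decision(FleetMetrics(17, 39), FleetMetrics(23, 30)))
```

Under this assumed rule the 6/17 over-thinning would have been rolled back, which matches the decision to move 2.4 GHz power from Low to Medium rather than lower.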

## Problems Encountered

- **AudioCodes wouldn't move to VLAN 30:** root-caused to held DHCP leases + PoE-off ports (power-cycle no-op);
  resolved by full power-off/on.
- **UniFi controller PUT 403s:** CSRF token extraction flaky; fixed by reading `x-updated-csrf-token` (with a
  TOKEN-cookie JWT fallback).
- **pfSense SSH rate-limiting + controller throttling** after many rapid queries: switched between controller
  API and pfSense SSH as needed; one fleet pull hung in the background.
- **Temp-file/sync friction (RECURRED 3x):** controller-scratch files (.sta.json, .fleet325.dev) written
  CWD-relative got swept into commits by `git add -A` and blocked rebases (stray locked curl.exe held them).
  Fixed: killed the procs, untracked, broadened .gitignore (.fleet*, .ap[0-9]*, .vq[0-9]*, .q[0-9]*). Real fix:
  write API scratch OUTSIDE the repo (used mktemp -d for the DFS sweep).
- **Cloudflare __down / WAN2-bound speedtests returned 0.0:** only WAN1 upload (522 Mbps) measured cleanly;
  WAN2 (coax) upload still unknown (needs a WAN2-routed host or Cox bill).
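
The CSRF fix in the list above (prefer the `x-updated-csrf-token` response header, fall back to the `TOKEN` cookie's JWT) might look roughly like the sketch below. It is not the code used in the session: the function names are invented, and the claim name `csrfToken` inside the JWT payload is an assumption that should be checked against a real cookie.

```python
from __future__ import annotations

import base64
import json


def csrf_from_jwt(token: str, claim: str = "csrfToken") -> str | None:
    """Read a claim from a JWT payload without verifying the signature.

    The claim name is an assumption; adjust it to match the real cookie.
    """
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64 padding
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:  # covers bad base64, bad UTF-8 and bad JSON
        return None
    return claims.get(claim) if isinstance(claims, dict) else None


def pick_csrf_token(response_headers: dict, cookies: dict) -> str | None:
    """Prefer the refreshed header value; fall back to the TOKEN cookie JWT."""
    for name, value in response_headers.items():
        if name.lower() == "x-updated-csrf-token" and value:
            return value
    token_cookie = cookies.get("TOKEN")
    return csrf_from_jwt(token_cookie) if token_cookie else None
```

Matching the header name case-insensitively avoids depending on how a particular HTTP client normalizes header keys. The signature is deliberately not verified, since the value is only echoed back to the controller that issued it.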

## Configuration Changes (all in clients/cascades-tucson/, committed up through 2b2d094 BEFORE the restructure)

- **Created** `docs/network/network-optimization-master-plan.md`: holistic all-device plan (sequencing,
  interdependency map, data-driven decision framework, DFS non-DFS decision, SSID decision).
- **Created** `docs/network/phase1-voice-qos-design.md`: pfSense HFSC + UniFi WMM/switch QoS design.
- **Created** `reports/2026-06-18-voice-quality-diagnostic.md`: per-phone RF findings + fixes.
- **Updated** the voice-quality report (`reports/2026-06-18-voice-quality-diagnostic.md`, not a `2026-06-16`
  file) with the Lauren->AP 103 note.
- **No live network changes applied** (Cascades rule: explicit per-change go). UniFi port bounces were
  temporary (restored). DFS/WAN tests were read-only/bounded.
## Credentials & Secrets
- No new credentials. Used existing: `infrastructure/uos-server-network-api-rw` (controller),
`clients/cascades-tucson/unifi-ap-ssh` (AP SSH for DFS sweep), `clients/cascades-tucson/pfsense-firewall`,
`clients/cascades-tucson/wifi-voice-ppsk` (key `V0!c38863171`).

## Infrastructure & Servers

- VOICE VLAN 30 `10.0.30.0/24`: 8 AudioCodes `.224-.231`, 22 Poly `.202-.223`, desktop `.201`.
- WAN1 fiber igc0 (522 Mbps up measured; RRD peaks 680 Mbps down / 98 Mbps up). WAN2 coax igc3 (72.211.21.217,
  upload unmeasured). pfSense `192.168.0.1` Plus 25.07, no existing shaper.
- UniFi UOS `172.16.3.29:11443` site `va6iba3v`. USW-16-PoE mac `d8:b3:70:21:94:5f` dev_id `685f39078e65331c46ef7e90`.
- SSIDs: CSCNet (427 clients, PPSK, 2g+5g), CSC ENT (131 clients, legacy, 2g+5g), Guest (13 clients, 2g+5g+6g).
- DFS: 53 APs on DFS, 0 genuine radar over ~21-23h.

## Pending / Incomplete Tasks

- **RE-CLONE claudetools** (coord message 2026-06-19): old clone incompatible after history rewrite. Steps below.
- **Verify** this session's Cascades docs (master plan, QoS design, voice-quality + diagnostic reports, voice
  inventory, logging plan) survived the rewrite into the new repo; if missing, recover from this .old working tree.
- **Recover this session log** into the new clone (it's uncommitted here).
- WAN2 (coax) upload number: measure from a WAN2-routed host / Cox bill (sizes the failover shaper).
- 6 straggler Poly phones (10.0.20.64/65/66/67/195, 192.168.1.126): re-key to voice PPSK.
- Floors 5/6 (MemCare) RF + phones: deferred.
- Execute the optimization plan (start Phase 2b 2.4 Low->Medium with baseline capture), pending Howard's go.
- Hand Vertical the phone-side config list (band 5 GHz lock, DSCP-on, k/v roaming, U-APSD, firmware).

## Re-clone steps (Windows / C:\claudetools)

```
# from C:\
mv claudetools claudetools.old # or rename in Explorer
git clone https://git.azcomputerguru.com/azcomputerguru/claudetools.git claudetools
cp claudetools.old/.claude/identity.json claudetools/.claude/
cd claudetools && git submodule update --init --recursive
# then: diff clients/cascades-tucson against claudetools.old; cp any missing files (esp. this session's docs)
# recover this session log from claudetools.old/clients/cascades-tucson/session-logs/2026-06/
# verify, then delete claudetools.old
```

See RECLONE.md in the new repo. Pre-split backup bundle: Jupiter share Backups/Gitea-Storage.
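
The "verify" step before deleting `claudetools.old` can be scripted as an md5 comparison between the two clones. This is a minimal sketch: `md5_of` and `compare_trees` are names invented here, and both root paths in the usage note are assumptions that depend on where the Cascades project ends up after the submodule split.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 of a file, read in 1 MiB chunks so large files stay cheap."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(old_root: Path, new_root: Path, rel_paths) -> dict:
    """Classify each relative path as identical / differs / missing-in-*."""
    report = {}
    for rel in rel_paths:
        old_file, new_file = old_root / rel, new_root / rel
        if not old_file.is_file():
            report[rel] = "missing-in-old"
        elif not new_file.is_file():
            report[rel] = "missing-in-new"
        elif md5_of(old_file) == md5_of(new_file):
            report[rel] = "identical"
        else:
            report[rel] = "differs"
    return report


# The three docs this session created, relative to the Cascades project root.
SESSION_DOCS = [
    "docs/network/network-optimization-master-plan.md",
    "docs/network/phase1-voice-qos-design.md",
    "reports/2026-06-18-voice-quality-diagnostic.md",
]
```

A call such as `compare_trees(Path("C:/claudetools.old/clients/cascades-tucson"), Path("C:/claudetools/clients/cascades-tucson"), SESSION_DOCS)` would then list which files still need copying. Only delete the old clone once every entry reads `identical`.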

## Reference Information

- Last pushed commit (old history): `2b2d094` (2026-06-18 19:16). Restructure force-push: ~2026-06-19 02:41 UTC.
- Master plan: `clients/cascades-tucson/docs/network/network-optimization-master-plan.md`
- QoS design: `clients/cascades-tucson/docs/network/phase1-voice-qos-design.md`
- Voice-quality diagnostic: `clients/cascades-tucson/reports/2026-06-18-voice-quality-diagnostic.md`
- Existing RF audit + 2.4 runbook: `reports/2026-06-16-unifi-full-audit.md`, `reports/2026-06-16-2.4ghz-remediation-runbook.md`