diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-voice-vlan-migration-logging-plan.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-voice-vlan-migration-logging-plan.md
new file mode 100644
index 00000000..bb19cfaf
--- /dev/null
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-18-howard-voice-vlan-migration-logging-plan.md
@@ -0,0 +1,139 @@
# Cascades: Voice VLAN 30 live migration (all Poly + desktop) + network-logging plan

## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech

## Session Summary

Continuation of the 2026-06-17 VOICE VLAN 30 build (see `2026-06-17-howard-voice-vlan30-build.md`).
This session executed the live device migration onto the isolated VLAN 30 and produced a spec for
network observability. Work spanned the 06-17 -> 06-18 date boundary.

First, the Vertical-Remote management desktop was moved. Howard set the native VLAN on USW-16-PoE
port 16 to VOICE, but the desktop kept its old `192.168.2.180` lease. Diagnosis (pfSense + UniFi
controller) found nothing misconfigured. Re-VLANing a wired port does not bounce the NIC link, so
Windows held on to its old lease. Its unicast renewal to the old DHCP server was then (correctly)
blocked by the VOICE isolation rules. A UniFi client block/unblock is a MAC filter, not a link
bounce, so it had no effect. The fix was to bounce port 16 via the controller API. This took two
PUTs to `rest/device`: one setting port 16 to `forward:disabled` in `port_overrides`, then one
restoring it. Both PUTs preserved the entries for ports 1-8. The desktop then re-DHCP'd to
`10.0.30.201`.

Second, Howard re-keyed the 22 Poly WiFi phones to the voice PPSK over roughly 2 hours. As each phone
joined, the controller's `/stat/sta` was polled to map the new 10.0.30.x lease to the phone's
location/owner. Reconnecting to WiFi with the new PPSK triggers a fresh DHCP exchange on its own, so
the Poly phones needed no bounce. The first phone (Lauren Hasselman, Accounting Director) was
validated end-to-end with a dial tone and an outbound call to a cell phone.
All 22 Poly phones plus the desktop (23 devices) ended up on VOICE.
Each pulled a clean lease and is isolated from PHI/LAN/VLAN20/mgmt. A living inventory doc was
created (`docs/network/voice-phone-inventory.md`), and the wiki Voice-VLAN entry was flipped from
PLANNED to IN PROGRESS.

Third, Howard raised the need for network logging. The goals are to track devices that drop or get
kicked and to root-cause the ongoing Cascades network issues. Investigation found that the UniFi
controller had retained ZERO client events/alarms for the Cascades site over the past 7 days. pfSense
logs only to small local circular buffers. Drop/kick history is therefore effectively not being
kept. A "plan only" spec was written (`docs/network/network-logging-plan.md`). It recommends the
Synology cascadesDS (DSM Log Center as syslog server) as the on-site collector. CS-SERVER was ruled
out because it is the fragile EOL DC. The spec lists pfSense and UniFi/AP syslog as sources, plus a
client snapshotter polling every 1-2 min to fill the controller's history gap.

Finally, a sync hit a rebase conflict. Controller-query scratch files written to the repo CWD
(`.sta.json` etc.) had been swept into a commit by `git add -A`. A stray `curl.exe` still held a
lock on one of them. The process was killed, `.sta.json` untracked, and the temp patterns
gitignored. The branch then pushed clean.

## Key Decisions

- **Desktop cutover via port bounce, not NIC change.** Confirmed the desktop uses DHCP. The fix for a
  stuck lease after a re-VLAN is a link bounce (port disable/enable, or a PoE power-cycle for
  PoE-powered devices). A NIC reconfig or a UniFi client block/unblock does not fix it.
- **Read drop/kick state from the UniFi controller, not pfSense SSH.** pfSense sshd began
  rate-limiting after many rapid SSH calls. The controller API (`/stat/sta`) stayed reliable and
  also gives AP/location hints.
- **Track phones by MAC + location, not IP** (leases are dynamic; a phone may renew to a different
  10.0.30.x address).
- **Network-logging collector = Synology Log Center, NOT CS-SERVER.** CS-SERVER is the fragile
  EOL, degraded-RAID single DC, so adding syslog ingestion to it is unacceptable. pfSense/UniFi are
  sources, not retention/search stores. Synology keeps logs on-site and off the DC.
- **Plan only for logging** (per Howard): spec written, build scheduled later.

## Problems Encountered

- **Desktop stuck on 192.168.2.180 after port moved to VLAN 30:** stale DHCP lease, with the renewal
  blocked by VOICE isolation. Resolved by bouncing port 16 via the controller API. The desktop then
  re-DHCP'd to 10.0.30.201.
- **UniFi controller PUT returned HTTP 403:** UniFi OS requires a CSRF token on writes. Resolved by
  reading `x-updated-csrf-token` from the login response headers and sending it as `X-CSRF-Token`.
- **pfSense SSH began failing (exit 255) while ping still succeeded:** sshd was rate-limiting after
  many rapid `pfsense-ssh.sh` calls. Switched to the UniFi controller API for subsequent reads.
- **Git-Bash `/tmp` path mismatch:** msys `curl -o /tmp/x.json` wrote to the MSYS-mapped `/tmp`.
  Native Windows Python resolves `/tmp` to a different location, so it could not read the file
  (FileNotFoundError). Switched to CWD-relative scratch files.
- **Scratch files committed + rebase blocked:** the CWD-relative `.sta.json` got swept into a commit
  by sync's `git add -A`. A stray `curl.exe` (PID 25252) still held a lock on the file, which blocked
  the rebase. Fixed by killing the process and running `git rm --cached .sta.json`. Then gitignored
  `.sta.json`/`.dev.json`/`.q*`/etc., committed, and pushed. (Lesson: write API scratch files
  OUTSIDE the repo or use the ignored `.tmp-` prefix.)
- **Earlier errorlog rebase conflict** (concurrent GURU-5070 entry): resolved by keeping both entries.

## Configuration Changes

- **Created** `clients/cascades-tucson/docs/network/voice-phone-inventory.md`: living inventory of
  the 23 devices on VOICE (desktop + 22 Poly) with MAC/IP/location.
- **Created** `clients/cascades-tucson/docs/network/network-logging-plan.md`: observability spec
  (build later).
- **Updated** `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`: added the CRITICAL
  bounce-to-re-DHCP step. Also fixed stale NIC-change/OpenVPN/reservation references.
- **Updated** `wiki/clients/cascades-tucson.md`: Voice VLAN PLANNED -> IN PROGRESS, changed in two
  places on the page.
- **Updated** `.claude/memory/MEMORY.md` and created `project_cascades_isolated_vlan_pattern.md`
  (prior session, synced).
- **Updated** `.gitignore`: ignore controller-query scratch patterns; `git rm --cached .sta.json`.
- **pfSense (prior session, this thread):** VOICE rule protocol changed from TCP to Any via the PHP
  config API.
- **UniFi:** bounced USW-16-PoE port 16 (disable/restore) via the controller API. This was
  temporary; the port was restored to its exact original settings (native VOICE,
  `forward:customize`).

## Credentials & Secrets

- **VOICE PPSK key** `V0!c38863171`: vault `clients/cascades-tucson/wifi-voice-ppsk.sops.yaml`
  (created prior session, pushed). Entered on all 22 Poly phones this session.
- UniFi controller RW: vault `infrastructure/uos-server-network-api-rw` (used for reads and the port
  bounce). pfSense admin: vault `clients/cascades-tucson/pfsense-firewall`. Synology: vault
  `clients/cascades-tucson/synology-cascadesds`.

## Infrastructure & Servers

- **VOICE VLAN 30:** `10.0.30.0/24`, gw `10.0.30.1` (pfSense `igc1.30`/opt241), DHCP `.100-.250`,
  DNS `8.8.8.8` and `1.1.1.1`. Isolation = Guest-clone: any-protocol `quick` block rules to
  `192.168.0.0/22`, `10.0.0.0/8`, and `172.16.0.0/12`, then pass any.
- **UniFi VOICE network** id `6a32e0194e709ad31ad161e6` (VLAN Only). USW-16-PoE mac
  `d8:b3:70:21:94:5f`, device_id `685f39078e65331c46ef7e90`. UOS `172.16.3.29:11443`, site
  `va6iba3v`.
- **Synology cascadesDS** `192.168.0.120` (DSM up on :5001): proposed logging collector.
- **Jupiter** `172.16.3.20` (Unraid/Docker, hosts UniFi VM): fallback collector host.
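
The VOICE isolation rules listed above can be sanity-checked offline. Below is a minimal Python sketch. It assumes first-match (`quick`) evaluation of exactly the three block rules plus the final pass-any. It ignores any automatic rules pfSense adds on its own. `BLOCK_RULES` and `voice_verdict` are hypothetical names, not existing tooling.

```python
import ipaddress

# Destinations blocked for traffic sourced from VOICE (10.0.30.0/24), taken
# from the isolation rules above. All three are block rules, so their order
# does not change the verdict.
BLOCK_RULES = [
    ipaddress.ip_network("192.168.0.0/22"),  # 192.168.0.0-192.168.3.255 (Synology, old desktop lease)
    ipaddress.ip_network("10.0.0.0/8"),      # includes VOICE's own 10.0.30.0/24
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0-172.31.255.255 (UOS, Jupiter)
]


def voice_verdict(destination: str) -> str:
    """Return "block" or "pass" for a VOICE client sending to `destination`.

    Models first-match ("quick") evaluation: the first block rule containing
    the destination wins, and anything unmatched falls through to pass-any.
    Assumption: pfSense's automatic/built-in rules are not modeled.
    """
    addr = ipaddress.ip_address(destination)
    for net in BLOCK_RULES:
        if addr in net:
            return "block"
    return "pass"


if __name__ == "__main__":
    for dst in ("192.168.0.120", "172.16.3.29", "10.0.30.1", "8.8.8.8", "1.1.1.1"):
        print(f"{dst:>15} -> {voice_verdict(dst)}")
```

In this model, the Synology (`192.168.0.120`) and the UOS controller (`172.16.3.29`) are blocked, and the public resolvers pass. `10.0.30.1` also evaluates to block because it sits inside `10.0.0.0/8`. That is consistent with VOICE handing out public DNS servers rather than the gateway.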

## Commands & Outputs

- Controller client poll (mapping phones): `POST /api/auth/login` ->
  `GET /proxy/network/api/s/va6iba3v/stat/sta`, filtered on `network==VOICE or vlan==30`.
- Port bounce: `PUT /proxy/network/api/s/va6iba3v/rest/device/` with `X-CSRF-Token` and body
  `{"port_overrides":[... port16 forward:disabled ...]}`, then a second PUT restoring the original
  overrides.
- Drop/kick history check: `POST .../stat/event` with body `{"within":168,"_limit":5000}`
  (168 h = 7 days) and `.../stat/alarm` both returned **0** records for Cascades -> the controller
  is not retaining client history.
- Result: 23 devices on VOICE (desktop at `10.0.30.201`; the 22 Poly at `.202`-`.223`).

## Pending / Incomplete Tasks

- **8 wired AudioCodes** (USW-16-PoE ports 1-8): flip each port to VOICE and **PoE power-cycle** it
  so the phone re-DHCPs. Not yet done.
- **Christine's last name** (room 515, `10.0.30.220`, mac `48:25:67:64:95:6b`): flagged VERIFY in
  the inventory (Howard unsure; "~Nyuda").
- **Network logging build:** execute `network-logging-plan.md`. Step 1: confirm the Synology
  model/RAM/DSM version to choose between Log Center only and Container Manager running
  Graylog/Loki.
- Confirm all phones register to the cloud PBX (assumed; dial tone and an outbound call proven on
  one phone only). Add the Part A 5b pinhole only if a phone fails to register.

## Reference Information

- Runbook: `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`
- Inventory: `clients/cascades-tucson/docs/network/voice-phone-inventory.md`
- Logging plan: `clients/cascades-tucson/docs/network/network-logging-plan.md`
- Prior log: `clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-voice-vlan30-build.md`
- Memory: `.claude/memory/project_cascades_isolated_vlan_pattern.md`
- Vault PPSK: `clients/cascades-tucson/wifi-voice-ppsk.sops.yaml`
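
Two pieces of this session reduce to the same comparison: the controller client poll under Commands & Outputs, and the 1-2 min client snapshotter proposed in the logging plan. Both filter `/stat/sta` down to VOICE clients and compare polls by MAC rather than IP. Below is a minimal Python sketch of that comparison. It assumes each client record is a dict with `mac`, `ip`, `network`, and `vlan` keys. Those field names come from the manual filter and should be verified against a live response. Login, the CSRF header, and the HTTP fetch are left out. `voice_clients` and `diff_snapshots` are hypothetical helper names.

```python
from typing import Any, Dict, Iterable, List, Tuple

# One client record from the controller's /stat/sta "data" list (assumed shape).
Client = Dict[str, Any]


def voice_clients(sta_data: Iterable[Client]) -> Dict[str, Client]:
    """Keep only VOICE clients, keyed by lowercase MAC (IPs are dynamic)."""
    snapshot: Dict[str, Client] = {}
    for client in sta_data:
        # Same test as the manual poll: network name VOICE or VLAN 30.
        if client.get("network") == "VOICE" or client.get("vlan") == 30:
            snapshot[str(client["mac"]).lower()] = client
    return snapshot


def diff_snapshots(
    prev: Dict[str, Client], curr: Dict[str, Client]
) -> Tuple[List[str], List[str], List[Tuple[str, Any, Any]]]:
    """Compare two polls: (joined MACs, dropped MACs, [(mac, old_ip, new_ip)])."""
    joined = sorted(curr.keys() - prev.keys())
    dropped = sorted(prev.keys() - curr.keys())
    readdressed = [
        (mac, prev[mac].get("ip"), curr[mac].get("ip"))
        for mac in sorted(curr.keys() & prev.keys())
        if prev[mac].get("ip") != curr[mac].get("ip")
    ]
    return joined, dropped, readdressed


if __name__ == "__main__":
    # Two hypothetical polls taken a couple of minutes apart.
    before = voice_clients([
        {"mac": "AA:BB:CC:00:00:01", "ip": "10.0.30.202", "network": "VOICE"},
        {"mac": "aa:bb:cc:00:00:02", "ip": "10.0.30.203", "vlan": 30},
        {"mac": "aa:bb:cc:00:00:99", "ip": "192.168.2.50", "network": "LAN"},
    ])
    after = voice_clients([
        {"mac": "aa:bb:cc:00:00:02", "ip": "10.0.30.230", "vlan": 30},
        {"mac": "aa:bb:cc:00:00:03", "ip": "10.0.30.224", "network": "VOICE"},
    ])
    joined, dropped, readdressed = diff_snapshots(before, after)
    for mac in joined:
        print(f"JOIN {mac} {after[mac].get('ip')}")
    for mac in dropped:
        print(f"DROP {mac} last_ip={before[mac].get('ip')}")
    for mac, old_ip, new_ip in readdressed:
        print(f"READDRESS {mac} {old_ip} -> {new_ip}")
```

A MAC in `dropped` left VOICE between two polls (dropped, kicked, or powered off), which is the signal the controller is not currently retaining. A MAC in `readdressed` renewed to a different 10.0.30.x address, which is why the inventory keys on MAC.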