sync: auto-sync from HOWARD-HOME at 2026-06-18 12:31:06

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-18 12:31:06
2026-06-18 12:31:14 -07:00
parent 90f69715f0
commit fa297f6930

@@ -0,0 +1,139 @@
# Cascades — Voice VLAN 30 live migration (all Poly + desktop) + network-logging plan
## User
- **User:** Howard Enos (howard)
- **Machine:** HOWARD-HOME
- **Role:** tech
## Session Summary
Continuation of the 2026-06-17 VOICE VLAN 30 build (see `2026-06-17-howard-voice-vlan30-build.md`).
This session executed the live device migration onto the isolated VLAN 30 and produced a spec for
network observability. Work spanned the 06-17 -> 06-18 date boundary.
First, the Vertical-Remote management desktop was moved. Howard set USW-16-PoE port 16 native VLAN
to VOICE, but the desktop kept its old `192.168.2.180` lease. Diagnosis (pfSense + UniFi
controller) showed nothing misconfigured: re-VLANing a wired port does not bounce the NIC link, so
Windows held its old lease and its unicast renewal to the old DHCP server was (correctly) blocked by
the VOICE isolation rules. A UniFi client block/unblock is a MAC filter, not a link bounce, so it
had no effect. Fixed by bouncing port 16 via the controller API: a PUT to `rest/device` with
`port_overrides` setting port 16 to `forward:disabled`, then a second PUT restoring it, with the
overrides for ports 1-8 preserved in both requests. The desktop then re-DHCP'd to 10.0.30.201.
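
The bounce described above can be sketched as a pure payload transformation. This is a minimal sketch, not the script used in the session: the `port_idx` key, the helper name `with_port_forward`, and the sample entries are assumptions for illustration; only `port_overrides`, `forward:disabled`, and `forward:customize` come from this log.

```python
import copy


def with_port_forward(port_overrides, port_idx, forward):
    """Return a copy of port_overrides with one port's 'forward' mode changed.

    Every other entry is passed through untouched, because the PUT body
    carries the whole list and omitted ports could lose their settings.
    """
    updated = copy.deepcopy(port_overrides)
    for entry in updated:
        # 'port_idx' is an assumed key name for the port number.
        if entry.get("port_idx") == port_idx:
            entry["forward"] = forward
            return updated
    raise KeyError(f"no override entry for port {port_idx}")


# Made-up sample data standing in for the switch's current overrides.
original = [
    {"port_idx": 1, "forward": "customize"},
    {"port_idx": 16, "forward": "customize"},
]
disabled = with_port_forward(original, 16, "disabled")   # body of the first PUT
restored = with_port_forward(disabled, 16, "customize")  # body of the second PUT
```

Building both bodies from the same starting list makes it easy to confirm that the restore request matches the original state exactly before anything is sent.
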
Second, Howard re-keyed the 22 Poly WiFi phones to the voice PPSK over ~2 hours. As each phone
joined, the controller `/stat/sta` was polled to map the new 10.0.30.x lease to the phone's
location/owner. A WiFi re-auth triggers a fresh DHCP exchange on its own, so the Poly phones needed no bounce. The
first phone (Lauren Hasselman, Accounting Director) was validated end-to-end: dial tone + an
outbound call to a cell phone. All 22 Poly phones plus the desktop (23 devices) ended up on VOICE,
each pulling a clean lease and isolated from PHI/LAN/VLAN20/mgmt. A living inventory doc was created
(`docs/network/voice-phone-inventory.md`) and the wiki Voice-VLAN entry was flipped from PLANNED to IN
PROGRESS.
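
The lease-to-location mapping can be sketched as a filter plus a MAC lookup. This is an illustrative sketch under assumptions: the `mac` and `ip` keys, the helper name `voice_clients`, and all sample records are made up; only the `network==VOICE or vlan==30` filter comes from this log.

```python
def voice_clients(sta_records, inventory):
    """Return VOICE clients sorted by MAC, tagged with a known location.

    inventory maps lower-case MAC -> location/owner label.
    """
    rows = []
    for rec in sta_records:
        if rec.get("network") == "VOICE" or rec.get("vlan") == 30:
            mac = rec.get("mac", "").lower()
            rows.append({
                "mac": mac,
                "ip": rec.get("ip"),
                # Unmapped devices are flagged rather than dropped.
                "location": inventory.get(mac, "UNKNOWN"),
            })
    return sorted(rows, key=lambda r: r["mac"])


# Made-up records shaped like a client list; not real devices.
sample_sta = [
    {"mac": "AA:BB:CC:00:00:02", "ip": "10.0.30.203", "network": "VOICE"},
    {"mac": "aa:bb:cc:00:00:01", "ip": "10.0.30.202", "vlan": 30},
    {"mac": "aa:bb:cc:00:00:09", "ip": "192.168.2.50", "network": "LAN"},
]
sample_inventory = {"aa:bb:cc:00:00:01": "Room 101"}
mapped = voice_clients(sample_sta, sample_inventory)
```

Keying on MAC rather than IP keeps the mapping stable when a phone renews to a different 10.0.30.x address.
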
Third, Howard raised the need for network logging to track devices that drop/get kicked and to
root-cause the ongoing Cascades network issues. Investigation found the UniFi controller is
retaining ZERO client events/alarms for the Cascades site across a 7-day (168-hour) query window, and
pfSense logs locally in tiny circular buffers, so drop/kick history is not being captured at all. A
"plan only" spec was
written (`docs/network/network-logging-plan.md`) recommending the Synology cascadesDS (DSM Log
Center syslog server) as the on-site collector (CS-SERVER ruled out as the fragile EOL DC), with
pfSense + UniFi/AP syslog as sources and a client snapshotter polling every 1-2 minutes to fill the
controller's history gap.
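
One way the proposed snapshotter could turn periodic client lists into drop/join history is a set difference between consecutive polls. This is a hedged sketch only, since the logging build is still a plan: the `{mac: attachment}` snapshot shape, the function name `diff_snapshots`, and the sample values are all assumptions.

```python
def diff_snapshots(previous, current):
    """Compare two {mac: attachment} snapshots taken a poll interval apart.

    'attachment' is whatever identifies where the client is connected
    (for example an AP name); the shape is assumed for illustration.
    """
    prev_macs, cur_macs = set(previous), set(current)
    return {
        "dropped": sorted(prev_macs - cur_macs),
        "joined": sorted(cur_macs - prev_macs),
        "moved": sorted(
            mac for mac in prev_macs & cur_macs
            if previous[mac] != current[mac]
        ),
    }


# Made-up snapshots from two consecutive polls.
snap_a = {"aa:00:00:00:00:01": "ap-east", "aa:00:00:00:00:02": "ap-west"}
snap_b = {"aa:00:00:00:00:02": "ap-east", "aa:00:00:00:00:03": "ap-west"}
changes = diff_snapshots(snap_a, snap_b)
```

Appending each non-empty result with a timestamp to the collector would give a searchable drop/kick timeline even while the controller itself retains no events.
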
Finally, a sync hit a rebase conflict: controller-query scratch files written to the repo CWD
(`.sta.json` etc.) had been swept into a commit by `git add -A`, and a stray `curl.exe` process still
held a lock on the file. Killed the process, untracked `.sta.json`, gitignored the temp patterns, and
pushed clean.
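
To keep API scratch output out of the repo entirely, it can be written under the OS temp directory. This is a minimal sketch, assuming a helper named `write_scratch` and a subdirectory named `controller-scratch`; both names are hypothetical and not part of the existing tooling.

```python
import json
import tempfile
from pathlib import Path


def write_scratch(name, payload):
    """Write a JSON scratch file under the OS temp dir and return its path.

    tempfile.gettempdir() is resolved by the Python interpreter itself, so
    the file lands somewhere that same interpreter can read back, and it
    can never be picked up by `git add -A` in the repo.
    """
    scratch_dir = Path(tempfile.gettempdir()) / "controller-scratch"
    scratch_dir.mkdir(parents=True, exist_ok=True)
    path = scratch_dir / name
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


scratch_path = write_scratch("sta.json", {"clients": 23})
round_trip = json.loads(scratch_path.read_text(encoding="utf-8"))
```
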
## Key Decisions
- **Desktop cutover via port bounce, not NIC change.** Confirmed desktop is DHCP; the fix for a
stuck lease after re-VLAN is a link bounce (port disable/enable or PoE power-cycle), not a NIC
reconfig and not a UniFi client block/unblock.
- **Read drop/kick state from the UniFi controller, not pfSense SSH**, after pfSense sshd began
rate-limiting following many rapid SSH calls. Controller API (`/stat/sta`) was the healthy path
and also gives AP/location hints.
- **Track phones by MAC + location, not IP** (leases are dynamic; a phone may renew to a different
10.0.30.x).
- **Network-logging collector = Synology Log Center, NOT CS-SERVER.** CS-SERVER is the fragile
EOL/degraded-RAID single DC; adding syslog ingestion load to it is unacceptable. pfSense/UniFi are
sources, not retention/search stores. Synology keeps the logs on-site and off the DC.
- **Plan only for logging** (per Howard) — spec written, build scheduled later.
## Problems Encountered
- **Desktop stuck on 192.168.2.180 after port moved to VLAN 30** — stale DHCP lease; renewal
blocked by VOICE isolation. Resolved by bouncing port 16 via controller API -> re-DHCP to
10.0.30.201.
- **UniFi controller PUT returned HTTP 403** — UniFi OS requires a CSRF token on writes. Resolved by
reading `x-updated-csrf-token` from the login response headers and sending `X-CSRF-Token`.
- **pfSense SSH began failing (exit 255)** while ping still succeeded — sshd rate-limiting after many
rapid `pfsense-ssh.sh` calls. Switched to the UniFi controller API for subsequent reads.
- **Git-Bash `/tmp` path mismatch** — msys `curl -o /tmp/x.json` wrote where Windows python could not
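read (FileNotFoundError): Git Bash maps `/tmp` to a directory of its own, whereas native Windows Python
resolves `/tmp` against the root of the current drive. Switched to CWD-relative scratch files.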
- **Scratch files committed + rebase blocked** — CWD-relative `.sta.json` got swept into a commit by
sync's `git add -A`, and a stray locked `curl.exe` (PID 25252) held the file, blocking the rebase.
Killed the process, `git rm --cached .sta.json`, gitignored `.sta.json`/`.dev.json`/`.q*`/etc.,
committed, and pushed. (Lesson: write API scratch OUTSIDE the repo or use the ignored `.tmp-` prefix.)
- **Earlier errorlog rebase conflict** (concurrent GURU-5070 entry) — resolved keeping both entries.
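
The CSRF fix noted in the list above amounts to copying one header from the login response onto later writes. This is a minimal sketch, assuming a helper named `write_headers` and a plain dict of response headers; the header names `x-updated-csrf-token` and `X-CSRF-Token` are the ones recorded in this log, and the `Content-Type` header is an added assumption.

```python
def write_headers(login_response_headers):
    """Build headers for a controller write from the login response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in login_response_headers.items()}
    token = lowered.get("x-updated-csrf-token")
    if not token:
        raise ValueError("login response carried no x-updated-csrf-token header")
    return {"X-CSRF-Token": token, "Content-Type": "application/json"}


# Made-up token value for illustration.
put_headers = write_headers({"X-Updated-CSRF-Token": "example-token"})
```

Failing fast when the token is missing surfaces the problem at login time instead of as an HTTP 403 on the later PUT.
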
## Configuration Changes
- **Created** `clients/cascades-tucson/docs/network/voice-phone-inventory.md` — living inventory, 23
devices on VOICE (desktop + 22 Poly) with MAC/IP/location.
- **Created** `clients/cascades-tucson/docs/network/network-logging-plan.md` — observability spec
(build later).
- **Updated** `clients/cascades-tucson/docs/network/voice-vlan-cutover.md` — added the
bounce-to-re-DHCP CRITICAL step; fixed stale NIC-change/OpenVPN/reservation references.
- **Updated** `wiki/clients/cascades-tucson.md` — Voice VLAN PLANNED -> IN PROGRESS, two locations.
- **Updated** `.claude/memory/MEMORY.md` + created `project_cascades_isolated_vlan_pattern.md` (prior
session, synced).
- **Updated** `.gitignore` — ignore controller-query scratch patterns; `git rm --cached .sta.json`.
- **pfSense (prior session, this thread):** VOICE rule protocol TCP -> Any via PHP config API.
- **UniFi:** bounced USW-16-PoE port 16 (disable/restore) via controller API — temporary, restored
to exact original (native VOICE, forward:customize).
## Credentials & Secrets
- **VOICE PPSK key** `V0!c38863171` — vault `clients/cascades-tucson/wifi-voice-ppsk.sops.yaml`
(created prior session, pushed). Entered on all 22 Poly phones this session.
- UniFi controller RW: vault `infrastructure/uos-server-network-api-rw` (used for reads + the port
bounce). pfSense admin: vault `clients/cascades-tucson/pfsense-firewall`. Synology: vault
`clients/cascades-tucson/synology-cascadesds`.
## Infrastructure & Servers
- **VOICE VLAN 30:** `10.0.30.0/24`, gw `10.0.30.1` (pfSense `igc1.30`/opt241), DHCP `.100-.250`,
DNS `8.8.8.8` and `1.1.1.1`. Isolation = Guest-clone (any-proto quick blocks to 192.168.0.0/22 +
10.0.0.0/8 + 172.16.0.0/12, then pass any).
- **UniFi VOICE network** id `6a32e0194e709ad31ad161e6` (VLAN Only). USW-16-PoE mac
`d8:b3:70:21:94:5f`, device_id `685f39078e65331c46ef7e90`. UOS `172.16.3.29:11443`, site
`va6iba3v`.
- **Synology cascadesDS** `192.168.0.120` (DSM up on :5001) — proposed logging collector.
- **Jupiter** `172.16.3.20` (Unraid/Docker, hosts UniFi VM) — fallback collector host.
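
The addressing above can be sanity-checked offline with the standard `ipaddress` module. This is a simplified model under stated assumptions: it only tests range membership against the DHCP pool and block list given here, and it does not reproduce pfSense rule ordering or any automatic rules; the function names are hypothetical.

```python
import ipaddress

# Values copied from the VOICE VLAN 30 description in this log.
POOL_START = ipaddress.ip_address("10.0.30.100")
POOL_END = ipaddress.ip_address("10.0.30.250")
BLOCKED_NETS = [
    ipaddress.ip_network(net)
    for net in ("192.168.0.0/22", "10.0.0.0/8", "172.16.0.0/12")
]


def in_voice_pool(ip):
    """True if ip falls inside the VOICE DHCP range .100-.250."""
    return POOL_START <= ipaddress.ip_address(ip) <= POOL_END


def is_blocked_destination(ip):
    """True if ip sits inside one of the isolation block ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETS)


desktop_ok = in_voice_pool("10.0.30.201")               # new desktop lease
old_lease_ok = in_voice_pool("192.168.2.180")           # stale pre-cutover lease
old_lan_blocked = is_blocked_destination("192.168.2.180")
dns_blocked = is_blocked_destination("8.8.8.8")
```

A check like this can be run over the inventory file to flag any device whose recorded address falls outside the VOICE pool.
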
## Commands & Outputs
- Controller client poll (mapping phones): `POST /api/auth/login` ->
`GET /proxy/network/api/s/va6iba3v/stat/sta`, filter `network==VOICE or vlan==30`.
- Port bounce: `PUT /proxy/network/api/s/va6iba3v/rest/device/<id>` body
`{"port_overrides":[... port16 forward:disabled ...]}` with `X-CSRF-Token`, then restore.
- Drop/kick history check: `POST .../stat/event {"within":168,"_limit":5000}` and `.../stat/alarm`
both returned **0** records for Cascades -> controller not retaining client history.
- Result: 23 devices on VOICE (`10.0.30.201` desktop + `.202`-`.223` the 22 Poly).
## Pending / Incomplete Tasks
- **8 wired AudioCodes** (USW-16-PoE ports 1-8) — flip port -> VOICE + **PoE power-cycle** each to
re-DHCP. Not yet done.
- **Christine's last name** (room 515, `10.0.30.220`, mac `48:25:67:64:95:6b`) — flagged VERIFY in
the inventory (Howard unsure; "~Nyuda").
- **Network logging build** — execute `network-logging-plan.md` (step 1: confirm Synology
model/RAM/DSM -> Log Center-only vs Container Manager Graylog/Loki).
- Confirm phones register to cloud PBX (assumed; dial-tone proven on one) — add Part A 5b pinhole
only if a phone fails to register.
## Reference Information
- Runbook: `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`
- Inventory: `clients/cascades-tucson/docs/network/voice-phone-inventory.md`
- Logging plan: `clients/cascades-tucson/docs/network/network-logging-plan.md`
- Prior log: `clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-voice-vlan30-build.md`
- Memory: `.claude/memory/project_cascades_isolated_vlan_pattern.md`
- Vault PPSK: `clients/cascades-tucson/wifi-voice-ppsk.sops.yaml`