sync: auto-sync from HOWARD-HOME at 2026-06-18 12:21:23
Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-06-18 12:21:23
clients/cascades-tucson/docs/network/network-logging-plan.md
@@ -0,0 +1,76 @@
# Cascades — Network Logging / Observability Plan (SPEC — build later)

- **Created:** 2026-06-17 (Howard-Home / claude-main)
- **Status:** PLAN ONLY — no infra changes made. For a scheduled build.
- **Goal:** Capture + retain a searchable record of **device drops / kicks / disconnects** and the
  telemetry to **root-cause the ongoing Cascades network issues** (2.4 GHz congestion, sticky
  clients, roaming/min-RSSI deauths — see `reports/2026-06-16-unifi-full-audit.md`).

## The problem we found (2026-06-17)
- **The UniFi controller is NOT retaining client history.** A 7-day pull of the Cascades site's
  `stat/event` AND `stat/alarm` returned **zero** records (auth/site fine — client/device queries
  return data). So when a phone/device drops or is kicked, **nothing is recorded** -> the network
  is a black box after the fact.
- **pfSense logs locally, but only in small rotating log files** (plain text on pfSense 25.07, not
  the older clog circular buffers) that roll over in hours — no useful history, no search.
- => We must **capture events at the source and ship them to a store with retention + search**.
  pfSense and UniFi are log *sources*; neither is a retention/search platform on its own.

## Where the collector lives — decision
| Candidate | Verdict |
|---|---|
| **CS-SERVER** | **NO.** Fragile EOL DC (Dell R610, ~16 yr, **degraded OS RAID-1**, single-DC data-loss risk, I/O-bound). Adding syslog ingestion load is unacceptable. |
| **pfSense / UniFi alone** | Sources only. pfSense local retention ~hours; UniFi retains ~0 client events. Live view yes, forensics no. |
| **Synology cascadesDS (`192.168.0.120`)** | **PREFERRED on-site collector.** DSM up on :5001 (vault `clients/cascades-tucson/synology-cascadesds`). Built-in **Log Center** = a syslog server (retention + search + notifications), no Docker needed. Becoming backup-only anyway -> light syslog duty fits, keeps logs local + off CS-SERVER. |
| Jupiter (`172.16.3.20`, ACG office Docker) | Fallback if a richer stack (Graylog/Loki) is wanted; cross-site (Cascades -> office). Use only if on-site Synology is ruled out. |

**Recommendation:** Synology **Log Center** as the on-site syslog collector. If cascadesDS turns
out to be a Plus/x86 model with spare RAM, **Container Manager** can later add **Graylog** or
**Grafana Loki** for richer search/dashboards/alerting — but Log Center alone meets the core ask.

## Sources to configure (ship syslog -> Synology Log Center, UDP/TCP 514)
1. **pfSense** (`192.168.0.1`): Status -> System Logs -> Settings -> **Remote Logging**: server =
   Synology IP:514; select **System, Firewall, DHCP, Gateway**. (DHCP lease grant/expire/decline =
   device-drop + IP-churn signal; firewall = blocked traffic.)
2. **UniFi controller + Cascades APs** (UOS `172.16.3.29`, site `va6iba3v`): Settings -> System ->
   enable **Remote Logging / syslog** to the Synology, include **client events / debug** so the
   **APs emit assoc/DEAUTH-with-reason-code + RSSI-at-disconnect + roam** events — the gold data for
   "who got kicked and why." Confirm AP syslog is forwarded (not just controller app log).
3. **(Optional) switches** — port up/down/flap events (the ~25 underspeed ports + 3 offline
   switches in the audit are suspects).

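For the "verify receipt" checks, a known test line sent from any LAN host gives Log Center something unambiguous to search for. The following is a minimal sketch, not part of the existing tooling: the function names, the `logtest` host label, and the choice of facility `local0` / severity `info` are all illustrative assumptions, and it uses plain UDP with the classic BSD syslog (RFC 3164) layout.

```python
import socket
from datetime import datetime

# ASSUMPTION: collector address taken from the plan (cascadesDS, port 514).
COLLECTOR = ("192.168.0.120", 514)


def format_rfc3164(facility: int, severity: int, host: str, tag: str,
                   msg: str, now: datetime) -> str:
    """Build a BSD-syslog line: <PRI>TIMESTAMP HOST TAG: MSG."""
    pri = facility * 8 + severity  # PRI encodes facility and severity together
    # RFC 3164 pads single-digit days with a space, e.g. "Jun  8 12:00:00".
    stamp = f"{now.strftime('%b')} {now.day:2d} {now.strftime('%H:%M:%S')}"
    return f"<{pri}>{stamp} {host} {tag}: {msg}"


def send_test_message(collector=COLLECTOR) -> str:
    """Send one local0.info test line over UDP and return what was sent."""
    line = format_rfc3164(16, 6, "logtest", "receipt-check",
                          "syslog receipt test", datetime.now())
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), collector)
    return line
```

Calling `send_test_message()` and then searching Log Center for `receipt-check` confirms the path end to end. UDP gives no delivery acknowledgement, so a missing entry may mean either a blocked port or a disabled listener.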
## Client time-series snapshotter (fills the controller's history gap)
Because the controller isn't keeping client history, add a small **poller** (every 1-2 min) that
hits the controller API `/stat/sta` for the Cascades site and appends per-client rows:
`ts, mac, hostname, ap, band, channel, rssi, tx_retry%, satisfaction, is_wired`.
- **Where to run:** Synology Task Scheduler + a script, or a small container; or cross-site on
  GuruRMM (`172.16.3.30`) via cron; or a coord-scheduled job. Store as SQLite/CSV (or into the
  collector if Graylog/Loki is chosen).
- **Why:** lets us answer "did device X drop because RSSI cratered / it stuck to a far AP / 2.4 GHz
  airtime saturated" — correlating drops with the documented RF problems. Pairs with the existing
  `unifi-wifi` skill collectors (`watch-ap.sh`, `radio-usage.sh`, `neighbor-collect.sh`).

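The storage half of the snapshotter can be sketched as below, assuming the client list has already been fetched from `/stat/sta` and decoded from JSON. The record keys used here (`ap_mac`, `radio`, `tx_packets`, `tx_retries`, and so on) and the radio-code mapping are assumptions about the controller's response and should be checked against a real pull; the table and function names are hypothetical. The retry percentage is derived from whatever counters the record carries, so it is only as meaningful as those counters.

```python
import sqlite3
import time

# ASSUMPTION: radio codes as they commonly appear in controller output.
BAND_BY_RADIO = {"ng": "2.4", "na": "5", "6e": "6"}

SCHEMA = """CREATE TABLE IF NOT EXISTS client_snapshot (
    ts INTEGER, mac TEXT, hostname TEXT, ap TEXT, band TEXT, channel INTEGER,
    rssi INTEGER, tx_retry_pct REAL, satisfaction INTEGER, is_wired INTEGER)"""


def client_row(ts: int, rec: dict) -> tuple:
    """Flatten one client record into the column order listed in the plan."""
    tx_packets = rec.get("tx_packets") or 0
    tx_retries = rec.get("tx_retries") or 0
    # Leave the percentage empty when there is no traffic to divide by.
    retry_pct = round(100.0 * tx_retries / tx_packets, 1) if tx_packets else None
    return (
        ts,
        rec.get("mac"),
        rec.get("hostname"),
        rec.get("ap_mac"),
        BAND_BY_RADIO.get(rec.get("radio")),
        rec.get("channel"),
        rec.get("rssi"),
        retry_pct,
        rec.get("satisfaction"),
        int(bool(rec.get("is_wired"))),
    )


def append_snapshot(conn: sqlite3.Connection, clients: list, ts=None) -> int:
    """Append one row per client for this poll; returns the number of rows written."""
    ts = int(time.time()) if ts is None else ts
    conn.execute(SCHEMA)
    rows = [client_row(ts, c) for c in clients]
    conn.executemany("INSERT INTO client_snapshot VALUES (?,?,?,?,?,?,?,?,?,?)", rows)
    conn.commit()
    return len(rows)
```

Append-only rows keyed by `ts` and `mac` keep the poller stateless, so it can run from Task Scheduler or cron without tracking what it wrote last time.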
## Alerting (phase 2)
From Log Center (or Graylog/Loki): notify (Discord via `post-bot-alert.sh` / `discord-dm`) on AP
reboot, switch-port flap, repeated deauths for a tracked device, or DHCP pool pressure.

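The "repeated deauths for a tracked device" condition is a sliding-window count. A minimal sketch follows, assuming deauth events have already been parsed into `(unix_ts, mac)` pairs; the function name, the threshold of 3, and the 10-minute window are placeholders, since alert thresholds are still an open item.

```python
from collections import defaultdict, deque


def repeated_deauths(events, tracked, threshold=3, window_s=600):
    """Return tracked MACs that had >= threshold deauths within any window_s span.

    events: iterable of (unix_ts, mac) pairs, in any order.
    tracked: set of lowercase MAC strings to watch.
    """
    recent = defaultdict(deque)  # per-MAC timestamps still inside the window
    flagged = set()
    for ts, mac in sorted(events):
        mac = mac.lower()
        if mac not in tracked:
            continue
        window = recent[mac]
        window.append(ts)
        # Drop timestamps that have aged out of the window.
        while ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold:
            flagged.add(mac)
    return flagged
```

The returned set would then be handed to whichever notifier is chosen; the detection logic does not depend on it.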
## Retention
Target 30-90 days searchable (HIPAA-adjacent network metadata; syslog should carry no PHI, still
to be confirmed under Open items). Size the Synology Log Center archive / volume accordingly;
rotate/compress older.

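Sizing the archive is simple arithmetic once an ingest rate is measured. The sketch below is only a back-of-envelope helper: the 10 messages/second and 200 bytes/message figures are placeholders rather than measurements from this site, and the result is uncompressed size before any Log Center overhead.

```python
SECONDS_PER_DAY = 86_400


def archive_bytes(msgs_per_sec: float, avg_msg_bytes: int, days: int) -> int:
    """Uncompressed bytes needed to keep `days` of syslog at a steady rate."""
    return int(msgs_per_sec * avg_msg_bytes * SECONDS_PER_DAY * days)


def archive_gb(msgs_per_sec: float, avg_msg_bytes: int, days: int) -> float:
    """Same estimate in decimal gigabytes, rounded for planning purposes."""
    return round(archive_bytes(msgs_per_sec, avg_msg_bytes, days) / 1e9, 2)
```

With the placeholder inputs, `archive_gb(10, 200, 30)` gives 5.18 and `archive_gb(10, 200, 90)` gives 15.55. Enabling AP debug or client events is likely to raise the real rate, so it is worth measuring for a day before fixing the window.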
## Build steps (when scheduled)
1. Confirm cascadesDS **model + RAM + DSM version** (determines Log Center-only vs Container Manager
   for Graylog/Loki). Cred: vault `clients/cascades-tucson/synology-cascadesds`.
2. Install/enable **Log Center** (Package Center) -> enable **syslog server** (514), set retention.
3. Point **pfSense** remote syslog at it (sources above) — verify receipt.
4. Enable **UniFi controller + AP** remote syslog (with client/deauth events) — verify AP deauth
   events arrive with reason + RSSI.
5. Deploy the **client snapshotter** (cron/Task Scheduler) — verify rows accumulating.
6. (Optional) Container Manager -> Graylog/Loki+Grafana for dashboards; wire alerting.
7. Validate: force a test device off WiFi -> confirm a searchable deauth event with reason + RSSI.

## Open items
- Confirm cascadesDS model/RAM/Docker capability (step 1).
- Confirm no PHI traverses syslog (network metadata only) for the HIPAA file.
- Decide retention window + alert thresholds.
- If on-site is rejected -> fall back to Jupiter (Graylog/Loki) cross-site.
@@ -0,0 +1,42 @@
# Cascades — Voice (VLAN 30) Device Inventory

Living tracker of devices migrated onto the isolated **VOICE VLAN 30** (`10.0.30.0/24`).
Built/cutover started 2026-06-17. Runbook: `voice-vlan-cutover.md`. PPSK key (Poly WiFi):
vault `clients/cascades-tucson/wifi-voice-ppsk`.

- DHCP pool `10.0.30.100-.250` (dynamic; no reservations). Gateway `10.0.30.1`, DNS `8.8.8.8/1.1.1.1`.
- Isolation: internet/cloud-PBX only; blocked from PHI/LAN/VLAN20/mgmt (verified `pfctl -sr`).

## On VOICE so far

| Lease IP | MAC | Type | Location / Owner | Status |
|---|---|---|---|---|
| 10.0.30.201 | e4:e7:49:52:3a:06 | Vertical-Remote desktop (wired, USW-16-PoE p16) | Vertical mgmt jump box (LogMeIn/RDP) | On VOICE; re-leased after power-on |
| 10.0.30.202 | 48:25:67:64:8a:88 | Poly (WiFi, CSCNet voice PPSK) | **Accounting Director office — Lauren Hasselman** | On VOICE; **dial tone + outbound call to cell verified** |
| 10.0.30.203 | 48:25:67:d0:b8:ac | Poly (WiFi, CSCNet voice PPSK) | **Life Enrichment office 132** | On VOICE |
| 10.0.30.204 | 48:25:67:a3:f8:3b | Poly (WiFi, CSCNet voice PPSK) | **Front desk phone** | On VOICE |
| 10.0.30.205 | 48:25:67:64:93:34 | Poly (WiFi, CSCNet voice PPSK) | **Front desk courtesy phone** | On VOICE |
| 10.0.30.206 | 48:25:67:64:91:ea | Poly (WiFi, CSCNet voice PPSK) | **Kitchen Director — Alyssa** (via Dining Room AP) | On VOICE |
| 10.0.30.207 | 48:25:67:64:8f:0b | Poly (WiFi, CSCNet voice PPSK) | **Kitchen phone** (via Kitchen AP) | On VOICE |
| 10.0.30.208 | 48:25:67:64:94:ba | Poly (WiFi, CSCNet voice PPSK) | **Chef phone** (via Kitchen AP) | On VOICE |
| 10.0.30.209 | 48:25:67:64:89:6e | Poly (WiFi, CSCNet voice PPSK) | **Tamra** (via AP 217) | On VOICE |
| 10.0.30.210 | 48:25:67:d0:b1:83 | Poly (WiFi, CSCNet voice PPSK) | **Crystal** (via AP 217) | On VOICE |
| 10.0.30.211 | 48:25:67:64:8e:ae | Poly (WiFi, CSCNet voice PPSK) | **Megan** (via AP 217) | On VOICE |
| 10.0.30.212 | 48:25:67:64:93:25 | Poly (WiFi, CSCNet voice PPSK) | **Lois Lane** — room 206 | On VOICE |
| 10.0.30.213 | 48:25:67:64:93:4f | Poly (WiFi, CSCNet voice PPSK) | **Medtech phone** — room 206 | On VOICE |
| 10.0.30.214 | 48:25:67:64:8f:1d | Poly (WiFi, CSCNet voice PPSK) | **Christina Durpas** — room 206 | On VOICE |
| 10.0.30.215 | 48:25:67:64:86:aa | Poly (WiFi, CSCNet voice PPSK) | **Veronica Feller** — room 206 | On VOICE |
| 10.0.30.216 | 48:25:67:64:92:6b | Poly (WiFi, CSCNet voice PPSK) | **Lupe Sanchez** (via AP 324) | On VOICE |
| 10.0.30.217 | 48:25:67:d0:ae:3e | Poly (WiFi, CSCNet voice PPSK) | **Memory Care reception** (via Memcare Nurse Station AP) | On VOICE |
| 10.0.30.218 | 48:25:67:d0:af:10 | Poly (WiFi, CSCNet voice PPSK) | **Shelby Trozzi** — MemCare Director | On VOICE |
| 10.0.30.219 | 48:25:67:d0:b4:26 | Poly (WiFi, CSCNet voice PPSK) | **Karen Rossini** — room 515 | On VOICE |
| 10.0.30.220 | 48:25:67:64:95:6b | Poly (WiFi, CSCNet voice PPSK) | **Christine** (last name ~Nyuda — VERIFY) — room 515 | On VOICE |
| 10.0.30.221 | 48:25:67:64:91:cf | Poly (WiFi, CSCNet voice PPSK) | **Salon phone** (via salon AP) | On VOICE |
| 10.0.30.222 | 48:25:67:64:81:8e | Poly (WiFi, CSCNet voice PPSK) | **Meredith** — room 140 | On VOICE |
| 10.0.30.223 | 48:25:67:64:8f:14 | Poly (WiFi, CSCNet voice PPSK) | **Ashley** — room 103 | On VOICE |

## Still to migrate
- **Poly (WiFi): 22 of 22 DONE** ✓ — all wireless phones migrated to VOICE.
- **AudioCodes (8, wired USW-16-PoE ports 1-8): 0 of 8 done** — flip port -> VOICE **+ PoE Power-Cycle** each to re-DHCP. MACs (OUI `00:90:8f`): see runbook appendix.

> Note: lease IPs are dynamic — a phone may pull a different `.1xx`/`.2xx` on renewal. Track by **MAC + location**, not IP.
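Because leases move, lookups are safer keyed on the MAC. The sketch below illustrates that idea using only values stated in this inventory (the two phone OUIs and the `.100-.250` pool); the function names and the `(ip, mac)` lease-pair input are hypothetical, standing in for whatever lease export is actually available.

```python
import ipaddress

# OUIs as listed in this inventory.
OUI_FLEET = {"48:25:67": "Poly", "00:90:8f": "AudioCodes"}

# VOICE DHCP pool bounds as listed in this inventory.
POOL_START = ipaddress.ip_address("10.0.30.100")
POOL_END = ipaddress.ip_address("10.0.30.250")


def fleet_for(mac: str) -> str:
    """Classify a device by the first three octets (OUI) of its MAC."""
    normalized = mac.lower().replace("-", ":")
    return OUI_FLEET.get(normalized[:8], "other")


def in_voice_pool(ip: str) -> bool:
    """True if the address falls inside the VOICE dynamic pool."""
    return POOL_START <= ipaddress.ip_address(ip) <= POOL_END


def current_ips(leases, mac: str) -> list:
    """Find the address(es) a MAC currently holds in a list of (ip, mac) pairs."""
    target = mac.lower()
    return [ip for ip, lease_mac in leases if lease_mac.lower() == target]
```

For example, `fleet_for("48:25:67:64:8a:88")` returns `"Poly"` regardless of which pool address that phone holds after a renewal.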
@@ -17,6 +17,8 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
<!-- Append entries below this line -->

2026-06-18 | Howard-Home | rmm | [friction] agent returns exit -1 'Failed to execute command' on a ~7KB multi-line powershell body sent as one command; split into <2KB section scripts and each ran fine [ctx: host=DESKTOP-TRCIEJA agent=0.6.66]

2026-06-18 | Howard-Home | pfsense-ssh/logs | [friction] used clog on pfSense 25.07 logs (now plain-text ASCII) -> empty output -> wrongly concluded DHCP log was empty / dhcpd not serving; cost a hypothesis. Read pfSense 25.07 logs with tail/grep/cat directly, NOT clog [ctx: ref=reference_pfsense_25_07_ops client=cascades-tucson]

2026-06-17 | GURU-5070 | mailbox/365-mail | [correction] claimed in a prior session that /mailbox skill + memories were repointed off the deleted fabb3421 to the 365-mail suite, but mailbox.md still hardwired fabb3421 (token 401 AADSTS700016). Correct app is the dedicated ComputerGuru Mailbox app 1873b1b0 via get-token.sh 'mailbox' tier (cert auth); repointed mailbox.md + feedback_365_remediation_tool.md 2026-06-17. Lesson: verify the edit actually landed before reporting it done.
@@ -203,7 +203,7 @@ Because per-user **Intune** never provisioned tenant-wide (`INTUNE_A = PendingIn
- **Known hardware:** AP 108 (Floor 1) offline pending a new cable run (expected). Stale duplicate controller object ("108" vs "108U7 Pro") to clean up separately.
- **Creds (vault refs only):** `infrastructure/uos-server-ssh-key` (SSH/Mongo), `infrastructure/uos-server-network-api-rw` (RW controller admin), `clients/cascades-tucson/unifi-ap-ssh` (per-AP device auth via site VPN), `clients/cascades-tucson/pfsense-firewall` (pfSense admin for pfsense-ssh.sh).
- **VoIP (vendor: Vertical -- Richard Turner <RTurner@vertical.com>):** Two phone fleets -- **8 AudioCodes** (OUI `00:90:8f`, WIRED on USW-16-PoE ports 1-8, Default/main LAN) and **22 Poly** (OUI `48:25:67`, WiFi via CSCNet PPSK -> VLAN 20 Internal). The **Vertical-Remote management desktop** (`192.168.2.180`, MAC `e4:e7:49:52:3a:06`, WIRED USW-16-PoE port 16, Default LAN, **static IP, no ACG login**) is RDP-only (recon 2026-06-16 -- not a PBX). No on-prem SIP PBX found -> phones appear to register to a **cloud/hosted PBX** (Vertical). Infra must stay static.
- **[PLANNED] Voice VLAN (VLAN 30) consolidation for the phones:** Segmentation left voice gear split (Poly on VLAN 20; AudioCodes + Vertical desktop on the main LAN), and main-LAN -> VLAN 20 is blocked at pfSense -- so the desktop can't reach the wireless phones and phone IPs drift. Fix: a dedicated isolated **VLAN 30 VOICE (`10.0.30.0/24`, gw `10.0.30.1`, pfSense igc1.30)** holding ALL phones + the Vertical desktop; internet egress allowed, firewalled off VLAN 20 / main LAN / PHI (HIPAA); Vertical's pfSense OpenVPN scoped to `10.0.30.0/24` via a Client-Specific-Override. Desktop is static + no ACG login -> Vertical sets it to DHCP (or grants temp access) at cutover; reserve `10.0.30.10`. Status: PLANNED -- vendor email sent 2026-06-16, awaiting Richard's confirm (cloud-PBX, desktop static, VPN cert CN) + a window. **Full runbook + recon: `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`.**
- **[IN PROGRESS 2026-06-17] Voice VLAN (VLAN 30) consolidation for the phones:** dedicated isolated **VLAN 30 VOICE (`10.0.30.0/24`, gw `10.0.30.1`, pfSense igc1.30, DHCP `.100-.250`, DNS `8.8.8.8/1.1.1.1`)** holding ALL phones + the Vertical desktop; internet/cloud-PBX egress only, firewalled off VLAN 20 / main LAN / PHI / mgmt (HIPAA). **BUILT + VERIFIED:** isolation rules are a clone of the GUEST VLAN (the only actually-isolated net -- all Protocol=Any quick: block 192.168.0.0/22 + 10.0.0.0/8 + 172.16.0.0/12, then pass any; confirmed via `pfctl -sr`). UniFi VOICE network + CSCNet voice PPSK created (key vaulted `clients/cascades-tucson/wifi-voice-ppsk`). **Richard's confirmations (2026-06-17) simplified it:** desktop is **DHCP** (not static -> zero-touch, no reservation), and Vertical uses **LogMeIn not the pfSense OpenVPN** (so no OpenVPN CSO/cert -- desktop just needs internet egress). **Migration underway** (track in `docs/network/voice-phone-inventory.md`): Vertical desktop (10.0.30.201) + 2 Poly live -- Accounting Director / Lauren Hasselman (`48:25:67:64:8a:88`) **dial-tone + outbound call to cell VERIFIED**, and Life Enrichment office 132 (`48:25:67:d0:b8:ac`). **Remaining:** 8 AudioCodes (wired ports 1-8, flip + PoE power-cycle to re-DHCP) + ~20 Poly (re-key to voice PPSK), being moved by Howard. **KEY GOTCHA:** re-VLANing a wired port does NOT move the IP -- the device holds its old lease until the link is bounced (PoE power-cycle / disable-enable); a UniFi client block/unblock won't do it. **Full runbook: `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`; live inventory: `docs/network/voice-phone-inventory.md`.**
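The isolation ruleset described above (block three private ranges, then pass any) is a first-match list, which can be modelled in a few lines to sanity-check which destinations a VOICE device should reach. This is only an illustrative model of the rule order as written here: the function name is hypothetical, and the authoritative answer remains the `pfctl -sr` output on the firewall.

```python
import ipaddress

# Block list as described for the VOICE / GUEST isolation rules.
BLOCKED_NETS = [
    ipaddress.ip_network("192.168.0.0/22"),
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def voice_egress_verdict(dst: str) -> str:
    """First match wins: any blocked private range, otherwise the final pass-any."""
    addr = ipaddress.ip_address(dst)
    for net in BLOCKED_NETS:
        if addr in net:
            return "block"
    return "pass"
```

Under this model the old desktop address `192.168.2.180` and the office controller `172.16.3.29` both come back as `block`, while public DNS such as `8.8.8.8` comes back as `pass`.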
### External Vendors & Mail Senders
@@ -413,7 +413,7 @@ Primary active project as of 2026-05-24: dept-by-dept domain migration (Syncro #
- ALIS app session timeout: lower from 20 to 15 min (Howard, ALIS admin) -- PENDING
- **[CRITICAL] CS-SERVER degraded RAID-1 (2026-06-15):** OS mirror (C:) running on a single 320 GB laptop spindle, no redundancy. Plan SSD rebuild-then-swap (image C: first, AFTER backup verifies). DC migration is the real fix. Cloud backup installed/started 2026-06-15 -- **verify first full completes + confirm image-based + set retention before any drive work.**
- **[CLEANUP] CS-SERVER agent sprawl:** remove the previous MSP's leftover Datto RMM (CentraStage) + Datto EDR (Infocyte) stack (thrashing the degraded disk).
- **[PLANNED] Voice VLAN (VLAN 30) for Vertical phones + remote desktop:** vendor email sent 2026-06-16, awaiting Richard Turner's confirm (cloud-PBX confirmed via recon, desktop static, VPN cert CN) + maintenance window, then execute. Runbook: `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`.
- **[IN PROGRESS 2026-06-17] Voice VLAN (VLAN 30) for Vertical phones + remote desktop:** Richard confirmed; VLAN built + verified (pfSense + UniFi), desktop is DHCP (not static), access is LogMeIn (not OpenVPN). Migration underway -- desktop + 2 Poly live (dial-tone verified), AudioCodes + remaining Poly being moved by Howard. Runbook: `docs/network/voice-vlan-cutover.md`; live inventory: `docs/network/voice-phone-inventory.md`.
- **[IN PROGRESS] Wireless RF remediation (2.4 GHz):**
  - Phase A (power-down to Low): Floor-4 pilot APPLIED 2026-06-16 (retry 13.2->9.5%, no coverage loss). Remaining floors (1-3, 5-6 + floor-2/misc per-AP) = staged, awaiting go-ahead. Runbook: `clients/cascades-tucson/reports/2026-06-16-2.4ghz-remediation-runbook.md`.
  - Phase C (disable 9 redundant 2.4 radios): staged, awaiting Phase A validation + explicit go-ahead. APs 445/428 disables held; AP 128 disabled.