wiki: full recompile cascades-tucson + dataforth (RF/voice applied state; mail stack, FreePBX, shares, cert pipeline; live Syncro hours)
@@ -2,8 +2,8 @@
|
||||
type: client
|
||||
name: cascades-tucson
|
||||
display_name: Cascades of Tucson
|
||||
last_compiled: 2026-06-19
|
||||
compiled_by: HOWARD-HOME/claude-main
|
||||
last_compiled: 2026-06-20
|
||||
compiled_by: GURU-5070/claude-main
|
||||
sources:
|
||||
- session-logs/2026-03-24-session.md
|
||||
- session-logs/2026-03-31-session.md
|
||||
@@ -153,14 +153,15 @@ Because per-user **Intune** never provisioned tenant-wide (`INTUNE_A = PendingIn
|
||||
- Lupe Sanchez -- staff (DESKTOP-TRCIEJA). EOL workstation (Gateway ZX6971 AIO, i3-2120, 8 GB RAM, Win11 unsupported). **Decision 2026-06-18: replace machine** (dual-AV + EOL hardware causing slow Excel; no remediation on current box). GuruRMM agent `c9bf1a2d-bfdc-401e-9cc8-f9e90bb19587` (resolve live by hostname; UUIDs change on re-enroll).
|
||||
- **Syncro contact emails (authoritative):** ashley.jensen@, jd.martin@, crystal.rodriguez@, John.trozzi@, meredith.kuhn@, accounting@/accountingassistant@cascadestucson.com.
|
||||
- **Billing rate:** $175/hr all labor (prepaid block customer)
|
||||
- **Hours remaining:** **55.75 hrs (live Syncro pull 2026-06-18).** Most recent draws: 0.5h remote 2026-06-10 Meredith locked Word doc (ticket #32403, 56.75->56.25); 0.5h remote 2026-06-12 shared mailboxes Grievances+Surveys (ticket #32417, 56.25->55.75). No draws from 2026-06-17/18 sessions (advisory/diagnostic only). Always live-check via `GET /customers/20149445` before billing.
|
||||
- **Hours remaining:** **48.75 hrs as of 2026-06-20 (live Syncro).** Most recent draw: 7h remote+onsite 2026-06-19 voice VLAN + RF optimization (ticket #32444, 55.75->48.75). Prior: 0.5h remote 2026-06-12 shared mailboxes (ticket #32417, 56.25->55.75); 0.5h remote 2026-06-10 Meredith locked Word doc (ticket #32403, 56.75->56.25). Always live-check via `GET /customers/20149445` before billing.
|
||||
- **Syncro customer ID:** 20149445
|
||||
- **Managed devices (Syncro):** 29 (live pull 2026-06-18)
|
||||
- **Active tickets:** Syncro live pull 2026-06-18 shows **0 open tickets.** See Active Work section for open non-Syncro follow-ups. #32370 (eFax/scanner onsite) was confirmed [New]/open on 2026-06-13 -- verify/likely closed.
|
||||
- **Managed devices (Syncro):** 29 (live 2026-06-20)
|
||||
- **Active tickets:** 0 open Syncro tickets as of 2026-06-20. See Active Work for open non-ticketed projects.
|
||||
- #110680053 / #32303 -- Entra / domain migration project. Status: **Invoiced** as of 2026-06-05. Plan: `C:\Users\Howard\.claude\plans\wise-discovering-panda.md`
|
||||
- #109412123 -- Entra setup project (verify status)
|
||||
- #32403 -- Meredith locked Word doc (0.5h remote, billed 2026-06-10, Invoiced)
|
||||
- #32417 -- Shared mailboxes Grievances+Surveys (0.5h remote, billed 2026-06-12, Invoiced)
|
||||
- #32444 -- Voice VLAN 30 + RF optimization (7h: 4 onsite + 3 remote, billed 2026-06-19, Invoiced)
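The balance chain quoted in the Hours remaining entry can be re-derived as a quick sanity check before billing. This is a minimal sketch only: the draws and balances are copied from the entries above, `apply_draws` and the tuple layout are hypothetical names, and nothing here talks to Syncro (the live `GET /customers/20149445` check is still required).

```python
from decimal import Decimal

# Draws copied from the entries above: (ticket, date, hours billed).
DRAWS = [
    ("32403", "2026-06-10", Decimal("0.5")),
    ("32417", "2026-06-12", Decimal("0.5")),
    ("32444", "2026-06-19", Decimal("7")),
]


def apply_draws(opening, draws):
    """Return (ticket, date, balance) after each draw (hypothetical helper)."""
    balance = opening
    history = []
    for ticket, date, hours in draws:
        balance -= hours
        history.append((ticket, date, balance))
    return history


# 56.75 is the balance quoted before ticket #32403.
history = apply_draws(Decimal("56.75"), DRAWS)
for ticket, date, balance in history:
    print(f"#{ticket} {date}: {balance} hrs remaining")
```

`Decimal` is used instead of `float` so quarter-hour balances compare exactly against the figures recorded on this page.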
|
||||
|
||||
---
|
||||
|
||||
@@ -170,34 +171,34 @@ Because per-user **Intune** never provisioned tenant-wide (`INTUNE_A = PendingIn
|
||||
|
||||
| Host | IP | Role | OS | Notes |
|
||||
|---|---|---|---|---|
|
||||
| CS-SERVER | 192.168.2.254 | DC, DNS, DHCP (no scopes), File Server, Hyper-V host, Print Server | Windows Server 2019 Standard | Dell PowerEdge R610 (~2009 hardware, 16+ years old). **Single DC -- CRITICAL risk. No backup until 2026-06-15.** GuruRMM agent ID: `c39f1de7-d5b6-45ae-b132-e06977ab1713` (re-enrolled; always resolve the agent live by hostname, never hardcode the UUID). **OS RAID-1 mirror DEGRADED (2026-06-15) -- see hardware warning below.** |
|
||||
| CS-SERVER | 192.168.2.254 | DC, DNS, DHCP (no scopes), File Server, Hyper-V host, Print Server | Windows Server 2019 Standard | Dell PowerEdge R610 (~2009 hardware, 16+ years old). **Single DC -- CRITICAL risk.** GuruRMM agent ID: `c39f1de7-d5b6-45ae-b132-e06977ab1713` (re-enrolled; always resolve live by hostname, never hardcode the UUID). **OS RAID-1 mirror DEGRADED (2026-06-15) -- see hardware warning below.** |
|
||||
| CS-SERVER iDRAC | 192.168.2.65 | Out-of-band management | -- | Dell OOB interface |
|
||||
| CS-QB (Hyper-V VM on CS-SERVER) | 192.168.2.228 | (label "VoIP server" -- STALE) | -- | **2026-06-16 recon: SMB/445 only, no SIP response -- NOT a live SIP PBX.** Phones appear cloud-registered (Vertical). Label predates the wireless-phone transition; revisit/retire. |
|
||||
| cascadesDS (Synology NAS) | 192.168.0.120 | NAS / legacy file storage | DSM 7.2.1-69057 | Port 5000 HTTP. Workgroup name is "CASCADES" -- same as AD short name, causing Kerberos auth failures from domain-joined machines. Slated to become backup-only. **Synology Drive Server 3.5.0-26088** (active, port 6690 SSL). Current Drive sync: CS-SERVER Drive Client (v7.5.0.16085, runs as sysadmin) syncs Sync-user My Drive (`/volume1/homes/Sync/Drive/`) -> `D:\Shares\Main` (one-way download). Real shared folders (Server 1.9 G, Management 5.5 G, Public ~50 G, SalesDept ~23 G, etc.) are NOT in scope -- Team Folder migration pending. |
|
||||
| pfSense Firewall | 192.168.0.1 | Perimeter firewall, inter-VLAN routing, DHCP/DNS | pfSense Plus 25.07-RELEASE | Netgate device. cert CN=pfSense-685f277aa6886. Dual-WAN. All DHCP (CS-SERVER DHCP role has no scopes). 199 DHCP subnets (per-unit /28 VLANs, assisted-living L2 isolation). SSH shell access works (no interactive menu). Admin vault: `clients/cascades-tucson/pfsense-firewall`. OpenVPN user Howard: vault `clients/cascades-tucson/pfsense-openvpn-howard`. **Config vaulted 2026-06-17:** `clients/cascades-tucson/pfsense-config-backup-2026-06-17.sops.yaml`. pfSense is ZFS (power-loss resilient). Logs are PLAIN TEXT (not clog). |
|
||||
|
||||
**[CRITICAL] CS-SERVER hardware -- RAID degraded (2026-06-15):** Dell R610, basic SAS 6/iR controller (3 Gbps, no cache). The **OS RAID-1 mirror (Virtual Disk2 = C:, holds OS / AD / SQL / page file) is DEGRADED** -- Physical Disk 0:0:3 (320 GB WD SATA laptop drive, `WDC WD3200BEVT`) is Critical/Removed, leaving C: on a single surviving 320 GB Hitachi `HTS545032B9A300` 5400 RPM spindle with ZERO redundancy. A 1.2 TB SAS disk (1:0:4) sits "Ready" but is the wrong size/type to rebuild the 320 GB mirror, so no auto-rebuild fired. D: is a separate healthy RAID-1 (2x 1.2 TB SAS). The degraded mirror on a slow laptop spindle is the root cause of "CS-SERVER slow" reports (random-I/O bound). With the single-DC, EOL (16+ yr) posture this is a data-loss emergency -- SSD rebuild-then-swap is a valid band-aid (image C: first; enterprise SATA SSD >= 480 GB, 2.5"; no TRIM through this controller; buy 2 identical: e.g. Solidigm D3-S4520 480 GB or Samsung PM893 480 GB; SATA negotiates to 3 Gbps; no Dell certified-drive lockout) but the DC migration remains the real fix. Gating: **verify cloud backup first full + image-based + retention before any drive work.**
|
||||
**[CRITICAL] CS-SERVER hardware -- RAID degraded (2026-06-15):** Dell R610, basic SAS 6/iR controller (3 Gbps, no cache). The **OS RAID-1 mirror (Virtual Disk2 = C:, holds OS / AD / SQL / page file) is DEGRADED** -- Physical Disk 0:0:3 (320 GB WD SATA laptop drive, `WDC WD3200BEVT`) is Critical/Removed, leaving C: on a single surviving 320 GB Hitachi `HTS545032B9A300` 5400 RPM spindle with ZERO redundancy. A 1.2 TB SAS disk (1:0:4) sits "Ready" but is the wrong size/type to rebuild the 320 GB mirror. D: is a separate healthy RAID-1 (2x 1.2 TB SAS). Degraded mirror on a slow laptop spindle is root cause of "CS-SERVER slow" reports. Recommended replacement: 2x 480 GB enterprise 2.5" SATA SSD (e.g. Solidigm D3-S4520 or Samsung PM893; SATA negotiates to 3 Gbps; no Dell drive lockout). Gating: **verify cloud backup first full + image-based + retention before any drive work.** DC migration is the real fix.
|
||||
|
||||
**[INFO] Backup -- gap closed (2026-06-15):** Mike installed ACG cloud backup (MSP360/CloudBerry -> ACG-backup server) on CS-SERVER and started a backup, addressing the longstanding 45 CFR 164.308(a)(7) "no backup" HIPAA gap. (Synology Active Backup for Business remains blocked -- ext4, not Btrfs.) Verify the first full completes and set retention.
|
||||
**[INFO] Backup -- gap closed (2026-06-15):** Mike installed ACG cloud backup (MSP360/CloudBerry -> ACG-backup server) on CS-SERVER, addressing the longstanding 45 CFR 164.308(a)(7) "no backup" HIPAA gap. Verify the first full backup completes and set retention.
|
||||
|
||||
**[WARNING] CS-SERVER endpoint-agent sprawl:** CS-SERVER is NOT in the ACG Bitdefender/GravityZone tenant (Cascades company id `66b0448e1e0441d02508bad8`; 3 endpoints there, CS-SERVER absent). Defender is replaced by a Syncro-managed "Endpoint Protection Service". The previous MSP's **Datto RMM/CentraStage + Datto EDR/Infocyte** are still installed on top of Syncro + GuruRMM + ScreenConnect + KPAX -- overlapping agents thrashing the degraded spindle. Clean up the Datto stack. (Infection sweep 2026-06-15: clean.) **DESKTOP-TRCIEJA is another confirmed instance** of the leftover-Datto-stack fleet-wide problem (2026-06-18) -- see Lupe Sanchez in Profile.
|
||||
**[WARNING] CS-SERVER endpoint-agent sprawl:** CS-SERVER is NOT in the ACG Bitdefender/GravityZone tenant (Cascades company id `66b0448e1e0441d02508bad8`; 3 endpoints there, CS-SERVER absent). The previous MSP's **Datto RMM/CentraStage + Datto EDR/Infocyte** are still installed alongside Syncro + GuruRMM + ScreenConnect + KPAX -- overlapping agents thrashing the degraded spindle. Clean up the Datto stack.
|
||||
|
||||
**[WARN] Power outage (2026-06-17):** Building power outage took the entire Cascades network down (all 77 APs + 12 switches, 0 clients). Root cause chain: pfSense was plugged into the **surge-only side of the UPS** (no battery) -- it hard-powered-off uncleanly. ZFS survived (pools healthy, config.xml valid). Dirty boot caused a **duplicate dhcpd** (DISCOVER->OFFER but no REQUEST/ACK) and a **2nd-floor switch (USL24PB `Switch 2nd Floor #2`, 192.168.2.193) with one-way L2 forwarding** that blocked DHCP OFFERs from reaching floor-2 APs. Howard killed the duplicate dhcpd + clean restart remotely; Mike: re-seated pfSense onto battery outlets, restored config from on-box auto-backup (12:20 version, VLAN30 intact), reset+re-adopted Switch 2nd Floor #2 (floors 3/4 followed), rebooted Cox modem (missed post-restore step that prolonged WAN issues). Network fully restored. Post-recovery casualties: devices that booted during the DHCP-down window cached a disconnected state and did not retry (kitchen thermal printer, POS ticket printer) -- power-cycle each as reported. Incident report: `clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md`.
|
||||
**[WARN] Power outage (2026-06-17):** Building power outage took the entire Cascades network down. Root cause: pfSense was plugged into the **surge-only side of the UPS** (no battery) -- it hard-powered-off uncleanly. ZFS survived. Dirty boot caused a **duplicate dhcpd** and a **2nd-floor switch (USL24PB, 192.168.2.193) with one-way L2 forwarding** blocking DHCP OFFERs. Howard killed the duplicate dhcpd remotely; Mike re-seated pfSense onto battery outlets, restored config from on-box auto-backup (12:20 version, VLAN30 intact), reset+re-adopted Switch 2nd Floor #2. Network fully restored. Post-recovery casualties: devices that booted during DHCP-down window cached disconnected state (kitchen thermal printer fixed by power-cycle). Incident report: `clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md`.
|
||||
|
||||
### Email & Identity
|
||||
|
||||
- **M365 tenant:** cascadestucson.com | Tenant ID: `207fa277-e9d8-4eb7-ada1-1064d2221498`
|
||||
- **M365 license:** Business Premium (SPB) -- 34 seats enabled, 3 consumed, 31 free. Business Standard (O365_BUSINESS_PREMIUM) -- **SUSPENDED**, 31 users still assigned. Relicensing 31 users Business Standard -> Business Premium is pending and time-sensitive -- those users may have degraded service.
|
||||
- **M365 license:** Business Premium (SPB) -- 34 seats enabled, 3 consumed, 31 free. Business Standard (O365_BUSINESS_PREMIUM) -- **SUSPENDED**, 31 users still assigned. Relicensing 31 users Business Standard -> Business Premium is pending and time-sensitive.
|
||||
- **On-prem AD domain:** cascades.local | UPN suffix: cascadestucson.com (added 2026-04-13 for Entra Connect SSO readiness)
|
||||
- **MX / mail flow:** Exchange Online (M365). SPF: `v=spf1 a mx ip4:72.194.62.5 include:spf.protection.outlook.com include:spf-0.secureserver.net -all`. DKIM: both M365 selectors published. DMARC: `p=quarantine;pct=100` -- upgraded from p=none. Reports to `info@cascadestucson.com` (unmonitored). No third-party email gateway (EOP direct MX).
|
||||
- **MFA:** CA policy "Require MFA for all users" is enabled. Caregiver bypass in progress -- caregivers cannot satisfy MFA (no personal device), so three scoped CA policies use BLOCK instead. See Patterns section. Voice-call MFA is **disabled tenant-wide** (SMS + Authenticator are the allowed methods). Exception: security group "MFA - Voice Call Scoped (sysadmin)" (id `304f941e-3594-4705-b8e6-ee676297df11`, single member `sysadmin@`) has Voice method enabled.
|
||||
- **MFA:** CA policy "Require MFA for all users" is enabled. Caregiver bypass in progress -- caregivers cannot satisfy MFA (no personal device), so three scoped CA policies use BLOCK instead. Voice-call MFA is **disabled tenant-wide** (SMS + Authenticator are the allowed methods). Exception: security group "MFA - Voice Call Scoped (sysadmin)" (id `304f941e-3594-4705-b8e6-ee676297df11`, single member `sysadmin@`) has Voice method enabled.
|
||||
- **Entra Connect:** Installed on CS-SERVER 2026-04-25. Exited staging 2026-05-14 -- actively syncing (last sync confirmed 2026-05-27). OU=Administrative not yet in sync scope; UPN suffix updates for Administrative OU users pending before that OU can be added.
|
||||
- **Break-glass accounts:** Two planned (`breakglass1-csc@cascadestucson.com`, `breakglass2-csc@cascadestucson.com`). Confirmed not yet created as of 2026-05-27. FIDO2 YubiKeys ordered -- arrival unconfirmed.
|
||||
- **Admin accounts:**
|
||||
- `admin@cascadestucson.com` -- Mike's working admin (cloud-only, Connect-excluded by design)
|
||||
- `sysadmin@cascadestucson.com` -- Howard's working admin (cloud-only, Connect-excluded by design). Object id: `471b13dc-3cf8-416b-a132-f5f3bc8d1cc8`. Vaulted at `clients/cascades-tucson/m365-sysadmin.sops.yaml`.
|
||||
- **ALIS (clinical SaaS):** https://cascadestucson.alisonline.com -- Entra SSO live and working. Install key: `d796539d-356b-4190-9c17-35f0f1129376`. Vault: `clients/cascades-tucson/alis-sso-app-registration.sops.yaml`. ALIS application ID `d5108493-cba8-4f08-90b6-1bb0bc09eb2a`, client secret expires 2028-05-06 (rotation reminder -- expiry breaks ALIS SSO tenant-wide). Per-caregiver: ALIS staff-record Email must match Entra UPN exactly. BAA with Medtelligent not yet verified.
|
||||
- **Admin consent (2026-06-03):** Tenant-wide admin consent (`AllPrincipals` `User.Read`) granted on ALIS Entra service principal (`e1cae4ad-5beb-44ca-82d4-434c9bd835ad`). This resolved `AADSTS65001` sign-in failures. CA was NOT the cause.
|
||||
- **Admin consent (2026-06-03):** Tenant-wide admin consent (`AllPrincipals` `User.Read`) granted on ALIS Entra service principal (`e1cae4ad-5beb-44ca-82d4-434c9bd835ad`). This resolved `AADSTS65001` sign-in failures.
|
||||
- **How to enable ALIS SSO for one user:** (1) Tenant-wide admin consent already done globally. (2) In ALIS admin -> Staff -> user's record, set **Email = exact Entra UPN**. (3) User signs in via "Sign in with Microsoft." (4) Turn off ALIS-native 2FA (Entra is the second factor; native 2FA conflicts and locked out Karen Rossini).
|
||||
- **Diagnostic signature:** a user with zero ALIS-app sign-in events in Entra sign-in logs is still on the old direct-login path -- fix is the ALIS Email match, not anything in Entra.
|
||||
- **Caregiver phones:** 22 Samsung Galaxy A15s enrolled in Intune Shared Device Mode (SDM). Enrollment profile: `CSC - Android Shared Phones (Entra SDM)` (`9a0fcc6d`); 25 devices enrolled per 2026-06-03 Intune pull. Dynamic group: `Cascades - Shared Phones` (`ea96f4b7`). Android enrollment token expires 2027-05-08 -- expiry does NOT unenroll existing devices.
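The two expiry dates recorded in this section (ALIS client secret, Android enrollment token) lend themselves to a simple rotation reminder. The sketch below is an assumption-laden illustration, not an existing tool: `days_remaining`, `due_for_rotation`, and the 60-day warning window are all hypothetical, and the dates are copied from the entries above.

```python
from datetime import date

# Expiry dates copied from the entries above.
EXPIRIES = {
    "ALIS SSO client secret": date(2028, 5, 6),
    "Android enrollment token": date(2027, 5, 8),
}


def days_remaining(expiry, today):
    """Whole days from today until expiry (negative once expired)."""
    return (expiry - today).days


def due_for_rotation(expiries, today, warn_days=60):
    """Names expiring within warn_days, including anything already expired.

    warn_days=60 is an arbitrary illustrative threshold.
    """
    return sorted(
        name
        for name, expiry in expiries.items()
        if days_remaining(expiry, today) <= warn_days
    )


as_of = date(2026, 6, 20)  # last_compiled date of this page
for name, expiry in EXPIRIES.items():
    print(f"{name}: {days_remaining(expiry, as_of)} days left")
print("Due now:", due_for_rotation(EXPIRIES, as_of))
```

Already-expired items are deliberately reported as due, since an expired ALIS secret breaks SSO tenant-wide.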
|
||||
@@ -208,33 +209,29 @@ Because per-user **Intune** never provisioned tenant-wide (`INTUNE_A = PendingIn
|
||||
|
||||
### Network
|
||||
|
||||
- **ISP / WAN:** Dual-WAN Cox. WAN1 igc0 `184.191.143.62/30` (Cox Fiber, primary, gateway `184.191.143.61`) + WAN2 igc3 `72.211.21.217/27` (Cox Coax, secondary, static); `WAN_Group` gateway group; both active full-duplex, no loss events (verified 2026-06-16). Both WAN IPs added as Cascades Named Location in Entra (ID: `061c6b06-b980-40de-bff9-6a50a4071f6f`). **Measured bandwidth (2026-06-18):** WAN1 fiber **upload ~522 Mbps** (Cloudflare single-stream from pfSense); RRD 3-day peaks ~680 Mbps down / 98 Mbps up (actual usage). WAN2 coax upload **unmeasured** (remote source-route bind to `72.211.21.217` failed -- needs a WAN2-routed host or the Cox bill; assume asymmetric ~20-50 Mbps up). Implication: 30 calls ~= 3 Mbps vs ~522 Mbps fiber headroom -> **the WAN is NOT the everyday voice bottleneck** (RF is); voice QoS is insurance for WAN2 failover + rare WAN1 saturation. See voice QoS design.
|
||||
- **Firewall:** pfSense Plus **25.07-RELEASE** (Netgate) at `192.168.0.1`, cert CN=pfSense-685f277aa6886. Admin vault: `clients/cascades-tucson/pfsense-firewall`. SSH shell access works (no interactive menu). OpenVPN user Howard: vault `clients/cascades-tucson/pfsense-openvpn-howard` (split-tunnel; `route 192.168.0.0/22`; use OpenVPN GUI or OpenVPN Connect with DCO disabled for stability -- DCO/TAP instability seen 2026-06-16). pfSense-ssh.sh (unifi-wifi skill) provides scripted audit/dhcp/run access. **Logs are PLAIN TEXT on 25.07 -- read with tail/grep, NOT clog (clog returns empty).** pfSense has an **OpenVPN `--inactive` idle timeout (~300s)** configured on the server; it disconnects clients after ~5 min of no tunnel data (keepalive pings do NOT reset this counter). This is a config setting, not a fault -- raise/disable to fix the flapping (fix proposed 2026-06-18, not applied). **[OUTAGE 2026-06-17] pfSense was on UPS surge-only side -- moved to battery-backed outlets by Mike (rectified). On-box auto-backup (12:20 version) restored by Mike; config vaulted `clients/cascades-tucson/pfsense-config-backup-2026-06-17.sops.yaml`. Enable Netgate AutoConfigBackup to prevent off-box backup gap.**
|
||||
- **[INFO] pfSense health check (2026-06-16):** gateway ruled out as WiFi factor -- DHCP not exhausted (270/~507 active ~53% on the AP/WiFi pool), unbound DNS up, both WANs full-duplex/stable, firewall states 28-31k/790k, load 0.6. Minor: igc3/WAN2 Intel I225/226 2.5G counter quirk (1707 input-errors+collisions logged, full-duplex active, no loss) -- not a fault, no action needed.
|
||||
- **ISP / WAN:** Dual-WAN Cox. WAN1 igc0 `184.191.143.62/30` (Cox Fiber, primary, gateway `184.191.143.61`) + WAN2 igc3 `72.211.21.217/27` (Cox Coax, secondary, static); `WAN_Group` gateway group; both active full-duplex, no loss events (verified 2026-06-16). Both WAN IPs added as Cascades Named Location in Entra (ID: `061c6b06-b980-40de-bff9-6a50a4071f6f`). **Measured bandwidth (2026-06-18):** WAN1 fiber **upload ~522 Mbps**; RRD 3-day peaks ~680 Mbps down / 98 Mbps up (actual usage). WAN2 coax upload **unmeasured** (remote source-route test failed -- needs a WAN2-routed host or the Cox bill). 30 calls ~= 3 Mbps vs ~522 Mbps fiber headroom -> **the WAN is NOT the everyday voice bottleneck** (RF is); voice QoS is insurance for WAN2 failover + rare WAN1 saturation.
|
||||
- **Firewall:** pfSense Plus **25.07-RELEASE** (Netgate) at `192.168.0.1`, cert CN=pfSense-685f277aa6886. Admin vault: `clients/cascades-tucson/pfsense-firewall`. SSH shell access works (no interactive menu). OpenVPN user Howard: vault `clients/cascades-tucson/pfsense-openvpn-howard` (split-tunnel; `route 192.168.0.0/22`; use OpenVPN GUI or OpenVPN Connect with DCO disabled for stability). pfSense-ssh.sh (unifi-wifi skill) provides scripted audit/dhcp/run access. **Logs are PLAIN TEXT on 25.07 -- read with tail/grep, NOT clog.** pfSense has an **OpenVPN `--inactive` idle timeout (~300s)** on the server; it disconnects clients after ~5 min of no tunnel data (keepalive pings do NOT reset this counter). Fix proposed 2026-06-18; not applied. **[OUTAGE 2026-06-17] pfSense was on UPS surge-only side -- moved to battery-backed outlets by Mike. On-box auto-backup restored; config vaulted. Enable Netgate AutoConfigBackup to prevent future off-box gap.**
|
||||
- **[INFO] pfSense health check (2026-06-16):** gateway ruled out as WiFi factor -- DHCP not exhausted, unbound DNS up, both WANs full-duplex/stable, firewall states 28-31k/790k, load 0.6.
|
||||
- **LAN / VLAN layout:** Primary staff/AP network `192.168.0.0/22` (pfSense .0.1, cascadesDS .0.120, UniFi APs + most WiFi clients on 192.168.2.x/3.x). DHCP pool 192.168.2.2-192.168.3.254 (~507 cap, ~270 active ~53%). Per-unit /28 VLANs: **199 DHCP subnets** total, mostly `10.x.y.0/28` per apartment (assisted-living L2 isolation) + Staff/Internal VLAN 20 (`10.0.20.0/24`, gw `10.0.20.1`) + Guest VLAN 50 (`10.0.50.0/24`, RFC1918 blocked) + **Voice VLAN 30** (`10.0.30.0/24`, gw `10.0.30.1`). DHCP backend: ISC (Kea config present, dormant). Unbound DNS.
|
||||
- **Switching:** Full UniFi. **77 U7-Pro APs** + **12 managed switches** (1st Floor USW-48 PoE core; floors 2-4 USW-Pro-24-PoE; MemCare USW-Pro-24-PoE; USW Lite 8 PoE; USW-16-PoE VoIP switch). **[WARN] ~25 switch ports linked at 100 Mbps but gig-capable** (systematic cabling/NIC issue, 1st/2nd/3rd-floor switches; investigate after WiFi Phase A). 3 offline switches: Switch 2nd Floor #2, Switch 4th Floor #2, USW Pro Max 16. PoE budgets healthy. Port p38 (1st Floor USW) 4.0% tx-drop rate. All managed on the shared UOS controller (172.16.3.29, HTTPS 11443; see [[uos-server]]); Cascades site short name `va6iba3v`, site_id `685f39068e65331c46ef6dd2`. **Mesh topology:** 2nd Floor Atrium is wireless-mesh parent for CC Bridge + salon (5 GHz backhaul ch36); 206 U7 Pro carries AP 108. Switch hardware replacement on floors 2/3/4 complete. **Note: Switch 2nd Floor #2 (USL24PB, 192.168.2.193) was reset+re-adopted after the 2026-06-17 power outage -- it had one-way L2 forwarding blocking DHCP offers.**
|
||||
- **Switching:** Full UniFi. **77 U7-Pro APs** + **12 managed switches** (1st Floor USW-48 PoE core; floors 2-4 USW-Pro-24-PoE; MemCare USW-Pro-24-PoE; USW Lite 8 PoE; USW-16-PoE VoIP switch). **[WARN] ~25 switch ports linked at 100 Mbps but gig-capable** (systematic cabling/NIC issue, 1st/2nd/3rd-floor switches; investigate after WiFi Phase A). 3 offline switches: Switch 2nd Floor #2, Switch 4th Floor #2, USW Pro Max 16. PoE budgets healthy. Port p38 (1st Floor USW) 4.0% tx-drop rate. All managed on the shared UOS controller (172.16.3.29, HTTPS 11443; see [[uos-server]]); Cascades site short name `va6iba3v`, site_id `685f39068e65331c46ef6dd2`. **Mesh topology:** 2nd Floor Atrium is wireless-mesh parent for CC Bridge + salon (5 GHz backhaul ch36); 206 U7 Pro carries AP 108. Note: Switch 2nd Floor #2 (USL24PB, 192.168.2.193) was reset+re-adopted after the 2026-06-17 power outage.
|
||||
- **WiFi SSIDs:**
|
||||
- **CSCNet -- shared PPSK SSID.** `private_preshared_keys_enabled`; ~230-242 per-key->network mappings (most keys -> per-room resident VLANs 101-631; a few -> Default; one phone key -> Internal/VLAN 20; one voice PPSK -> VOICE/VLAN 30). ~1,190 historical clients (residents' IoT/TVs, staff, phones). **Do NOT repoint the SSID to move a subset of clients** -- move at the PPSK level. wlanconf `685f39078e65331c46ef7ee5`; cred vault `clients/cascades-tucson/wifi-cscnet.sops.yaml`.
|
||||
- CSC ENT -- legacy SSID, main LAN (192.168.0.0/22), being deprecated as migration proceeds
|
||||
- Guest -- isolated, VLAN 50
|
||||
- **Wireless RF status (audit 2026-06-15/16 + changes through 2026-06-17 -- ~587 concurrent clients):**
|
||||
- **2.4 GHz is the primary pain band:** avg TX-retry ~10%, cu_total 69-94% live, catastrophic neighbor BSSID density (ch6 ~33k BSSIDs, ch1 ~19k, ch11 ~17k). 27 of the 40 worst clients on 2.4 GHz (retry 11-42%), mostly IoT/legacy. Root cause: high radio density running at excessive TX power.
|
||||
- **2.4 GHz Phase A status -- OVER-THINNED (as of 2026-06-17):** Floor-4 pilot (2026-06-16) applied 14/15 radios to 6 dBm (retry 13.2->9.5%, no coverage loss). Subsequently overnight 2026-06-17, Phase A was extended: **24 of 76 2.4 radios DISABLED + 42 set to Low (~6 dBm)**; Floors 5/6 + mesh untouched. Results DEGRADED: retry 17->23.4%, satisfaction 39->30 -- over-thinned. **Current recommendation: Low->Medium for the 42 at-Low radios.** Phase 0 (ping-check off, 3AM auto-upgrade disable, pfSense logging) + Phase 1 (combined radio_table PUT: ng power medium [42 radios], na ht 40 [76], na min_rssi -82 [69]) planned, dry-run clean, NOT applied -- pending explicit go-ahead.
|
||||
- **5 GHz:** Auto-channel reassignment applied via UniFi 2026-06-17 (Howard) -- made co-channel overlap **WORSE** (25->30 co-channel pairs from 173 strong neighbor pairs). `dfs-check.sh` 2026-06-16: **ZERO real radar events fleet-wide** (DFS empirically low-risk). Plan: **Option B = combined per-AP PUT of 40MHz + non-DFS optimized channel plan + min-RSSI -82 relax.** Width + channel are coupled (width alone fixed only 7/25 pairs; non-DFS needs 40MHz). Dry-run clean; NOT applied. NOTE: an earlier mid-session claim (2026-06-15 audit) that "DFS was the #1 problem" was an artifact of tooling bugs and was withdrawn -- do not repeat it.
|
||||
- **6 GHz:** active on 75 radios; only ~1 client. **Root cause (found 2026-06-18): CSCNet is not broadcasting 6 GHz** (`wlan_bands=[2g,5g]`) -- the band is dark at the SSID level, so nothing can join it. Largest untapped, clean, non-DFS capacity. Opening 6 GHz on CSCNet (+ BSS-transition `bsstm`) is the **relief valve** that must come BEFORE narrowing 5 GHz to 40 MHz (else 5 GHz congestion just relocates). The Poly phones are 5 GHz (not 6E), so 6 GHz helps voice *indirectly* by pulling resident devices off 5 GHz.
|
||||
- **AP 103 saturated (5 GHz):** ch149, ~75% airtime, ~25,900 retries, 12 clients. Lauren's voice phone (`.202`) was locked here 2026-06-18 (off the CC Bridge mesh AP) -- so AP 103 MUST be relieved (off ch149 / 80->40 MHz / load-balance) or she trades a mesh problem for a congestion one.
|
||||
- **AP-level satisfaction 95-100 fleet-wide.** Pain is in the client tail.
|
||||
- **Client distribution by SSID (2026-06-18):** CSCNet 427 + CSC ENT 131 (legacy, not yet retireable) + Guest 13.
|
||||
- **Config flags:** 6 APs with 2.4 min-RSSI OFF (615, 608, 505, 517, 622, salon); 4 APs off the 1/6/11 plan (128 disabled, 108 offline, 108U7 Pro auto, salon auto).
|
||||
- **Known hardware:** AP 108 (Floor 1) offline pending a new cable run (expected). Stale duplicate controller object ("108" vs "108U7 Pro") to clean up separately.
|
||||
- **Creds (vault refs only):** `infrastructure/uos-server-ssh-key` (SSH/Mongo), `infrastructure/uos-server-network-api-rw` (RW controller admin), `clients/cascades-tucson/unifi-ap-ssh` (per-AP device auth via site VPN), `clients/cascades-tucson/pfsense-firewall` (pfSense admin for pfsense-ssh.sh).
|
||||
- **Wireless RF status (applied 2026-06-19 -- ~587 concurrent clients):**
|
||||
- **2.4 GHz is the primary pain band:** avg TX-retry ~10%, cu_total 69-94% live, catastrophic external neighbor BSSID density (ch6 ~33k BSSIDs, ch1 ~19k, ch11 ~17k). 27 of the 40 worst clients on 2.4 GHz (retry 11-42%), mostly IoT/legacy. Root cause: extreme radio density; external saturation limits benefit of any 1/6/11 channel re-plan.
|
||||
- **2.4 GHz power -> MEDIUM (APPLIED 2026-06-19, validated, kept):** 42 over-thinned-`low` radios floors 1-4 + 5 MemCare floors-5/6 `auto`/full radios (505/517/608/615/622) -> Medium. 24 thinned-disabled radios stayed disabled; 5 mesh-auto APs untouched. Non-regressive. Corrected the 2026-06-17 over-thinning regression (retry had risen to 23.4% at Low). **Remaining 2.4 levers (deferred):** min-RSSI on 6 APs currently OFF (615, 608, 505, 517, 622, salon); 1/6/11 channel re-plan (marginal against external density).
|
||||
- **5 GHz: clean DFS 40 MHz plan APPLIED (2026-06-19) -- retry HALVED.** 72 non-mesh APs on channels {52,60,100,108,116,124,132,140}, 0 co-channel pairs. **Result: retry 8.7->3.8 avg (median 8.2->2.1, ~half).** Validated: all 72 APs holding DFS, 0 radar vacates, satisfaction median 99. Mesh APs (2nd Floor Atrium, CC Bridge, salon, 108) left on auto. **DFS channels at this site are 4-5x cleaner than non-DFS** (2-3% busy vs ch149=12%, ch157=28%, ch44=22%); the earlier "non-DFS only" recommendation was reversed by measured survey data (74/74 APs). `dfs-check.sh` 2026-06-16 + nightly: **ZERO real radar events fleet-wide.** Safety net: UniFi auto-vacates one AP on a radar hit; follow-up = recurring `dfs-check.sh` monitor.
|
||||
- **6 GHz BLOCKED on CSCNet:** 75 radios active but only ~1 client. Root cause: CSCNet `wlan_bands=[2g,5g]` (6g is not broadcast); enabling 6 GHz on a WPA2/PPSK SSID requires a WPA3+PMF conversion (`Wpa3MandatoryFor6GHzBand`), which touches all 427 clients. **Largest untapped clean capacity.** Deferred to Howard's supervised decision on the SSID security conversion.
|
||||
- **CSCNet BSS-transition (802.11v): ON** (applied 2026-06-19).
|
||||
- **3 AM AP auto-upgrade: OFF** (left off after overnight run; re-enable when ready).
|
||||
- **AP 103 saturated (5 GHz):** ch149 at 75% airtime, ~25,900 retries, 12 clients -- was Lauren's locked AP. The DFS 40 MHz plan applied 2026-06-19 uses only channels 52-140, so AP 103 should no longer be on ch149 if it is among the 72 re-planned non-mesh APs. Verify its channel and load post-settle.
|
||||
- **AP-level satisfaction 95-100 fleet-wide.** Pain is in the client tail (IoT stuck on 2.4).
|
||||
- **AP 108 (Floor 1) offline** pending a new cable run. Stale duplicate controller object ("108" vs "108U7 Pro") to clean up separately.
|
||||
- **VoIP (vendor: Vertical -- Richard Turner <RTurner@vertical.com>):** Two phone fleets -- **8 AudioCodes** (OUI `00:90:8f`, WIRED on USW-16-PoE ports 1-8, externally powered / PoE OFF) and **Poly** (OUI `48:25:67`, WiFi via CSCNet PPSK) -- **28 active** (29 re-keyed 2026-06-19, 1 removed bad). **All on VOICE VLAN 30: 28 Poly + 8 AudioCodes (`.224-.231`) + Vertical desktop (`.201`) = 37 devices.** Phones mark **DSCP EF (46)**. **[2026-06-19 hardware change] John (Trozzi) reported the Kitchen server phone (`48:25:67:64:95:7a`) BAD and pulled it; the Bistro phone (`.236`, `48:25:67:64:94:84`) was relocated to the Kitchen to cover it -- so the BISTRO now has NO phone (replacement pending, set up + re-key when it arrives).** (Verify VLAN via the client `vlan` field, NOT the cached display IP.) The **Vertical-Remote management desktop** (`10.0.30.201`, MAC `e4:e7:49:52:3a:06`, WIRED USW-16-PoE port 16, VOICE VLAN 30, **DHCP** -- confirmed not static, LogMeIn remote access, no pfSense OpenVPN) is live on VLAN 30. No on-prem SIP PBX found -> phones appear to register to a **cloud/hosted PBX** (Vertical).
|
||||
- **[2026-06-18 CUTOVER COMPLETE] Voice VLAN (VLAN 30) consolidation:** dedicated isolated **VLAN 30 VOICE (`10.0.30.0/24`, gw `10.0.30.1`, pfSense igc1.30, DHCP `.100-.250`, DNS `8.8.8.8/1.1.1.1`)** holding ALL phones + the Vertical desktop; internet/cloud-PBX egress only, firewalled off VLAN 20 / main LAN / PHI / mgmt (HIPAA). Isolation rules verified via `pfctl -sr` (clone of GUEST VLAN -- the only actually-isolated net). Voice PPSK key on CSCNet -> VOICE: vaulted `clients/cascades-tucson/wifi-voice-ppsk`. **Migration COMPLETE 2026-06-19: 37 devices on VOICE (28 Poly + 8 AudioCodes + Vertical desktop; 1 Poly removed bad 2026-06-19 -- Bistro phone relocated to Kitchen, Bistro replacement pending). Live inventory: `docs/network/voice-phone-inventory.md`:**
|
||||
- Vertical-Remote desktop (port 16): DONE -- `10.0.30.201`. Re-VLANing a wired port requires bouncing the link (port disable/enable via controller API using CSRF token); a UniFi client block/unblock is MAC-filter only, not a link bounce.
|
||||
- **ALL 29 Poly WiFi phones: DONE (2026-06-19)** -- on `10.0.30.202-.223` + `.232-.237`. The 6 stragglers found 2026-06-18 (on VLAN 20 / the .1 net) were identified onsite by Howard + re-keyed to the voice PPSK, plus 2 phones added during the walk. Named per-phone roster in `docs/network/voice-phone-inventory.md` (Zachary Nelson .232, Recreation room .233, Movie Theater .234, Library .235, Bistro .236, John Trozzi rm422 .237, Kitchen server). A phone landing back on the .1 net = it got the regular CSCNet key, not the voice PPSK.
|
||||
- **8 AudioCodes (wired, USW-16-PoE ports 1-8): ALL DONE** -- on `10.0.30.224-.231`. **Gotcha: AudioCodes are externally powered (PoE OFF on those ports), so a UniFi PoE power-cycle AND a controller port disable/enable are both no-ops -- they held their old main-LAN DHCP leases. Required a full physical power-off/on** before they re-DHCP'd onto VOICE.
|
||||
- **Quality caveat + the actual fix (2026-06-19):** the VLAN move does NOT by itself fix call quality. Per-phone re-look found the residual dropped-calls are a **band-selection problem, not RF/coverage** -- several Poly handsets sit on the saturated 2.4 GHz despite EXCELLENT 5 GHz-capable signal (-50 to -60 dBm, 36-96% retry), and controller band-steering (`no2ghz_oui`, already ON) is NOT holding the Poly OUI on 5 GHz. **No controller channel/power/min-rate tuning fixes which band a phone picks.** The fix is phone-side: **set the Poly handsets to 5 GHz-only via Vertical** -- request sent to Richard Turner 2026-06-19 (`docs/network/2026-06-19-vertical-5ghz-lock-request.md`), **awaiting Vertical**. Once pushed: clean voice VLAN + clean 5 GHz band = calls closed out.
|
||||
- **Full runbook:** `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`. Live inventory: `docs/network/voice-phone-inventory.md`. Voice-quality diagnostic: `reports/2026-06-18-voice-quality-diagnostic.md`. Holistic optimization plan: `docs/network/network-optimization-master-plan.md`; voice QoS design: `docs/network/phase1-voice-qos-design.md`.
|
||||
- **[2026-06-19 COMPLETE] Voice VLAN (VLAN 30) consolidation:** dedicated isolated **VLAN 30 VOICE (`10.0.30.0/24`, gw `10.0.30.1`, pfSense igc1.30, DHCP `.100-.250`, DNS `8.8.8.8/1.1.1.1`)** holding ALL phones + the Vertical desktop; internet/cloud-PBX egress only, firewalled off VLAN 20 / main LAN / PHI / mgmt (HIPAA). Voice PPSK key on CSCNet -> VOICE: vaulted `clients/cascades-tucson/wifi-voice-ppsk`. **Migration COMPLETE 2026-06-19: 37 devices on VOICE.** Live inventory: `docs/network/voice-phone-inventory.md`.
|
||||
- **Quality caveat + the actual fix (2026-06-19):** the VLAN move does NOT by itself fix call quality. A per-phone re-check found the residual dropped calls are a **band-selection problem, not RF/coverage** -- several Poly handsets sit on saturated 2.4 GHz despite EXCELLENT 5 GHz-capable signal (signal -50 to -60 dBm, yet 36-96% retry), and controller band-steering (`no2ghz_oui`, already ON) is NOT holding the Poly OUI on 5 GHz. **The fix is phone-side: set the Poly handsets to 5 GHz-only via Vertical** -- request sent to Richard Turner 2026-06-19 (`docs/network/2026-06-19-vertical-5ghz-lock-request.md`), **awaiting Vertical**. Once pushed: clean voice VLAN + clean 5 GHz band, and the dropped-call issue can be closed out.
|
||||
- **Full runbook:** `clients/cascades-tucson/docs/network/voice-vlan-cutover.md`. Voice-quality diagnostic: `reports/2026-06-18-voice-quality-diagnostic.md`. Holistic optimization plan: `docs/network/network-optimization-master-plan.md`; voice QoS design: `docs/network/phase1-voice-qos-design.md`.
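
A minimal sketch of the "verify by the client `vlan` field" check from the voice notes above, assuming client records arrive as dicts. The `mac` key and the sample straggler MAC are hypothetical; the OUIs, VLAN 30, and the two real MACs come from the notes. It flags any phone-OUI client whose `vlan` field is not 30 and ignores the cached display IP entirely.

```python
VOICE_VLAN = 30
# OUIs as recorded in the voice notes.
PHONE_OUIS = {"00:90:8f": "AudioCodes", "48:25:67": "Poly"}


def phone_vendor(mac):
    """Return the phone vendor for a MAC, or None if it is not a phone OUI."""
    return PHONE_OUIS.get(mac.lower()[:8])


def phones_off_voice(clients):
    """List (mac, vendor, vlan) for phone-OUI clients not on the voice VLAN."""
    stragglers = []
    for client in clients:
        vendor = phone_vendor(client["mac"])
        if vendor and client.get("vlan") != VOICE_VLAN:
            stragglers.append((client["mac"], vendor, client.get("vlan")))
    return stragglers


sample = [
    {"mac": "48:25:67:64:94:84", "vlan": 30},  # relocated Bistro phone
    {"mac": "48:25:67:00:00:01", "vlan": 20},  # hypothetical straggler
    {"mac": "e4:e7:49:52:3a:06", "vlan": 30},  # Vertical desktop, not a phone OUI
]
print(phones_off_voice(sample))

# Device count from the notes: 28 Poly + 8 AudioCodes + 1 desktop.
assert 28 + 8 + 1 == 37
```

A Poly that shows up in this list most likely got the regular CSCNet key instead of the voice PPSK.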
|
||||
|
||||
### External Vendors & Mail Senders
|
||||
|
||||
@@ -257,7 +254,7 @@ Cascades' line-of-business / reporting SaaS (the systems they pull data OUT of,
|
||||
| **Helpany** (app.safe-living.com) | Caregiver app | Niche -- likely export-only |
|
||||
| **POS** | Point of sale | Product TBD |
|
||||
|
||||
- **[PROPOSED] Unified KPI dashboard (Ashley Jensen request, 2026-06-17):** single dashboard pulling KPIs across the systems above. **Power BI on-prem Gateway is the WRONG frame** (it only bridges Power BI to on-prem sources, never cloud SaaS). Recommended path leans on their existing M365 Business Premium: **Phase 1** scheduled CSV/Excel exports -> SharePoint -> Power BI Pro dashboard on 3-5 KPIs (census/financials); **Phase 2** automate the API-capable systems (Bill.com, QuickBooks Online) via Power Automate. Niche senior-living apps stay on the export method (no ready connectors). Internal scoping: `clients/cascades-tucson/docs/proposals/kpi-dashboard.md`; client one-pager: `.../kpi-dashboard-onepager.md`. Status: parked, awaiting Ashley's day-one KPIs + freshness need + POS/Focus-HR specifics. Check whether ALIS offers a built-in analytics/data feed (could replace plumbing for their top source).
|
||||
- **[PROPOSED] Unified KPI dashboard (Ashley Jensen request, 2026-06-17):** single dashboard pulling KPIs across the systems above. Recommended path: **Phase 1** scheduled CSV/Excel exports -> SharePoint -> Power BI Pro dashboard; **Phase 2** automate the API-capable systems (Bill.com, QuickBooks Online) via Power Automate. Power BI on-prem Gateway is the WRONG frame (bridges only on-prem DBs, not cloud SaaS). Internal scoping: `clients/cascades-tucson/docs/proposals/kpi-dashboard.md`; client one-pager: `.../kpi-dashboard-onepager.md`. Status: parked, awaiting Ashley's day-one KPIs + freshness need + POS/Focus-HR specifics + ALIS analytics availability.
|
||||
|
||||
---
|
||||
|
||||
@@ -267,7 +264,7 @@ Cascades' line-of-business / reporting SaaS (the systems they pull data OUT of,
|
||||
- **CS-SERVER iDRAC:** 192.168.2.65
|
||||
- **pfSense admin (HTTPS):** https://192.168.0.1 -- vault: `clients/cascades-tucson/pfsense-firewall.sops.yaml`
|
||||
- **pfSense SSH:** `ssh admin@192.168.0.1` (system OpenSSH; drops to shell directly, no interactive menu) -- vault admin cred: `clients/cascades-tucson/pfsense-firewall.sops.yaml`; pfsense-ssh.sh (unifi-wifi skill) for scripted access.
|
||||
- **pfSense OpenVPN (Howard):** split-tunnel; vault: `clients/cascades-tucson/pfsense-openvpn-howard.sops.yaml` (user `Howard`; route 192.168.0.0/22). Use OpenVPN GUI or OpenVPN Connect with DCO disabled for stability. Note: Howard-Home is now 10.137.42.0/24 (renumbered 2026-06-16) -- Cascades 192.168.0.x now reachable over the VPN. Server has a configured `--inactive` idle timeout (~300s) that silently drops idle clients -- this is a config setting, not instability.
|
||||
- **pfSense OpenVPN (Howard):** split-tunnel; vault: `clients/cascades-tucson/pfsense-openvpn-howard.sops.yaml` (user `Howard`; route 192.168.0.0/22). Use OpenVPN GUI or OpenVPN Connect with DCO disabled for stability. Howard-Home is 10.137.42.0/24 (renumbered 2026-06-16). Server has a configured `--inactive` idle timeout (~300s) that silently drops idle clients.
|
||||
- **pfSense config backup (2026-06-17):** `clients/cascades-tucson/pfsense-config-backup-2026-06-17.sops.yaml`
|
||||
- **Synology DSM:** http://192.168.0.120:5000 -- vault: `clients/cascades-tucson/synology-cascadesds.sops.yaml` (admin). Drive Server port 6690 (SSL). **[SECURITY] Synology Cloud Signin Portal credential (`clients/cascades-tucson/synology-signin-portal.sops.yaml`) was committed plaintext at vault commit 1fbc0e1 -- exposed in git history; encrypted go-forward but credential should be rotated.**
|
||||
- **M365 admin:** admin@cascadestucson.com -- vault: `clients/cascades-tucson/m365-admin.sops.yaml`
|
||||
@@ -300,28 +297,28 @@ Cascades' line-of-business / reporting SaaS (the systems they pull data OUT of,
|
||||
|
||||
### Exchange Online / Message Tracing
|
||||
|
||||
- **Get-MessageTrace is hard-deprecated (Sept 2025).** Use `Get-MessageTraceV2` instead. Key parameter change: use `ResultSize` (not `PageSize`). The deprecation error may be silently swallowed by downstream jq filters -- if a trace returns unexpectedly empty, check the raw response for a deprecation error string before assuming no mail. Source: 2026-06-04 Chris Knight investigation.
|
||||
- **Sender-side suppression (SendGrid ESP):** If a user never receives mail from a specific sender despite a healthy mailbox, and message trace shows zero records (not even bounces), consider a SendGrid suppression list. Fix requires contacting the sender's support to clear the suppression -- there is no M365 action that can resolve this. Confirmed with bill.com / inform.bill.com.
|
||||
- **Get-MessageTrace is hard-deprecated (Sept 2025).** Use `Get-MessageTraceV2` instead. Key parameter change: use `ResultSize` (not `PageSize`). The deprecation error may be silently swallowed by downstream jq filters -- if a trace returns unexpectedly empty, check the raw response for a deprecation error string before assuming no mail.
|
||||
- **Sender-side suppression (SendGrid ESP):** If a user never receives mail from a specific sender despite a healthy mailbox, and message trace shows zero records (not even bounces), consider a SendGrid suppression list. Fix requires contacting the sender's support -- there is no M365 action. Confirmed with bill.com / inform.bill.com.
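
A hedged sketch of the "check the raw response before trusting an empty trace" rule above. The response shape (a JSON object with a `value` list) and the sample payloads are assumptions for illustration, not the real Exchange output; the point is only that the deprecation check runs on the raw text before any filtering.

```python
import json


def classify_trace(raw):
    """Classify a raw trace response before any downstream filtering.

    Returns "deprecation-error", "no-records", or "records".
    """
    # Look at the raw text first, so a filter cannot swallow the error.
    if "deprecat" in raw.lower():
        return "deprecation-error"
    rows = json.loads(raw).get("value", [])  # "value" key is an assumption
    return "records" if rows else "no-records"


# Hypothetical payloads, for illustration only.
print(classify_trace('{"error": "Get-MessageTrace is deprecated"}'))
print(classify_trace('{"value": []}'))
```

Only a `no-records` result supports the sender-side suppression theory; a `deprecation-error` means the trace never ran.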
|
||||
|
||||
### Active Directory / User Management
|
||||
|
||||
- **Security group assignment is always explicit.** When creating or adding any Cascades user, always ask which security group(s). OU -> group auto-mirror was explicitly declined 2026-05-14. Source: `feedback_cascades_user_security_group.md`.
|
||||
- **Security group assignment is always explicit.** When creating or adding any Cascades user, always ask which security group(s). OU -> group auto-mirror was explicitly declined 2026-05-14.
|
||||
|
||||
- **New user mandatory order (folder redirection):**
|
||||
1. Create AD user
|
||||
2. Run `New-HomeFolder -Username "<sam>"` on CS-SERVER (creates root + Desktop/Documents/Downloads/Music/Pictures with correct ACL)
|
||||
3. Add to SG-FolderRedirect
|
||||
4. THEN first domain logon
|
||||
- Skipping step 2 causes fdeploy to cache a failure silently and never retry. Source: `feedback_cascades_folder_redirect.md`.
|
||||
- Skipping step 2 causes fdeploy to cache a failure silently and never retry.
|
||||
|
||||
- **Folder redirect recovery:** If fdeploy cached a failure ("No changes detected"), run `clients/cascades-tucson/scripts/fix-shell-redirect.ps1` via GuruRMM while user is logged in. Must set both GUID-based and legacy-name registry keys. Folders must already exist on server.
|
||||
|
||||
- **fdeploy1.ini flags:** Changed from `Flags=1211` (included `Grant Exclusive Rights` bit 0x400, causing WRITE_DAC failures on new subfolders) to `Flags=187`. File at `{512B43A4-F049-4CE5-BFAC-860AD13E92BE}\User\Documents & Settings\fdeploy1.ini` on CS-SERVER.
|
||||
- **fdeploy1.ini flags:** Changed from `Flags=1211` (included `Grant Exclusive Rights` bit 0x400, causing WRITE_DAC failures on new subfolders) to `Flags=187`.
|
||||
|
||||
- **[ROOT CAUSE + FIX 2026-06-08] Native Folder Redirection was DOA on every machine -- the config file was MISNAMED.** Every Cascades machine had needed the manual `fix-shell-redirect.ps1` registry workaround because native FR never worked. Root cause: the redirect targets in GPO `CSC - Folder Redirection` (`{512B43A4-...}`) were saved in a file named **`fdeploy1.ini`**, but the Windows Folder Redirection client-side extension only ever reads **`fdeploy.ini`**. The file was hand-built by editing `fdeploy1.ini` (the wrong filename). **Fix:** wrote a correct `fdeploy.ini` (5 folders, `Flags=187`, `FullPath=\\CS-SERVER\Homes\%USERNAME%\<Folder>`) into `{512B43A4-...}\User\Documents & Settings\`, bumped the GPO version 917506->983042 (GPT.INI **and** AD `versionNumber` kept in sync). **Native FR now redirects all 5 folders on first logon -- the registry workaround should no longer be needed for new users.**
|
||||
- **LE GPO also broken:** `CSC - Folder Redirection (LE)` (`{889BE7BE-...}`, linked at OU=Life Enrichment) has a **completely empty `\User` tree**. Sharon Edwards / Susan Hicks have likewise only ever worked via the registry workaround. Follow-up: retire the LE GPO and put LE users into `SG-FolderRedirect`, or apply the same `fdeploy.ini` fix to the LE GPO. Sharon/Susan are NOT currently in `SG-FolderRedirect` -- add them before relying on inheritance.
|
||||
- **[ROOT CAUSE + FIX 2026-06-08] Native Folder Redirection was DOA on every machine -- the config file was MISNAMED.** Root cause: the redirect targets in GPO `CSC - Folder Redirection` (`{512B43A4-...}`) were saved in a file named **`fdeploy1.ini`**, but the Windows Folder Redirection CSE only ever reads **`fdeploy.ini`**. **Fix:** wrote a correct `fdeploy.ini` (5 folders, `Flags=187`, `FullPath=\\CS-SERVER\Homes\%USERNAME%\<Folder>`) into `{512B43A4-...}\User\Documents & Settings\`, bumped the GPO version 917506->983042 (GPT.INI **and** AD `versionNumber` kept in sync). **Native FR now redirects all 5 folders on first logon.**
|
||||
- **LE GPO also broken:** `CSC - Folder Redirection (LE)` (`{889BE7BE-...}`, linked at OU=Life Enrichment) has a **completely empty `\User` tree**. Sharon Edwards / Susan Hicks have likewise only ever worked via the registry workaround. Follow-up: retire the LE GPO and put LE users into `SG-FolderRedirect`, or apply the same `fdeploy.ini` fix to the LE GPO.
|
||||
|
||||
- **Login-screen hide (SpecialAccounts\UserList):** An enabled local admin that does not appear in the Windows sign-in picker is a `SpecialAccounts\UserList` suppression, not a disabled account. Registry path: `HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\SpecialAccounts\UserList`, value `<username>=0`. Fix: delete the DWORD value; account reappears after sign-out/reboot. Confirmed on NURSESTATION-PC 2026-06-05 -- `localadmin=0` removed; account was already enabled and in Administrators.
|
||||
- **Login-screen hide (SpecialAccounts\UserList):** An enabled local admin that does not appear in the Windows sign-in picker is a `SpecialAccounts\UserList` suppression, not a disabled account. Registry path: `HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\SpecialAccounts\UserList`, value `<username>=0`. Fix: delete the DWORD value.
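
The two numeric changes in the folder-redirection notes above can be sanity-checked with a little bit arithmetic. This sketch assumes only what the notes state (bit `0x400` is `Grant Exclusive Rights`) plus the standard GPO `versionNumber` layout: user-side revision in the upper 16 bits, computer-side revision in the lower 16 bits.

```python
GRANT_EXCLUSIVE_RIGHTS = 0x400  # bit named in the fdeploy flags note

old_flags, new_flags = 1211, 187

# The old value had the bit set; clearing it yields exactly the new value.
print(bool(old_flags & GRANT_EXCLUSIVE_RIGHTS))           # True
print(old_flags & ~GRANT_EXCLUSIVE_RIGHTS == new_flags)   # True


def split_gpo_version(version):
    """Split a GPO versionNumber into (user_revision, computer_revision)."""
    return version >> 16, version & 0xFFFF


print(split_gpo_version(917506))  # (14, 2)
print(split_gpo_version(983042))  # (15, 2)
```

So the 917506 -> 983042 bump is a single user-side revision, which fits a change made only under the GPO's `\User` tree.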
|
||||
|
||||
### File Shares & Scan-to-Folder (Accounting)
|
||||
|
||||
@@ -329,24 +326,24 @@ Cascades' line-of-business / reporting SaaS (the systems they pull data OUT of,
|
||||
- `D:\Shares\Accounting` on CS-SERVER -- inheritance broken; SYSTEM / BUILTIN\Administrators = Full; `lauren.hasselman`, `chris.knight`, `zachary.nelson` = Modify (no Everyone). Shared as **`\\CS-SERVER\AcctDept`** (Change: those 3 users + `svc-scan`; Full: Admins).
|
||||
- **Share is named `AcctDept`, NOT `Accounting`** -- a *printer* share named `Accounting` (Canon MF455DW, `LocalsplOnly`) already exists. Do not collide with it.
|
||||
- **`svc-scan`** = dedicated AD service account (CN=Users, PasswordNeverExpires, CannotChangePassword) for the Brother's SMB auth. Vault: `clients/cascades-tucson/svc-scan.sops.yaml`.
|
||||
- **REUSE `svc-scan` for EVERY future scanner->network-folder setup at Cascades** (Howard, 2026-06-09) -- do NOT create a per-printer/per-folder scan account. For a new scan destination: grant `CASCADES\svc-scan` Modify on the new scan folder, then enter `cascades\svc-scan` + the vaulted password (NTLMv2) in that scanner's Scan-to-Network profile.
|
||||
- **REUSE `svc-scan` for EVERY future scanner->network-folder setup at Cascades** (Howard, 2026-06-09) -- do NOT create a per-printer/per-folder scan account.
|
||||
- **Brother MFC-L8900CDW "Business Office" printer (10.0.20.220) -- Scan-to-Network profile (working 2026-06-09):** Network Folder Path `\\192.168.2.254\AcctDept\Scans`; **Auth Method NTLMv2** (not Auto/Kerberos -- the printer can't reach a KDC across the VLAN); Username `cascades\svc-scan`; PDF Multi-Page.
|
||||
- **[NETWORK] CS-SERVER cannot reach the VLAN-20 printers** -- main-LAN `192.168.2.x` -> VLAN 20 `10.0.20.x` is blocked at pfSense. Use a VLAN-20 PC's browser (e.g. ACCT2-PC `10.0.20.209`) or go onsite. The reverse (printer -> CS-SERVER:445) **is** open.
|
||||
- **Persistent drive maps to `\\cs-server\AcctDept`:** Chris (DESKTOP-N5G1ROO) Y:, Zachary (ACCT2-PC) Y:, Lauren (DESKTOP-H6QHRR7) X: (Y: was already in use on hers).
|
||||
- **[NETWORK] CS-SERVER cannot reach the VLAN-20 printers** -- main-LAN `192.168.2.x` -> VLAN 20 `10.0.20.x` is blocked at pfSense. Use a VLAN-20 PC's browser or go onsite. The reverse (printer -> CS-SERVER:445) **is** open.
|
||||
- **Persistent drive maps to `\\cs-server\AcctDept`:** Chris (DESKTOP-N5G1ROO) Y:, Zachary (ACCT2-PC) Y:, Lauren (DESKTOP-H6QHRR7) X:.
|
||||
|
||||
### Synology NAS (cascadesDS) / Shared File Access
|
||||
|
||||
- **Stale Word owner (lock) files on cascadesDS shares:** Word creates a hidden `~$<truncated filename>` owner file when a document is opened; if the user's session ends without cleanly closing Word, the `~$` file is orphaned. **Fix:** delete the `~$` file(s). Confirmed 2026-06-10: five `~$` files dated 2024 on `\\cascadesds\Public\Company Web Docs\Staff Trainings\` caused false lock messages.
|
||||
- **Accessing cascadesDS from RMM -- always use a user session, not CS-SERVER SYSTEM.** The domain-joined CS-SERVER machine account cannot authenticate to the Synology `Public` share because cascadesDS uses workgroup "CASCADES" (same short name as the AD domain), causing Kerberos auth failures. Workaround: run the command in the `user_session` context of a machine where the target user is actively logged in (e.g. ASSISTMAN-PC agent `cf86fa5e` for Meredith-accessible shares).
|
||||
- **Synology Drive sync scope (as of 2026-06-18):** The Drive Client on CS-SERVER syncs only the **Sync DSM user's My Drive** (`/volume1/homes/Sync/Drive/`) into `D:\Shares\Main` -- one-way download (mode:1). The real department shared folders (`/volume1/Server`, `/volume1/Management`, `/volume1/Public`, `/volume1/SalesDept`, etc.) are **NOT** in this scope and are NOT currently mirrored to CS-SERVER. These require a separate Team Folder setup. Note: `synopkg status SynologyDrive` falsely returns "stopped" (status 263) even when the service is active -- verify via `systemctl is-active pkgctl-SynologyDrive` and `netstat -tlnp | grep 6690` instead.
|
||||
- **Stale Word owner (lock) files on cascadesDS shares:** Word creates a hidden `~$<truncated filename>` owner file when a document is opened; orphaned on abrupt session end. **Fix:** delete the `~$` file(s). Confirmed 2026-06-10.
|
||||
- **Accessing cascadesDS from RMM -- always use a user session, not CS-SERVER SYSTEM.** The domain-joined CS-SERVER machine account cannot authenticate to the Synology `Public` share because cascadesDS uses workgroup "CASCADES" (same short name as the AD domain), causing Kerberos auth failures. Run the command in the `user_session` context on a machine where the target user is actively logged in.
|
||||
- **Synology Drive sync scope (as of 2026-06-18):** The Drive Client on CS-SERVER syncs only the **Sync DSM user's My Drive** (`/volume1/homes/Sync/Drive/`) into `D:\Shares\Main` -- one-way download. The real department shared folders (`/volume1/Server`, `/volume1/Management`, `/volume1/Public`, `/volume1/SalesDept`, etc.) are **NOT** in this scope. Note: `synopkg status SynologyDrive` falsely returns "stopped" (status 263) even when active -- verify via `systemctl is-active pkgctl-SynologyDrive` and `netstat -tlnp | grep 6690`.
|
||||
|
||||
### Browser / Edge
|
||||
|
||||
- **[BUG - FLEET] Edge 149 cannot open Office files via download-list when Downloads is a UNC-redirected folder (Chromium issue 519243472).** A regression introduced in Chromium 149 (feature `LaunchShellExecuteViaExplorer`) prepends `\\?\` to UNC paths without converting to the correct `\\?\UNC\` form, producing a malformed path. **Symptom:** clicking an `.xlsx` or `.docx` in the Edge download panel shows "Windows cannot find '\\?\\\cs-server\...'" Text files and PDFs open fine. The same Office file double-clicked from File Explorer opens normally. **Trigger:** Downloads folder redirected via GPO Folder Redirection to a UNC path with no mapped drive letter -- exactly Cascades' Homes-share redirect configuration. **Affected build:** Edge stable 149.0.4022.52. **Fix options (none applied as of 2026-06-08):** (1) Update Edge past the fix; (2) Interim: `--disable-features=LaunchShellExecuteViaExplorer`; (3) Zero-config: use "Show in folder" then double-click from Explorer; (4) Supported 149->148 rollback. Note: pinning to 148 forfeits security fixes; prefer option 1 or 3 for HIPAA machines.
|
||||
- **[BUG - FLEET] Edge 149 cannot open Office files via download-list when Downloads is a UNC-redirected folder (Chromium issue 519243472).** A regression introduced in Chromium 149 prepends `\\?\` to UNC paths without converting to the correct `\\?\UNC\` form. **Symptom:** clicking `.xlsx` or `.docx` in the Edge download panel shows "Windows cannot find '\\?\\\cs-server\...'". Text files and PDFs open fine. **Trigger:** Downloads folder redirected via GPO Folder Redirection to a UNC path. **Affected build:** Edge stable 149.0.4022.52. **Fix options (none applied as of 2026-06-08):** (1) Update Edge past the fix; (2) Interim: `--disable-features=LaunchShellExecuteViaExplorer`; (3) Zero-config: use "Show in folder" then double-click from Explorer; (4) Rollback to 148. Note: pinning to 148 forfeits security fixes; prefer option 1 or 3 for HIPAA machines.
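
To make the malformed path in the Edge note concrete, here is a small sketch of the difference between blindly prepending `\\?\` and the correct `\\?\UNC\` form for a UNC path. This is an illustration of the path rule only, not Chromium's code; the username `jdoe` and the file name are hypothetical.

```python
EXTENDED_PREFIX = "\\\\?\\"        # \\?\
UNC_PREFIX = "\\\\?\\UNC\\"        # \\?\UNC\


def to_extended_path(path):
    """Convert a Windows path to its extended-length form."""
    if path.startswith(EXTENDED_PREFIX):
        return path                      # already extended
    if path.startswith("\\\\"):
        return UNC_PREFIX + path[2:]     # \\server\share -> \\?\UNC\server\share
    return EXTENDED_PREFIX + path        # C:\x -> \\?\C:\x


unc = r"\\cs-server\Homes\jdoe\Downloads\report.xlsx"

print(EXTENDED_PREFIX + unc)   # naive prepend, the malformed shape from the symptom
print(to_extended_path(unc))   # correct extended UNC form
```

The naive result begins with `\\?\\\cs-server\`, which matches the "Windows cannot find" text in the symptom.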
|
||||
|
||||
### Conditional Access / Caregiver Policies
|
||||
|
||||
- **Phased rollout -- never tenant-wide.** CA policies for caregivers now target `SG-Caregivers` (`8b8d9222-5d71-419a-936d-56d895c6c332`) (Entra Connect exited staging 2026-05-14; SG-Caregivers-Pilot superseded). The legacy "Require MFA for all users" policy stays in place. Source: `project_cascades_ca_phased_rollout.md`.
|
||||
- **Phased rollout -- never tenant-wide.** CA policies for caregivers now target `SG-Caregivers` (`8b8d9222-5d71-419a-936d-56d895c6c332`). The legacy "Require MFA for all users" policy stays in place.
|
||||
- **Enforced caregiver CA policy set (unchanged as of 2026-06-03):**
|
||||
- `CSC - Block caregivers off Cascades network` (`e35614e1-e896-4a13-9407-076963af488f`) -- BLOCK if location not Cascades
|
||||
- `CSC - Block caregivers on non-compliant device` (`ede985e2-ee7e-4521-88b2-34c847c3db20`) -- BLOCK if device non-compliant. **Pending DISABLE** at allow-list cutover.
|
||||
@@ -354,7 +351,7 @@ Cascades' line-of-business / reporting SaaS (the systems they pull data OUT of,
|
||||
- **Caregiver device allow-list (2026-06-03 -- report-only):** `CSC - Caregivers: allow-listed devices only (REPORT-ONLY)` -- id `1b7fd025-1aad-47c8-9274-c32c3e0b163c`; state `enabledForReportingButNotEnforced`. Device filter (mode `exclude`): `(device.displayName -startsWith "CSC-") -or (device.extensionAttribute1 -eq "CSCCaregiverDevice")`. Includes: NURSESTATION-PC (deviceId `d3bf931f`), Laptop2, LAPTOP-DRQ5L558, LAPTOP-E0STJJE8, LAPTOP-8P7HDSEI, ASSISTNURSE-PC (needs re-join + re-tag after Win11 reinstall).
|
||||
- **GDAP exclusion:** CA policy 3 must exclude "Service provider users" (GDAP foreign principals) + `SG-External-Signin-Allowed` + `SG-Break-Glass`, otherwise ACG partner admins lose access at CA cutover.
|
||||
- **Known bug:** `Require MFA for all users` policy (`7e87a1c7...`) excludes `SG-Caregivers-Pilot` instead of the live `SG-Caregivers` (`8b8d9222`). Functionally harmless today (pilot group still exists), but must be corrected.
|
||||
- **Pilot cleanup required when done:** Delete `pilot.test@cascadestucson.com`, clean up `howard.enos@cascadestucson.com`, remove `SG-Caregivers-Pilot` from CA policy targets and delete the group. Source: `project_cascades_pilot_cleanup.md`.
|
||||
- **Pilot cleanup required when done:** Delete `pilot.test@cascadestucson.com`, clean up `howard.enos@cascadestucson.com`, remove `SG-Caregivers-Pilot` from CA policy targets and delete the group.
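
A rough sketch for reasoning about the allow-list device filter quoted above, mirrored as a plain Python predicate. Entra evaluates the real rule; exact-case matching here is an assumption, and `CSC-TABLET-01` is a hypothetical device name.

```python
ALLOW_TAG = "CSCCaregiverDevice"


def matches_allow_list(display_name, extension_attribute1=None):
    """Mirror of: displayName -startsWith "CSC-" -or extensionAttribute1 -eq tag."""
    return display_name.startswith("CSC-") or extension_attribute1 == ALLOW_TAG


print(matches_allow_list("NURSESTATION-PC"))              # False: no prefix, no tag
print(matches_allow_list("NURSESTATION-PC", ALLOW_TAG))   # True: tagged
print(matches_allow_list("CSC-TABLET-01"))                # True: prefix alone
```

Under this reading, none of the listed device names begins with `CSC-`, so each would match only through the `extensionAttribute1` tag. That is consistent with the re-tag requirement noted for ASSISTNURSE-PC after its reinstall.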
|
||||
|
||||
### EXO / Message Trace
|
||||
|
||||
@@ -363,101 +360,64 @@ Cascades' line-of-business / reporting SaaS (the systems they pull data OUT of,
|
||||
|
||||
### Wireless / UniFi RF
|
||||
|
||||
- **[EXECUTED 2026-06-19 -- autonomous 2 AM window, validated] First production RF optimization applied + kept:**
|
||||
- **2.4 power Low/full -> MEDIUM on 47 radios** (the 42 over-thinned `low` floors 1-4 + named, + the 5
|
||||
MemCare floors-5/6 `auto`/full radios 505/517/608/615/622). The 24 thinned-disabled radios stayed
|
||||
disabled; 5 mesh-auto APs untouched. Non-regressive (satisfaction held). Undid the over-thinning
|
||||
regression + brought MemCare off full power. **Per-AP targeting required** -- `apply-radio power --zone`
|
||||
re-enables disabled radios (re-confirmed gotcha).
|
||||
- **5 GHz -> clean DFS 40 MHz channels** on 72 non-mesh APs (channels 52/60/100/108/116/124/132/140),
|
||||
0 co-channel, mesh excluded (2nd Floor Atrium + children CC Bridge/salon/108 left on auto). **Result:
|
||||
5 GHz retry roughly HALVED -- 8.7 -> 3.8 avg, median 8.2 -> 2.1.** Validated; all 72 APs holding DFS,
|
||||
0 radar vacates. Voice nudged back to 5 GHz (kick-sta) after the channel-change scatter.
|
||||
- **CSCNet BSS-transition (802.11v) ON.** 6 GHz still BLOCKED (WPA3 -- see below).
|
||||
- **[BIG LESSON -- non-DFS decision REVERSED by data]** A blind non-DFS reshuffle was tried first and
|
||||
FAILED (flat retry); the completed channel survey (74/74 APs) proved **DFS channels here are 4-5x
|
||||
cleaner (2-3% busy) than non-DFS (ch149=12%, ch157=28%, ch44=22% -- the property's worst).** Consumer/
|
||||
neighbor gear avoids DFS. Choosing channels from the measured scan (not a non-DFS policy) is what
|
||||
delivered the win. **Always: scan -> `survey-report.py` -> `channel-plan --channels` -> apply -> validate.**
|
||||
- **Fleet (full audit 2026-06-16):** 77 U7-Pro APs, **12 switches**, ~587 wireless clients. Controller: UOS at 172.16.3.29, HTTPS 11443 (see [[uos-server]]); site short name `va6iba3v`, site_id `685f39068e65331c46ef6dd2`. No UniFi gateway (pfSense is the gateway). pfSense ruled out as WiFi factor 2026-06-16 (DHCP not exhausted, DNS up, WAN stable -- see Network section).
|
||||
- **Primary pain band is 2.4 GHz.** Avg TX-retry ~10%; cu_total 69-94% live; catastrophic neighbor BSSID density (ch6 ~33k BSSIDs, ch1 ~19k, ch11 ~17k). 27 of the 40 worst clients stuck on 2.4 GHz (retry 11-42%), mostly IoT/legacy hardware (Ring cameras, robotic cleaner, smart plugs, EPSON printer, Poly phone, handheld scanners, smartwatch). Root cause: ~75 2.4 GHz radios running at auto (full) TX power in extreme density. Experience splits by band: 5/6 GHz clients are fine; clients that land or stick on 2.4 GHz suffer.
|
||||
- **5 GHz -- DFS concern is theoretical; empirically clean.** 76/77 radios on 80 MHz width (should be 40 MHz at this density). 55/77 radios on DFS channels (52-144) near Davis-Monthan AFB + TUS airport radar. `dfs-check.sh` 2026-06-16: **ZERO real radar events fleet-wide** (55 DFS APs, full `dmesg` sweep, precise pattern match) -- DFS is empirically low-risk here. Measured TX-retry DFS (8.4%) ~= non-DFS (9.0%) -- no throughput penalty. Still recommended to move to non-DFS (UNII-1 36-48 + UNII-3 149-161) for resilience. NOTE: an earlier mid-session claim (2026-06-15 audit) that "DFS was the #1 problem" was an artifact of tooling bugs (raw counter + 15-AP head cap) and was corrected before session end -- do not repeat it.
|
||||
- **6 GHz is nearly unused -- root cause: CSCNet not broadcasting 6 GHz** (`wlan_bands=[2g,5g]`, found 2026-06-18). 75 radios active but only ~1 client because the band is dark at the SSID level. Largest untapped, clean, non-DFS capacity -- enabling 6 GHz on CSCNet (`apply-wlan bands all` + `bsstm on`) is the **relief valve** and must precede 5 GHz width-narrowing. The Poly voice phones are 5 GHz (not 6E), so 6 GHz helps voice indirectly by clearing 5 GHz of resident devices.
|
||||
- **AP 103 saturated (5 GHz):** ch149, ~75% airtime, ~25,900 retries, 12 clients. Lauren's voice phone (`.202`) locked here 2026-06-18 (off the CC Bridge mesh AP) -> AP 103 must be relieved (off ch149 / 80->40 MHz / load-balance) before/with that lock or she trades a mesh problem for congestion.
|
||||
- **Switch audit (2026-06-16):** ~25 ports linked at 100 Mbps but gig-capable (systematic cabling/NIC issue, 1st/2nd/3rd-floor switches; investigate after WiFi Phase A). PoE budgets healthy. 3 offline switches: Switch 2nd Floor #2, Switch 4th Floor #2, USW Pro Max 16. Port p38 (1st Floor USW) 4.0% tx-drop rate.
|
||||
- **AP-level satisfaction 95-100 fleet-wide.** Network is healthy on average; pain is in the client tail.
|
||||
- **Remediation status (as of 2026-06-17 -- OVER-THINNED):**
|
||||
- **Phase A (2.4 power-down): EXTENDED + OVER-THINNED.** Floor-4 pilot (2026-06-16): 14/15 radios to 6 dBm, retry 13.2->9.5%, no coverage loss. Subsequently (overnight 2026-06-17): 24 of 76 2.4 radios DISABLED + 42 set to Low (~6 dBm); Floors 5/6 + mesh untouched. Results DEGRADED: retry 17->23.4%, satisfaction 39->30. **Current action needed: Low->Medium for the 42 at-Low radios** (Phase 0 + Phase 1 per the combined radio_table PUT plan, pending go-ahead).
|
||||
- **Phase C (disable 9 redundant 2.4 radios): NOT applied.** Data-backed disable list (each has >=2 active-2.4 SNR neighbors): 127->128, 229->128, 248->348, 330->128, 445->347/348/247, 428->128, 622->505/615/608, Kitchen->Memcare TV room, Dining Room->memcare piano. Excludes mesh-protected APs (2nd Floor Atrium, CC Bridge, salon, 206 U7 Pro) and Memcare TV room. APs 445/428 disables held pending further validation.
|
||||
- **5 GHz -- auto-channel made things WORSE.** Auto-channel reassignment applied via UniFi (Howard, 2026-06-17): co-channel pairs 25->30. **Option B** (combined 40MHz + non-DFS channel plan + min-RSSI -82 relax via per-AP radio_table PUT) is the plan. Dry-run clean; NOT applied (pending go-ahead + evening window).
|
||||
- **Deferred levers (separate session):** min-data-rate raise (1->12 Mbps), band-steering (`apply-wlan bandsteer`), 2.4 min-RSSI on the 6 OFF APs (615, 608, 505, 517, 622, salon), 6 GHz band-steering.
|
||||
- **Poly phone drops (2026-06-17) -- CLOSED.** Root cause = intentional pfSense reboot on 2026-06-16 22:38:12 MST (one fleet-wide event; 28/30 phones each dropped once, all floors, including floors 5/6 untouched by radio work). Only a gateway-level event explains all-floors-at-once. Today's data (2026-06-17) back to ~99.77%. NOT a WiFi or DHCP issue.
|
||||
- **DHCP is healthy.** pfSense dhcpd.log: 1241 ACK / 1 NAK / 0 no-free-leases (verified via direct tail/grep -- NOT clog). Per-room /28 HIPAA segmentation is intentional (fullest 12/13); do NOT flatten. `sta_dhcp_failures` metric is client/WiFi-side (frames lost at 100% retry), not pfSense-side.
|
||||
- **Config flags:** 6 APs with 2.4 min-RSSI OFF (615, 608, 505, 517, 622, salon); 4 APs off the 1/6/11 plan (128 disabled, 108 offline, 108U7 Pro auto, salon auto).
|
||||
- **[APPLIED 2026-06-19 -- validated] Production RF optimization applied + kept:**
|
||||
- **2.4 power -> MEDIUM on 47 radios** (42 over-thinned-`low` floors 1-4 + 5 MemCare `auto`/full radios 505/517/608/615/622). The 24 thinned-disabled radios stayed disabled; 5 mesh-auto APs untouched. Non-regressive. Per-AP targeting required -- `apply-radio power --zone` re-enables disabled radios (confirmed gotcha).
|
||||
- **5 GHz -> clean DFS 40 MHz channels** on 72 non-mesh APs (channels 52/60/100/108/116/124/132/140), 0 co-channel, mesh excluded. **Result: 5 GHz retry roughly HALVED -- 8.7 -> 3.8 avg, median 8.2 -> 2.1.** Validated; all 72 APs holding DFS, 0 radar vacates. Voice nudged back to 5 GHz (kick-sta).
|
||||
- **CSCNet BSS-transition (802.11v) ON.** 6 GHz BLOCKED (WPA3 mandate on WPA2/PPSK SSID -- deferred to Howard).
|
||||
- **[KEY DATA -- DFS decision reversed] Full channel survey (74/74 APs) proved DFS channels here are 4-5x cleaner (2-3% busy) than non-DFS (ch149=12%, ch157=28%, ch44=22% -- the property's worst channels).** Consumer/neighbor gear avoids DFS; choosing channels from the measured scan (rather than from a blanket non-DFS policy) delivered the win. **Always: scan -> `survey-report.py` -> `channel-plan --channels` -> apply -> validate.** Do NOT use UniFi auto-channel; do NOT apply a non-DFS-only policy without checking the survey data for this site.
|
||||
- **Fleet (full audit 2026-06-16):** 77 U7-Pro APs, **12 switches**, ~587 wireless clients. Controller: UOS at 172.16.3.29, HTTPS 11443; site short name `va6iba3v`. No UniFi gateway (pfSense is the gateway).
|
||||
- **Primary pain band is 2.4 GHz.** Avg TX-retry ~10%; cu_total 69-94% live; catastrophic external neighbor BSSID density (ch6 ~33k BSSIDs, ch1 ~19k, ch11 ~17k). 27 of the 40 worst clients stuck on 2.4 GHz, mostly IoT/legacy hardware. Experience splits by band: 5/6 GHz clients are fine; clients that land or stick on 2.4 GHz suffer.
|
||||
- **6 GHz is nearly unused -- root cause: CSCNet not broadcasting 6 GHz** (`wlan_bands=[2g,5g]`, found 2026-06-18). 75 radios active but only ~1 client because the band is dark at the SSID level. Largest untapped, clean, non-DFS capacity. Enabling requires WPA3+PMF conversion on the 427-client SSID -- Howard's supervised decision.
|
||||
- **Poly phone drops (2026-06-17) -- CLOSED.** Root cause = intentional pfSense reboot on 2026-06-16 22:38:12 MST (one fleet-wide event; 28/30 phones each dropped once). Only a gateway-level event explains all-floors-at-once.
|
||||
- **DHCP is healthy.** pfSense dhcpd.log: 1241 ACK / 1 NAK / 0 no-free-leases. Per-room /28 HIPAA segmentation is intentional; do NOT flatten. `sta_dhcp_failures` metric is client/WiFi-side, not pfSense-side.
|
||||
- **Switch audit (2026-06-16):** ~25 ports linked at 100 Mbps but gig-capable (systematic cabling/NIC issue, 1st/2nd/3rd-floor switches; investigate after WiFi Phase A). 3 offline switches: Switch 2nd Floor #2 (reset+re-adopted 2026-06-17), Switch 4th Floor #2, USW Pro Max 16. Port p38 (1st Floor USW) 4.0% tx-drop rate.
|
||||
- **Mesh topology:** 2nd Floor Atrium is wireless-mesh parent for CC Bridge + salon (5 GHz backhaul ch36); 206 U7 Pro carries AP 108. These must NEVER be disabled or powered down via zone command -- coverage-thin auto-excludes them.
|
||||
- **Known hardware:** AP 108 (Floor 1) offline pending a new cable run (expected). Stale duplicate controller object ("108" vs "108U7 Pro") to clean up separately.
|
||||
- **AP-hang recovery:** use `device-control.sh cascades poe-cycle "<AP name>" --apply` (remote PoE port cycle via controller cmd/devmgr). Do NOT use `force-provision` -- it took AP 445 offline during the Floor-4 pilot and was removed from device-control.sh.
|
||||
- **Tooling (`unifi-wifi` skill -- feature-complete as of 2026-06-16):**
|
||||
- Collectors: `audit-site.sh` (config + neighbor density), `live-stats.sh` (live per-AP/client, Plane 2), `model-rank.sh`, `radio-usage.sh` (77-day 2.4 usage history per AP; confirms POWER-DOWN vs disable), `coverage-thin.sh` (mesh-aware 2.4 SNR dominating-set -- drives Phase C), `neighbor-collect.sh` (/proc/ui_neighbor AP-to-AP SNR matrix, non-disruptive, drives optimize-radios disables), `survey-collect.sh` (per-channel busy%/noise -> channel plan), `dfs-check.sh` (precise per-AP radar event history), `switch-audit.sh`, `gw-audit.sh`, `monitor-run.sh` (cron health digest, all sites), `sites.sh` (multi-client site list, ~49 UOS sites).
|
||||
- **`survey-report.py` (NEW 2026-06-19) -- the channel-decision driver:** rolls the `survey-collect` JSON into the fleet per-channel/per-band-group measured busy% table + cleanest/dirtiest ranking + a suggested clean 40MHz palette. Run it BEFORE any channel change; it's what makes the DFS-vs-non-DFS call from facts (the skill previously had a non-DFS bias baked into `survey-collect`'s report AND `channel-plan`'s palette -- both fixed 2026-06-19).
|
||||
- Apply (gated + rollback): `apply-radio.sh` (power/width/channel/minrssi/disable/enable, --zone/--ap), `apply-wlan.sh` (minrate/bandsteer/bands/steer/bsstm/dtim/isolation/etc.), `client-control.sh` (block/unblock/kick MAC -- used to nudge sticky phones off 2.4 after a channel change), `device-control.sh` (poe-cycle; adopt/restart/locate/upgrade), **`channel-plan.sh` (now DATA-DRIVEN palette: `--channels <list>` or `--dfs ok|avoid|only`; default ranks ALL 40MHz primaries by measured busy%; load-balance + local-search -> 0 strong co-channel).**
|
||||
- pfSense: `pfsense-ssh.sh` (audit/dhcp/run -- SSH backend, no RESTAPI package needed; auth from `clients/<slug>/pfsense-firewall`; system OpenSSH via askpass). ROADMAP: gated control verbs (firewall rules, port forwards) -- deferred to Mike per SS E.
|
||||
- All scripts site-parameterized (work for any of ~49 UOS sites). Per-client AP-side creds via `clients/<slug>/unifi-ap-ssh`.
|
||||
- **Creds (vault refs only):** `infrastructure/uos-server-ssh-key` (SSH/Mongo), `infrastructure/uos-server-network-api-rw` (RW API), `clients/cascades-tucson/unifi-ap-ssh` (per-AP SSH, needs site VPN for L3 reach to 192.168.2.x/3.x), `clients/cascades-tucson/pfsense-firewall` (pfSense admin for pfsense-ssh.sh).
|
||||
- **Prior diagnostic (2026-05-16):** cloud API only, read-only; identified 2.4 GHz saturation hypothesis. Controller access was blocked at the time. Live controller access gained 2026-06-15 when Mike vaulted the SSH key and RW admin.
|
||||
- **Tooling note:** `live-stats.sh` accuracy bugs fixed 2026-06-15 (removed 15-AP head cap, switched satisfaction to device-level, switched TX-retries to `tx_retries_pct` rate field, sorted worst-client list by satisfaction). These bugs caused a mid-session misdiagnosis that was corrected before session end.
|
||||
- **AP-hang recovery:** use `device-control.sh cascades poe-cycle "<AP name>" --apply`. Do NOT use `force-provision` -- it took AP 445 offline during the Floor-4 pilot.
|
||||
- **Tooling (`unifi-wifi` skill -- feature-complete as of 2026-06-19):**
|
||||
- Collectors: `audit-site.sh`, `live-stats.sh`, `model-rank.sh`, `radio-usage.sh`, `coverage-thin.sh`, `neighbor-collect.sh`, `survey-collect.sh`, `dfs-check.sh`, `switch-audit.sh`, `gw-audit.sh`, `monitor-run.sh`, `sites.sh`.
|
||||
- **`survey-report.py` (NEW 2026-06-19) -- the channel-decision driver:** rolls `survey-collect` JSON into the fleet per-channel/per-band-group measured busy% table + cleanest/dirtiest ranking + suggested clean 40 MHz palette. Run it BEFORE any channel change; it's what makes the DFS-vs-non-DFS call from facts. Previously `survey-collect`'s report AND `channel-plan`'s palette had a non-DFS bias baked in -- both fixed 2026-06-19.
|
||||
- Apply (gated + rollback): `apply-radio.sh` (power/width/channel/minrssi/disable/enable, --zone/--ap), `apply-wlan.sh` (minrate/bandsteer/bands/steer/bsstm/dtim/isolation/etc.), `client-control.sh` (block/unblock/kick MAC), `device-control.sh` (poe-cycle; adopt/restart/locate/upgrade), **`channel-plan.sh` (DATA-DRIVEN: `--channels <list>` or `--dfs ok|avoid|only`; default ranks ALL 40 MHz primaries by measured busy%; load-balance + local-search -> 0 strong co-channel).**
|
||||
- pfSense: `pfsense-ssh.sh` (audit/dhcp/run -- SSH backend, no RESTAPI package needed).
|
||||
- **Creds (vault refs only):** `infrastructure/uos-server-ssh-key` (SSH/Mongo), `infrastructure/uos-server-network-api-rw` (RW API), `clients/cascades-tucson/unifi-ap-ssh` (per-AP SSH, needs site VPN), `clients/cascades-tucson/pfsense-firewall` (pfSense admin for pfsense-ssh.sh).
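
The survey-first rule above (rank channels by measured busy%, then build the palette from the cleanest ones) can be sketched as follows. This is a minimal illustration, not the actual logic of `survey-report.py` or `channel-plan.sh`: the function names and the 5% cutoff are hypothetical, the three non-DFS busy% values are the ones quoted from the survey, and the DFS values are placeholders inside the reported 2-3% range.

```python
# Hypothetical helpers; NOT the real survey-report.py / channel-plan.sh code.

def rank_channels(busy_pct):
    """Return channels sorted cleanest-first by measured busy%.

    Ties are broken by channel number so the order is deterministic.
    """
    return sorted(busy_pct, key=lambda ch: (busy_pct[ch], ch))


def pick_palette(busy_pct, max_busy):
    """Keep only channels whose measured busy% is at or below max_busy."""
    return [ch for ch in rank_channels(busy_pct) if busy_pct[ch] <= max_busy]


# Non-DFS values (44, 149, 157) are from the survey quoted above.
# DFS values (52, 100, 132) are placeholders within the reported 2-3% range.
survey = {44: 22.0, 149: 12.0, 157: 28.0, 52: 2.5, 100: 2.5, 132: 2.5}

ranking = rank_channels(survey)
palette = pick_palette(survey, max_busy=5.0)  # 5% cutoff is an assumption
print("cleanest first:", ranking)
print("palette:", palette)
```

The point of the sketch is only that the palette falls out of the measurements; no DFS or non-DFS preference is coded in.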
|
||||
|
||||
### VoIP / Network Device Migration
|
||||
|
||||
- **Re-VLANing a wired switch port requires a link bounce to force re-DHCP.** Changing the native VLAN on a UniFi switch port does not reset the NIC link; the device holds its old DHCP lease (renewal unicast to the old DHCP server is blocked by the new VLAN's firewall rules). Fix: bounce the port (PoE power-cycle for PoE devices; disable/enable via controller API for non-PoE). A UniFi client block/unblock is a MAC-address filter only -- it does NOT bounce the link. Controller API port-bounce requires the `X-CSRF-Token` from the login response header (`x-updated-csrf-token`). Confirmed on the Vertical-Remote desktop (2026-06-17).
|
||||
- **Externally-powered devices (AudioCodes desk phones) need a PHYSICAL power-cycle, not a controller bounce.** The 8 AudioCodes sit on USW-16-PoE ports 1-8 but run on **external power bricks (PoE OFF on those ports)** -- so a UniFi PoE power-cycle is a no-op AND a controller port disable/enable did not reset their uptime either. They held their old main-LAN DHCP leases and never re-DHCP'd onto VLAN 30 until **Howard physically powered each off/on** (2026-06-18), after which all 8 pulled VOICE leases `.224-.231`. For any externally-powered wired device, plan an on-site/hands-on power-cycle for a VLAN move.
|
||||
- **UniFi controller PUT 403 / CSRF:** rapid controller writes can 403 -- read the CSRF token from the `x-updated-csrf-token` response header (TOKEN-cookie JWT as fallback). pfSense SSH and the controller API both rate-limit under many rapid queries; alternate between them.
|
||||
- **API scratch files must be written OUTSIDE the repo.** Controller-scratch (`.sta.json`, `.fleet*.dev`, etc.) written CWD-relative got swept into commits by `git add -A` and blocked rebases (a stray locked `curl.exe` held them). Use `mktemp -d` outside the repo; `.gitignore` patterns (`.fleet*`, `.ap[0-9]*`, `.vq[0-9]*`, `.q[0-9]*`) added as a backstop.
|
||||
- **Re-VLANing a wired switch port requires a link bounce to force re-DHCP.** Changing the native VLAN on a UniFi switch port does not reset the NIC link; the device holds its old DHCP lease. Fix: bounce the port (PoE power-cycle for PoE devices; disable/enable via controller API for non-PoE). A UniFi client block/unblock is a MAC-address filter only -- it does NOT bounce the link. Controller API port-bounce requires the `X-CSRF-Token` from the login response header (`x-updated-csrf-token`).
|
||||
- **Externally-powered devices (AudioCodes desk phones) need a PHYSICAL power-cycle, not a controller bounce.** The 8 AudioCodes sit on USW-16-PoE ports 1-8 but run on **external power bricks (PoE OFF on those ports)** -- so a UniFi PoE power-cycle AND a controller port disable/enable are both no-ops. They held their old main-LAN DHCP leases until Howard physically powered each off/on (2026-06-18) -> VOICE leases `.224-.231`.
|
||||
- **UniFi controller PUT 403 / CSRF:** rapid controller writes can 403 -- read the CSRF token from the `x-updated-csrf-token` response header (TOKEN-cookie JWT as fallback).
|
||||
- **API scratch files must be written OUTSIDE the repo.** Controller-scratch written CWD-relative got swept into commits. Use `mktemp -d` outside the repo; `.gitignore` patterns (`.fleet*`, `.ap[0-9]*`, `.vq[0-9]*`, `.q[0-9]*`) added as a backstop.
|
||||
- **Verify VLAN membership via the client `vlan` field, not the controller's displayed IP.** IP field caches/lags (Kitchen server phone showed stale 192.168.1.126 while actually on vlan:30).
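
The CSRF handling described above (prefer the `x-updated-csrf-token` response header, fall back to the TOKEN-cookie JWT) might look like the sketch below. It assumes the JWT payload carries the token in a claim named `csrfToken`; that claim name and both function names are assumptions for illustration, not confirmed controller behavior. The JWT is only decoded, not verified, because the value is simply echoed back to the controller.

```python
import base64
import json


def csrf_from_jwt(token_cookie):
    """Decode the (unverified) JWT payload and return its csrfToken claim.

    The claim name "csrfToken" is an assumption. Returns None on any
    malformed input instead of raising.
    """
    parts = token_cookie.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64 padding
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:  # covers bad base64, bad UTF-8 and bad JSON
        return None
    if not isinstance(claims, dict):
        return None
    return claims.get("csrfToken")


def pick_csrf_token(headers, cookies):
    """Prefer the refreshed header token; fall back to the TOKEN cookie."""
    lowered = {name.lower(): value for name, value in headers.items()}
    token = lowered.get("x-updated-csrf-token")
    if token:
        return token
    jwt = cookies.get("TOKEN")
    return csrf_from_jwt(jwt) if jwt else None


# Build a fake, unsigned JWT purely to exercise the fallback path.
fake_payload = base64.urlsafe_b64encode(
    json.dumps({"csrfToken": "from-cookie"}).encode()
).rstrip(b"=").decode()
fake_jwt = f"e30.{fake_payload}.sig"

print(pick_csrf_token({"X-Updated-CSRF-Token": "from-header"}, {"TOKEN": fake_jwt}))
print(pick_csrf_token({}, {"TOKEN": fake_jwt}))
```

Re-reading the header after every response, rather than caching the login-time token, is what avoids the 403 on rapid writes.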
|
||||
|
||||
### Voice QoS (VLAN 30) -- design (2026-06-18, NOT yet built)
|
||||
|
||||
Full design: `docs/network/phase1-voice-qos-design.md`. Status DESIGN -- nothing applied (Cascades per-change-go rule).
|
||||
Full design: `docs/network/phase1-voice-qos-design.md`. Status DESIGN -- nothing applied.
|
||||
|
||||
- **The VLAN move's QoS payoff: all voice is one subnet `10.0.30.0/24`,** so QoS matches *all* voice by **source subnet** -- no per-PBX SIP/RTP port guessing. This is the cleanest match criterion and only became possible by isolating voice onto VLAN 30. Phones confirmed marking **DSCP EF (46)** (2026-06-18), so DSCP drives WMM (L2) + switch QoS (L3); subnet match is the safety net (no pfSense set-DSCP rule needed).
|
||||
- **QoS is INSURANCE, not the everyday fix.** Phones register to a CLOUD PBX (Vertical) over the internet, so the theoretical bottleneck is WAN-upload saturation. But measured WAN1 fiber upload is **~522 Mbps** vs ~98 Mbps peak usage = huge headroom -> the WAN is not the day-to-day constraint. QoS earns its place for (1) **WAN2 (coax) failover** (small upload capacity, so one big upload causes real congestion) and (2) rare WAN1 saturation (backup/large upload). The everyday cause of dropped calls is **RF** -- build QoS (cheap, correct) but set expectations.
|
||||
- **Layered design:** **L1 pfSense HFSC shaper** on BOTH WANs -- 3 queues `qVoice` (prio 7, realtime ~30%, source `10.0.30.0/24` via floating out rule), `qACK` (~10%), `qDefault` (default ~60%); shape to ~90-95% of actual upload to keep the queue in pfSense. **L2 UniFi WMM** maps DSCP EF -> WiFi Voice AC (protects the Poly phones over the air -- verify WMM on CSCNet). **L3 UniFi switch QoS** queues tagged voice (mostly automatic; confirm USW isn't stripping DSCP). **L4 DSCP marking** (confirmed EF on the phones). Blocker for L1 sizing: the WAN2 coax upload number (remote test failed).
|
||||
- **Build path:** Firewall -> Traffic Shaper -> Wizard "Multiple Lan/Wan" (prioritize by address `10.0.30.0/24`), or hand-build HFSC + floating rule. Howard drives the pfSense GUI. Rollback = disable/remove the shaper (QoS only orders packets under congestion; removing reverts to FIFO, zero residual). Skill gap: `unifi-wifi` has no QoS verb (pfSense + UniFi config task).
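
As a worked example of the L1 sizing arithmetic, the sketch below shapes an upload figure to 90% and splits it 30/10/60 across `qVoice`/`qACK`/`qDefault`. It uses the measured WAN1 figure (~522 Mbps); the WAN2 number is still unknown, so nothing is computed for it. The function name is hypothetical, 0.90 is the low end of the 90-95% range stated above, and the result is back-of-envelope arithmetic, not generated pfSense config.

```python
def size_queues(upload_mbps, shape_fraction=0.90, shares=None):
    """Return (shaped_mbps, per-queue Mbps) for a 3-queue HFSC layout.

    shape_fraction: portion of real upload to shape to (doc: ~90-95%).
    shares: queue -> fraction of the shaped rate; must sum to 1.
    """
    if shares is None:
        shares = {"qVoice": 0.30, "qACK": 0.10, "qDefault": 0.60}
    if not 0 < shape_fraction <= 1:
        raise ValueError("shape_fraction must be in (0, 1]")
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("queue shares must sum to 1")
    shaped = upload_mbps * shape_fraction
    queues = {name: round(shaped * share, 2) for name, share in shares.items()}
    return shaped, queues


# WAN1 fiber upload as measured (~522 Mbps). WAN2 is not yet measured.
shaped, queues = size_queues(522.0)
print(f"shape to ~{shaped:.1f} Mbps")
for name, mbps in queues.items():
    print(f"{name}: ~{mbps} Mbps")
```

Once the WAN2 coax upload is measured, the same call with that number gives the failover shaper sizes.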
|
||||
|
||||
### Network Optimization Master Plan (all-device, 2026-06-18, NOT yet executed)
|
||||
|
||||
Full plan: `docs/network/network-optimization-master-plan.md`. Goal: fix the *system* for all ~587 clients, not one device at a time. Floors 1-4 only this round; **Floors 5/6 (MemCare) RF + phones DEFERRED** per Howard.
|
||||
|
||||
- **Core principle: open relief valves BEFORE constraining,** or congestion just relocates (the "whack-a-mole" trap). Sequence: **P0** baseline capture (same time-of-day) -> **P1** voice QoS (orthogonal, do first) -> **P2a** enable 6 GHz on CSCNet + BSS-transition (the offload path) + **P2b** correct the 2.4 over-thinning **Low->MEDIUM** (~12-15 dBm, not lower -- Low already starved edge clients) -> **P3** 5 GHz 80->40 MHz + **non-DFS** channel plan + relieve AP 103 -> **P4** fine-tune (2.4 1/6/11, min-RSSI, 802.11k/v roaming) -> **P5** physical cabling (separate visit).
|
||||
- **Interdependencies:** 6 GHz before 5 GHz-40MHz; 2.4 power Medium not Low; AP 103 must be relieved (Lauren locked there); never stack disables + power-down in one area (that caused the over-thinning); tune one lever per zone; never disable mesh-protected APs (2nd Floor Atrium, CC Bridge, salon, 206 U7 Pro, 108).
|
||||
- **Data-driven gate rule (Howard):** every change is a hypothesis gated on fleet-wide metrics (avg retry%, cu_total, cu_interf, satisfaction, band split, per-AP coverage holes). KEEP+proceed only if the target improved AND fleet-wide satisfaction didn't fall / retry didn't rise / no AP lost its clients; HOLD if a secondary metric regressed; ROLLBACK on any fleet regression or complaint. Validate ALL devices (CSCNet 427 + CSC ENT 131 + Guest 13), not just the 31 voice phones. Every `apply-radio`/`apply-wlan` writes a rollback JSON.
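
The gate rule above can be read as a small decision function. The sketch below is one interpretation, not an existing tool: the names are hypothetical, the satisfaction numbers are made-up inputs, and treating "target did not improve but nothing regressed" as HOLD is an assumption. The retry figures (8.7 -> 3.8) are the 5 GHz averages recorded elsewhere on this page.

```python
from dataclasses import dataclass


@dataclass
class FleetSnapshot:
    target: float        # the metric the change aimed at; lower is better here
    satisfaction: float  # fleet-wide satisfaction; higher is better
    retry_pct: float     # fleet-wide avg retry %; lower is better


def gate(before, after, aps_lost_clients=0, complaints=0, secondary_regressed=False):
    """Return "KEEP", "HOLD" or "ROLLBACK" for one applied change."""
    fleet_regressed = (
        after.satisfaction < before.satisfaction
        or after.retry_pct > before.retry_pct
        or aps_lost_clients > 0
    )
    if fleet_regressed or complaints > 0:
        return "ROLLBACK"
    # Assumption: no improvement on the target is treated like a HOLD.
    if secondary_regressed or not after.target < before.target:
        return "HOLD"
    return "KEEP"


# Satisfaction values are hypothetical; retry values are from this page.
before = FleetSnapshot(target=8.7, satisfaction=90.0, retry_pct=8.7)
after = FleetSnapshot(target=3.8, satisfaction=92.0, retry_pct=3.8)

print(gate(before, after))                            # clean improvement
print(gate(before, after, secondary_regressed=True))  # secondary metric slipped
print(gate(before, after, complaints=1))              # any complaint
```

ROLLBACK is checked first so a complaint or fleet regression always wins over an improved target.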
|
||||
|
||||
### Decisions resolved 2026-06-18 (voice/RF)
|
||||
|
||||
- **5 GHz: USE THE CLEAN DFS CHANNELS** (REVERSED 2026-06-19 by measured data; the prior "non-DFS only" call was wrong for THIS site). The full channel survey (74/74 APs) showed DFS = 2-3% busy vs non-DFS 10-28% (ch149/157 are the worst on property). The everyday, every-call congestion on non-DFS is real and measured; the radar risk is hypothetical (0 genuine hits observed). So Howard chose the clean DFS channels (52/60/100/108/116/124/132/140) for voice quality. Safety net: UniFi auto-vacates a DFS channel on radar (regulatory -- moves ONE AP, not the fleet); FOLLOW-UP = stand up a recurring `dfs-check.sh` radar monitor. (This supersedes the 2026-06-18 "non-DFS only" decision -- which was made before the per-channel scan existed.)
|
||||
- **NO dedicated voice SSID** -- voice stays on the shared CSCNet PPSK. The UniFi 3-SSID cap is sound RF hygiene (each SSID = beacon airtime at 77 APs); the only retirement candidate, CSC ENT, still has 131 active clients (staff PCs, printers, DirecTV), so a slot isn't free; and a voice SSID isn't needed (QoS is VLAN/DSCP-based and SSID-independent; band preference is best set phone-side via Vertical; roaming/power-save are phone+AP settings). Revisit only if CSC ENT's clients migrate off.
|
||||
- **The VLAN move's QoS payoff: all voice is one subnet `10.0.30.0/24`,** so QoS matches all voice by **source subnet** -- no per-PBX SIP/RTP port guessing. Phones confirmed marking **DSCP EF (46)**.
|
||||
- **QoS is INSURANCE, not the everyday fix.** Measured WAN1 fiber upload ~522 Mbps vs ~98 Mbps peak usage -> WAN is not the day-to-day constraint. QoS earns its place for (1) **WAN2 (coax) failover** and (2) rare WAN1 saturation. The everyday dropped-calls cause is **RF** (band selection).
|
||||
- **Layered design:** **L1 pfSense HFSC shaper** on BOTH WANs -- 3 queues `qVoice` (prio 7, realtime ~30%, source `10.0.30.0/24` via floating out rule), `qACK` (~10%), `qDefault` (~60%). **L2 UniFi WMM** maps DSCP EF -> WiFi Voice AC. **L3 UniFi switch QoS.** Blocker for L1 sizing: WAN2 coax upload number.
|
||||
- **Build path:** pfSense GUI -> Traffic Shaper -> Wizard "Multiple Lan/Wan". Howard drives. Rollback = disable/remove the shaper (zero residual; reverts to FIFO).
|
||||
|
||||
### pfSense Operations
|
||||
|
||||
- **pfSense 25.07 logs are PLAIN TEXT, not binary clog.** Read with `tail`/`grep` directly (e.g., `tail -5000 /var/log/dhcpd.log`). Using `clog` returns empty output and will cause false conclusions. All log files confirmed ASCII text (`file /var/log/*.log`).
|
||||
- **pfSense OpenVPN `--inactive` idle timeout:** The Cascades OpenVPN server (`ovpns1`) has a configured `--inactive` timeout (~300s). This disconnects idle clients after ~5 min of no tunnel data. Keepalive pings do NOT reset this counter (`--inactive` measures actual tunnel data, not keepalive packets). Symptom: OpenVPN Connect auto-reconnects repeatedly; the pfSense log shows `Inactivity timeout (--inactive), exiting`. This is a config setting, not a fault. Duplicate-CN events (which would indicate a different issue) are absent. Fix: raise or disable the `--inactive` parameter on the OpenVPN server profile. Fix proposed 2026-06-18; not yet applied (requires go-ahead).
|
||||
- **pfSense dirty-boot / duplicate dhcpd:** After an unclean pfSense shutdown (e.g., power loss on surge-only UPS), ZFS survives but dhcpd may start twice. Symptom: DHCP DISCOVER->OFFER loop with no REQUEST/ACK completion (clients' OFFERs are handled by the wrong daemon instance). Fix: `killall dhcpd && echo "services_dhcpd_configure();" | /usr/local/sbin/pfSsh.php`; verify one instance: `pgrep -f "dhcpd -user" | wc -l` == 1. Note: `pfSsh.php` is slow (~20-40s); use timeout 60s+.
|
||||
- **Post-outage device stragglers:** Devices that booted during a DHCP-down window cache a disconnected state and do not retry once the network recovers. A DHCP-log scan cannot find them (they stop sending DISCOVER). Realistic plan: reactive power-cycle as reports come in. Cox modem must be rebooted after a pfSense configuration restore (otherwise WAN may not fully re-establish).
|
||||
- **pfSense 25.07 logs are PLAIN TEXT, not binary clog.** Read with `tail`/`grep` directly. Using `clog` returns empty output and will cause false conclusions.
|
||||
- **pfSense OpenVPN `--inactive` idle timeout:** The Cascades OpenVPN server has a configured `--inactive` timeout (~300s). This disconnects idle clients after ~5 min of no tunnel data. Keepalive pings do NOT reset this counter. Fix: raise or disable the `--inactive` parameter on the server profile. Fix proposed 2026-06-18; not yet applied.
|
||||
- **pfSense dirty-boot / duplicate dhcpd:** After an unclean pfSense shutdown, dhcpd may start twice. Fix: `killall dhcpd && echo "services_dhcpd_configure();" | /usr/local/sbin/pfSsh.php`; verify one instance: `pgrep -f "dhcpd -user" | wc -l` == 1. Note: `pfSsh.php` is slow (~20-40s); use timeout 60s+.
|
||||
- **Post-outage device stragglers:** Devices that booted during a DHCP-down window cache a disconnected state and do not retry once the network recovers. Realistic plan: reactive power-cycle as reports come in. Cox modem must be rebooted after a pfSense configuration restore.
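
Because the logs are plain text, a DHCP health tally like the ACK / NAK / no-free-leases counts quoted on this page can be produced with ordinary string matching. The sketch below is a minimal illustration with a hypothetical function name; the sample lines are invented in the usual ISC dhcpd message shape and are not taken from the Cascades firewall.

```python
from collections import Counter


def tally_dhcp(lines):
    """Count DHCPACK, DHCPNAK and 'no free leases' lines in dhcpd log text."""
    counts = Counter({"ack": 0, "nak": 0, "no_free_leases": 0})
    for line in lines:
        if "DHCPACK" in line:
            counts["ack"] += 1
        elif "DHCPNAK" in line:
            counts["nak"] += 1
        elif "no free leases" in line:
            counts["no_free_leases"] += 1
    return counts


# Invented sample lines (addresses, MACs and interface names are made up).
sample = [
    "dhcpd[123]: DHCPDISCOVER from aa:bb:cc:00:00:01 via igb1",
    "dhcpd[123]: DHCPOFFER on 192.0.2.10 to aa:bb:cc:00:00:01 via igb1",
    "dhcpd[123]: DHCPREQUEST for 192.0.2.10 from aa:bb:cc:00:00:01 via igb1",
    "dhcpd[123]: DHCPACK on 192.0.2.10 to aa:bb:cc:00:00:01 via igb1",
    "dhcpd[123]: DHCPNAK on 192.0.2.99 to aa:bb:cc:00:00:02 via igb1",
    "dhcpd[123]: DHCPACK on 192.0.2.11 to aa:bb:cc:00:00:03 via igb1",
]

counts = tally_dhcp(sample)
print(dict(counts))
```

On the firewall itself the input would be the tail of `/var/log/dhcpd.log`, read as text rather than through `clog`.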
|
||||
|
||||
### Known Issues / Pending Hygiene (as of 2026-06-18)
|
||||
### Known Issues / Pending Hygiene (as of 2026-06-20)
|
||||
|
||||
- **[BUG] Stale exclude-group on MFA-all-users policy:** The `Require multifactor authentication for all users` policy (`7e87a1c7...`) excludes `SG-Caregivers-Pilot` (`0674f0bc...`) instead of the live `SG-Caregivers` (`8b8d9222...`). Fix: PATCH `excludeGroups` to replace `SG-Caregivers-Pilot` with `SG-Caregivers`.
|
||||
- **[DESIGN] ALIS-native 2FA is not a perimeter control.** The correct permanent model: force all ALIS logins through Entra SSO (SSO-only, credential fallback disabled). Office/privileged users should be standardized onto ALIS SSO as a separate workstream; ALIS-native 2FA should then be disabled per-user then globally.
|
||||
- **[INFO] Android enrollment token expiry (2027-05-08) does NOT unenroll devices.** Renewal is needed only before enrolling new devices after that date.
|
||||
- **[WARN] ~25 switch ports at 100 Mbps but gig-capable.** Physical: re-terminate/replace cable or check NIC. Investigate after WiFi Phase A remediation is stable.
|
||||
- **[WARN] 3 offline switches** (Switch 2nd Floor #2 -- reset+re-adopted 2026-06-17 after power outage but may still show in some monitors, Switch 4th Floor #2, USW Pro Max 16). Root cause unknown for Switch 4th Floor #2 and USW Pro Max 16; investigate onsite.
|
||||
- **[SECURITY] Synology Cloud Signin Portal credential exposed in vault git history (commit 1fbc0e1).** `clients/cascades-tucson/synology-signin-portal.sops.yaml` was committed plaintext; encrypted go-forward but credential must be rotated. Verify MDM service account + WiFi CSCNet entries from the same commit were never plaintext.
|
||||
- **[FLEET] Leftover Datto stack (CentraStage + Infocyte/DattoAV) -- not yet cleaned up.** Confirmed on CS-SERVER (thrashing degraded disk) and DESKTOP-TRCIEJA (Lupe Sanchez, causing dual-AV slow Excel). DESKTOP-TRCIEJA will be replaced (no cleanup needed on that box). CS-SERVER cleanup still open.
|
||||
- **[WARN] DESKTOP-TRCIEJA duplicate computer name on network (Event 2505).** Recurring event on Lupe Sanchez's machine. Moot if machine is replaced; note for the replacement provisioning.
|
||||
- **[BUG] Stale exclude-group on MFA-all-users policy:** The `Require multifactor authentication for all users` policy (`7e87a1c7...`) excludes `SG-Caregivers-Pilot` (`0674f0bc...`) instead of the live `SG-Caregivers` (`8b8d9222...`). Fix: PATCH `excludeGroups`.
|
||||
- **[DESIGN] ALIS-native 2FA is not a perimeter control.** Force all ALIS logins through Entra SSO (SSO-only, credential fallback disabled); disable ALIS-native 2FA per-user then globally.
|
||||
- **[INFO] Android enrollment token expiry (2027-05-08) does NOT unenroll devices.** Renewal needed only before enrolling new devices after that date.
|
||||
- **[WARN] ~25 switch ports at 100 Mbps but gig-capable.** Investigate after WiFi optimization is stable.
|
||||
- **[WARN] 3 offline switches** (Switch 4th Floor #2, USW Pro Max 16 -- root cause unknown; Switch 2nd Floor #2 was reset+re-adopted 2026-06-17). Investigate onsite.
|
||||
- **[SECURITY] Synology Cloud Signin Portal credential exposed in vault git history (commit 1fbc0e1).** Encrypted go-forward but credential must be rotated.
|
||||
- **[FLEET] Leftover Datto stack (CentraStage + Infocyte/DattoAV) -- not yet cleaned up on CS-SERVER.** DESKTOP-TRCIEJA will be replaced (no cleanup needed on that box). CS-SERVER cleanup still open.
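
For the stale MFA exclude-group fix listed above, the PATCH body can be built by swapping one group ID in the policy's current `excludeGroups` list. The sketch below only constructs the body. The full object IDs are elided on this page and are left as placeholders, the "other excluded group" entry is hypothetical, and whether the endpoint accepts a partial `conditions.users` object should be confirmed against the Graph documentation before sending anything.

```python
def swap_excluded_group(exclude_groups, stale_id, live_id):
    """Return a new excludeGroups list with stale_id replaced by live_id.

    Order is preserved and duplicates are dropped, so running it twice
    gives the same result.
    """
    swapped = [live_id if gid == stale_id else gid for gid in exclude_groups]
    deduped = []
    for gid in swapped:
        if gid not in deduped:
            deduped.append(gid)
    return deduped


def build_patch_body(exclude_groups):
    """Shape the body for a conditional access policy PATCH."""
    return {"conditions": {"users": {"excludeGroups": exclude_groups}}}


# Placeholders only: the real object IDs are truncated in this wiki.
STALE = "<SG-Caregivers-Pilot object id>"
LIVE = "<SG-Caregivers object id>"
OTHER = "<other excluded group id>"  # hypothetical extra entry

current = [STALE, OTHER]
body = build_patch_body(swap_excluded_group(current, STALE, LIVE))
print(body)
```

Reading the policy's current list first and swapping in place keeps any other exclusions intact instead of overwriting them.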
|
||||
|
||||
### Security Incidents (historical)
|
||||
|
||||
- **Megan Hiatt (2026-04-16):** Active credential-stuffing -- 126 failed sign-ins, bursts from Belfast GB, Hamburg DE. Password reset and SMTP AUTH disable were action items. Mailbox was clean (not breached).
|
||||
- **John Trozzi (2026-04-16, 2026-04-20):** Investigated twice -- both times NO BREACH. First: credential stuffing flag (clean). Second: inbound phishing email (clean). Reports in `clients/cascades-tucson/reports/`.
|
||||
- **John Trozzi (2026-04-16, 2026-04-20):** Investigated twice -- both times NO BREACH. First: credential stuffing flag (clean). Second: inbound phishing email (clean).
|
||||
- **Crystal Rodriguez (2026-04-19):** Phishing investigation. Report: `clients/cascades-tucson/reports/2026-04-19-crystal-rodriguez-phish-investigation.md`.
|
||||
- **Canva email delivery (2026-05-20):** Alma Montt not receiving Canva invites. Resolved by adding canva.com domains to AllowedSenderDomains in EOP policies.
|
||||
- **ALIS AADSTS65001 (2026-06-03):** megan.hiatt, karen.rossini, memcarereceptionist could not sign in to ALIS on non-phone devices. Root cause: missing tenant-wide admin consent on ALIS SP (`e1cae4ad`). Resolved by granting `AllPrincipals` `User.Read` via Graph API.
|
||||
@@ -467,40 +427,38 @@ Full plan: `docs/network/network-optimization-master-plan.md`. Goal: fix the *sy
|
||||
### HIPAA Compliance
|
||||
|
||||
- **Primary objective.** Cascades stores PHI on CS-SERVER and uses ALIS for clinical records.
|
||||
- **Critical open gaps:** No audit logging on D:\Homes (§164.312(b)); Object Access auditing disabled; no SMB encryption on homes share; no file access auditing. Audit retention infra (LAW 90d + Storage 6yr) approved but not yet built.
|
||||
- **Backup gap closed (2026-06-15):** Mike installed ACG cloud backup (MSP360/CloudBerry -> ACG-backup server) on CS-SERVER. Verify first full backup completes and set retention; confirm image-based / bare-metal + system-state for DC recoverability.
|
||||
- **Critical open gaps:** No audit logging on D:\Homes (§164.312(b)); Object Access auditing disabled; no SMB encryption on homes share. Audit retention infra (LAW 90d + Storage 6yr) approved but not yet built.
|
||||
- **Backup gap closed (2026-06-15):** Mike installed ACG cloud backup (MSP360/CloudBerry) on CS-SERVER. Verify first full completes + confirm image-based / bare-metal + system-state + retention before any drive work.
|
||||
- **Restored 7 deleted mailboxes (2026-04-25)** for HIPAA §164.316(b)(2) 7-year retention.
|
||||
- **Termination policy established:** Convert to shared mailbox, hide from GAL, retain 7 years.
|
||||
- **Voice VLAN 30 (HIPAA-isolated):** All voice gear (phones + Vertical desktop) on an isolated network with internet/cloud-PBX egress only; blocked from PHI/LAN/VLAN20/mgmt. **Migration COMPLETE 2026-06-19: 37 devices on VOICE (28 Poly + 8 AudioCodes + desktop).**
|
||||
- **Voice VLAN 30 (HIPAA-isolated):** All voice gear on an isolated network with internet/cloud-PBX egress only; blocked from PHI/LAN/VLAN20/mgmt. **Migration COMPLETE 2026-06-19: 37 devices on VOICE (28 Poly + 8 AudioCodes + desktop).**
|
||||
|
||||
---
|
||||
|
||||
## Active Work
|
||||
|
||||
Syncro live pull 2026-06-18: **0 open tickets.** No hours drawn from the 2026-06-17/18 sessions (all advisory/diagnostic/no billable infra changes).
|
||||
Syncro live pull 2026-06-20: **0 open tickets.**
|
||||
|
||||
**Non-Syncro follow-ups open as of 2026-06-18:**
|
||||
**Non-Syncro follow-ups open as of 2026-06-20:**
|
||||
|
||||
- **[URGENT] Order replacement workstation for Lupe Sanchez (DESKTOP-TRCIEJA).** Decision made 2026-06-18. EOL Gateway ZX6971 / i3-2120 / 8 GB / Win11-unsupported. On new machine: provision GuruRMM + Bitdefender only; do NOT carry over the Datto stack.
|
||||
- **[URGENT] Rotate exposed Synology Cloud Signin Portal credential.** Vault commit 1fbc0e1 committed it plaintext; encrypted go-forward but credential is exposed in git history. Also verify MDM service account + WiFi CSCNet from that same commit were never plaintext.
|
||||
- **[DONE 2026-06-19] Voice VLAN (VLAN 30) migration COMPLETE -- 37 devices on VOICE** (28 Poly, 8 AudioCodes `.224-.231`, Vertical desktop `.201`). All Poly re-keyed by Howard. RF optimized too (2.4 power->medium, 5 GHz on clean DFS, 5G retry halved). Billed: ticket #32444 (7h prepaid -- 4 onsite + 3 remote).
|
||||
- **[PENDING - hardware] Bistro phone replacement.** The Kitchen server phone was bad (John pulled it 2026-06-19); the Bistro phone was relocated to the Kitchen to cover it, so the **Bistro has no phone**. Set up + re-key the replacement to the voice PPSK when it arrives.
|
||||
- **[WAITING ON VERTICAL - the last voice item] Set Poly handsets to 5 GHz-only.** The residual dropped calls are a band-selection problem: phones sit on saturated 2.4 GHz despite a strong 5 GHz signal being available, and controller band-steering (already on) won't hold the Poly fleet on 5 GHz. Phone-side 5 GHz lock is the fix -- request sent to Richard Turner 2026-06-19 (`docs/network/2026-06-19-vertical-5ghz-lock-request.md`), **awaiting their response**. After they push it: re-pull per-phone data + confirm all on 5 GHz. (Lauren `.202`, the worst original case, already went from 2.4 GHz/50% retry to 5 GHz/12% retry from the RF work.)
|
||||
- **[INVESTIGATE] Phone `.210`** -- on 5 GHz at -65 dBm (good signal) but ~64% retry on a clean channel; anomalous (AP-217 or per-phone), separate from the band-selection issue.
|
||||
- **[PENDING - build] Voice QoS for VLAN 30** (pfSense HFSC 3-queue on both WANs matching `10.0.30.0/24` + UniFi WMM/switch QoS). Design done, not built (Howard drives pfSense GUI). Blocker for sizing: the WAN2 coax upload number. QoS is insurance (WAN has headroom); RF is the everyday fix. Design: `docs/network/phase1-voice-qos-design.md`.
|
||||
- **[PENDING - execute] Network optimization master plan (floors 1-4; MemCare deferred).** Sequenced P1 QoS -> P2a enable 6 GHz on CSCNet + P2b 2.4 Low->Medium -> P3 5 GHz 40 MHz + non-DFS + relieve AP 103 -> P4 fine-tune -> P5 physical. Open relief valves before constraining; per-zone, dry-run, gated on fleet metrics. Start = P2b (baseline capture + 2.4 Low->Medium). Pending Howard's go + evening window. Plan: `docs/network/network-optimization-master-plan.md`. (Supersedes the older "Wireless RF Phase 0 + Phase 1" item below -- same work, holistic framing.)
|
||||
- **[DONE 2026-06-19] Voice VLAN (VLAN 30) migration COMPLETE -- 37 devices on VOICE** (28 Poly, 8 AudioCodes `.224-.231`, Vertical desktop `.201`). All Poly re-keyed by Howard. RF optimized (2.4 power->medium, 5 GHz clean DFS, retry halved). Billed: ticket #32444 (7h prepaid -- 4 onsite + 3 remote).
|
||||
- **[PENDING - hardware] Bistro phone replacement.** Kitchen server phone was bad (John pulled it 2026-06-19); the Bistro phone was relocated to the Kitchen to cover it, so the **Bistro has no phone**. Set up + re-key the replacement to the voice PPSK when it arrives.
|
||||
- **[WAITING ON VERTICAL - the last voice item] Set Poly handsets to 5 GHz-only.** Residual dropped-calls are a band-selection problem: phones sit on saturated 2.4 GHz despite strong 5 GHz signal, and controller band-steering (already on) won't hold the Poly fleet on 5 GHz. Phone-side 5 GHz lock is the fix -- request sent to Richard Turner 2026-06-19 (`docs/network/2026-06-19-vertical-5ghz-lock-request.md`), **awaiting their response**. After they push it: re-pull per-phone data + confirm all on 5 GHz.
|
||||
- **[INVESTIGATE] Phone `.210`** -- on 5 GHz at -65 dBm (good signal) but ~64% retry on a clean channel; anomalous (AP-217 or per-phone issue).
|
||||
- **[PENDING - build] Voice QoS for VLAN 30** (pfSense HFSC 3-queue on both WANs matching `10.0.30.0/24` + UniFi WMM/switch QoS). Design done, not built (Howard drives pfSense GUI). Blocker for sizing: the WAN2 coax upload number. Design: `docs/network/phase1-voice-qos-design.md`.
|
||||
- **[PENDING - deferred] Enable 6 GHz on CSCNet.** Blocked on `Wpa3MandatoryFor6GHzBand` -- converting CSCNet from WPA2/PPSK to WPA3+PMF touches all 427 clients. Largest untapped RF relief valve. Howard's supervised decision + coordinated change window.
|
||||
- **[PENDING] Measure WAN2 (coax) upload** -- remote source-route test failed; get from a WAN2-routed host or the Cox bill (sizes the failover voice shaper).
|
||||
- **[PENDING] Hand Vertical (Richard Turner) the phone-side config list** -- 5 GHz band lock, DSCP-on, 802.11k/v roaming, U-APSD/power-save, firmware.
|
||||
- **[PLANNED] Network logging / observability (spec written, build later).** Diagnosis 2026-06-18: the UniFi controller retains **ZERO** client events/alarms for Cascades (7-day pull) and pfSense logs roll over in hours -> device drops/kicks/deauths are not captured, so the network is a black box after the fact. Plan: **Synology cascadesDS (DSM Log Center syslog server) as the on-site collector** (NOT CS-SERVER -- fragile EOL DC), with pfSense + UniFi-controller + AP syslog as sources and a 1-2 min `/stat/sta` client snapshotter to fill the controller's history gap. Optional later: Container Manager Graylog/Loki + Discord alerting. Spec: `docs/network/network-logging-plan.md`. Next: confirm Synology model/RAM/DSM.
|
||||
- **[PENDING] Wireless RF Phase 0 + Phase 1 (pending go-ahead + evening window):**
|
||||
- Phase 0 (safe anytime): pfSense ping-check off for 240 DHCP pools, disable 3 AM AP firmware auto-upgrade, enable full pfSense logging (DHCP/DNS/firewall/system/gateway) with rotation.
|
||||
- Phase 1 (windowed, per-zone, evening): combined per-AP radio_table PUT -- ng power medium (42 at-Low radios only, not the 24 disabled), na ht 40 (76 radios), na min_rssi -82 (69 at -77). Dry-run clean. Rollback auto-saved. Validate with watch-ap before/after.
|
||||
- 5 GHz Option B (same window or separate): 40 MHz + non-DFS channel plan + min-RSSI -82; width + channel are coupled (width alone fixed only 7/25 co-channel pairs). Auto-channel assignment made things worse -- do NOT use UniFi auto-channel; use `channel-plan na`.
|
||||
- Standing rule: no Cascades prod-infra changes without discussing + explicit per-change go (memory `feedback_cascades.md` rule #4).
|
||||
- **[PENDING] pfSense OpenVPN `--inactive` timeout fix.** Raise/disable the `--inactive` idle timeout on the Cascades OpenVPN server profile (~300s -> raise or remove). Proposed, not applied. Needs go-ahead.
|
||||
- **[PENDING] Enable Netgate AutoConfigBackup** on pfSense (no off-box config backup existed before 2026-06-17 manual vault; on-box auto-backup exists but only one version). Also verify UPS covers core infra + PoE switches on battery-backed outlets (pfSense rectified; other gear not confirmed).
|
||||
- **[PENDING] Synology Drive Team Folder migration (department shares -> CS-SERVER).** Diagnosis complete (2026-06-18): current Drive sync covers only the Sync-user's My Drive, not the real shared folders. Plan: Admin Console Team Folder enable + low versioning; Drive Client Download-only tasks into `D:\Shares\_SynMigration\<share>`; pilot on `/volume1/Server` (1.9 G, 2,486 files) first. Pending: confirm in-scope share list, confirm ALDocs coverage in real shares, get go-ahead to execute. Runbook (optional): `clients/cascades-tucson/docs/migration/synology-team-folder-migration.md`.
|
||||
- **[PENDING] Watch for post-outage device stragglers.** Devices that booted during the 2026-06-17 DHCP-down window (duplicate dhcpd) may have cached a disconnected state. Kitchen thermal printer resolved by power-cycle. Expect additional IoT/printer/POS reports; fix each by power-cycle.
|
||||
- **[PENDING] Re-enable 3 AM AP auto-upgrade** (left OFF after 2026-06-19 overnight run; re-enable when ready).
|
||||
- **[PENDING] Stand up recurring `dfs-check.sh` radar monitor** on the DFS channels (fold into network-logging plan) -- UniFi auto-vacates one AP on radar hit; the monitor tells us if it ever fires.
|
||||
- **[PENDING - next week] MemCare min-RSSI (floors 5/6)** -- deferred until Howard adds new APs to floors 5/6; rooms 515/210/204 have weak clients that would be orphaned by min-RSSI today.
|
||||
- **[PLANNED] Network logging / observability (spec written, build later).** Plan: **Synology cascadesDS (DSM Log Center syslog server)** as on-site collector, pfSense + UniFi-controller + AP syslog as sources, `/stat/sta` client snapshotter to fill the controller's history gap. Spec: `docs/network/network-logging-plan.md`. Confirm Synology model/RAM/DSM before build.
|
||||
- **[PENDING] Synology Drive Team Folder migration (department shares -> CS-SERVER).** Current Drive sync covers only the Sync-user's My Drive, not the real shared folders. Pilot on `/volume1/Server` (1.9 G) first. Pending: confirm in-scope share list, get go-ahead to execute.
|
||||
- **[PENDING] Watch for post-outage device stragglers.** Devices that booted during the 2026-06-17 DHCP-down window may have cached a disconnected state. Kitchen thermal printer resolved by power-cycle. Expect additional IoT/printer/POS reports; fix each by power-cycle.
|
||||
- **[PENDING] pfSense OpenVPN `--inactive` timeout fix.** Raise/disable the `--inactive` idle timeout (~300s) on the Cascades OpenVPN server profile. Proposed, not applied.
|
||||
- **[PENDING] Enable Netgate AutoConfigBackup** on pfSense (no off-box config backup existed before 2026-06-17 manual vault). Also verify UPS covers all core infra + PoE switches on battery-backed outlets (pfSense rectified; others not confirmed).
|
||||
- **[PLANNED] KPI dashboard (Ashley Jensen):** scoped 2026-06-17; client one-pager drafted. Parked pending Ashley's day-one KPIs, data-freshness need, and POS/Focus-HR specifics. Next: deliver one-pager; confirm ALIS analytics availability with Medtelligent.
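
The Phase 1 item in the list above describes a combined per-AP `radio_table` change that is dry-run before anything is applied. A minimal sketch of that dry-run step follows, computing the proposed change set offline with no controller calls. The field names (`radio`, `tx_power_mode`, `ht`, `min_rssi`) are assumptions about the controller's device JSON, and `plan_radio_table` is a hypothetical helper, not part of the `unifi-wifi` skill.

```python
from copy import deepcopy

# Targets taken from the Phase 1 item; the field names are assumptions.
NG_POWER_FROM, NG_POWER_TO = "low", "medium"
NA_WIDTH_MHZ = 40
NA_MIN_RSSI = -82


def plan_radio_table(radio_table):
    """Return (new_table, changes) without touching the input or the network."""
    new_table = deepcopy(radio_table)
    changes = []
    for radio in new_table:
        band = radio.get("radio")
        # 2.4 GHz: only radios currently at Low move to Medium; anything else
        # (including disabled radios, however they are represented) is left alone.
        if band == "ng" and radio.get("tx_power_mode") == NG_POWER_FROM:
            radio["tx_power_mode"] = NG_POWER_TO
            changes.append(("ng", "tx_power_mode", NG_POWER_FROM, NG_POWER_TO))
        elif band == "na":
            # 5 GHz: channel width and min-RSSI are set together.
            if radio.get("ht") != NA_WIDTH_MHZ:
                changes.append(("na", "ht", radio.get("ht"), NA_WIDTH_MHZ))
                radio["ht"] = NA_WIDTH_MHZ
            if radio.get("min_rssi") != NA_MIN_RSSI:
                changes.append(("na", "min_rssi", radio.get("min_rssi"), NA_MIN_RSSI))
                radio["min_rssi"] = NA_MIN_RSSI
    return new_table, changes


# Hypothetical AP with one 2.4 GHz radio at Low and one 5 GHz radio at 80 MHz.
sample = [
    {"radio": "ng", "tx_power_mode": "low"},
    {"radio": "na", "ht": 80, "min_rssi": -77},
]
new_table, changes = plan_radio_table(sample)
for change in changes:
    print(change)
```

Keeping the planner pure (input in, proposed table and change list out) lets the change list be reviewed and saved as the rollback reference before any PUT is sent.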
|
||||
|
||||
**Migration phase status (as of 2026-05-26):**
|
||||
|
||||
@@ -524,17 +482,15 @@ Syncro live pull 2026-06-18: **0 open tickets.** No hours drawn from the 2026-06
|
||||
- RECEPTIONIST-PC GuruRMM agent (9c91d324): flaky WebSocket, lagging fleet
|
||||
- Entra Connect: OU=Administrative not yet in sync scope; UPN suffix updates for that OU pending
|
||||
- NURSESTATION-PC: reboot required to activate `CSC - Caregiver Device Lockdown` GPO (deployed 2026-06-05; verify lock@3min, 90s warning, sign-out@15min, never-sleep)
|
||||
- #32370 -- eFax/scanner onsite (Howard); verify/likely closed (Syncro live 2026-06-18 shows 0 open)
|
||||
- Caregiver device allow-list: ASSISTNURSE-PC needs re-join + re-tag after Win11 reinstall; LAPTOP-8P7HDSEI Win11 upgrade + join/tag still pending; then cutover (enable allow-list policy, disable compliance-block)
|
||||
- ALIS office/privileged standardization: move office/managers/nurses to ALIS SSO-only; disable ALIS-native 2FA per-user then globally
|
||||
- Fix stale `SG-Caregivers-Pilot` exclude-group on `Require MFA for all users` policy
|
||||
- LAPTOP-8P7HDSEI: upgrade Win 10 -> Win 11 before PHI use
|
||||
- Edge UNC download bug (Chromium 149): decide fix path for Ashley Jensen + Lois Lane and fleet; no fix applied as of 2026-06-08
|
||||
- ALIS app session timeout: lower from 20 to 15 min (Howard, ALIS admin) -- PENDING
|
||||
- **[CRITICAL] CS-SERVER degraded RAID-1 (2026-06-15):** OS mirror (C:) running on a single 320 GB Hitachi 5400 RPM laptop spindle, no redundancy. Recommended replacement: 2x 480 GB enterprise 2.5" SATA SSD (e.g. Solidigm D3-S4520 or Samsung PM893; no Dell drive lockout; min size 480 GB class; SAS 6/iR controller, 3 Gbps, no TRIM). Hot-swap capable. Plan rebuild-then-swap (image C: first, AFTER backup verifies; re-pull OMSA live before any physical action). DC migration is the real fix.
|
||||
- **[CRITICAL] CS-SERVER degraded RAID-1 (2026-06-15):** OS mirror (C:) running on single 320 GB Hitachi 5400 RPM laptop spindle. Recommended replacement: 2x 480 GB enterprise 2.5" SATA SSD (e.g. Solidigm D3-S4520 or Samsung PM893). Gated on backup verification.
|
||||
- **[INFO] CS-SERVER cloud backup (MSP360/CloudBerry, installed 2026-06-15):** verify first full completes + confirm image-based / bare-metal + system-state + retention before any drive work.
|
||||
- **[CLEANUP] CS-SERVER agent sprawl:** remove the previous MSP's leftover Datto RMM (CentraStage) + Datto EDR (Infocyte) stack (thrashing the degraded disk).
|
||||
- **[PROPOSED] Unified KPI dashboard (Ashley Jensen):** scoped 2026-06-17; client one-pager drafted. Parked pending Ashley's day-one KPIs, data-freshness need, and POS/Focus-HR specifics. See Business Applications & Reporting Systems section. Next: deliver one-pager; confirm ALIS analytics/data-feed availability with Medtelligent.
|
||||
- **[CLEANUP] CS-SERVER agent sprawl:** remove the previous MSP's leftover Datto RMM (CentraStage) + Datto EDR (Infocyte) stack.
|
||||
|
||||
---
|
||||
|
||||
@@ -554,92 +510,48 @@ Syncro live pull 2026-06-18: **0 open tickets.** No hours drawn from the 2026-06
|
||||
| 2026-04-30 | CA rollout (Report-only mode): 3 caregiver policies created. SDM bootstrap. |
|
||||
| 2026-05-01 | Howard billed 33.5 hrs against prepaid block on Entra project ticket #32214 ($0 invoice). |
|
||||
| 2026-05-07-08 | SDM phone provisioning. SDM token success. ALIS SSO app registration values captured to vault. |
|
||||
| 2026-05-14-16 | Caregiver AD accounts created. Security groups always deliberate (no OU->group automation). Wireless diagnostic (read-only via cloud API; 2.4 GHz saturation hypothesis identified; local controller inaccessible at the time). |
|
||||
| 2026-05-14-16 | Caregiver AD accounts created. Entra Connect exited staging -- actively syncing. Wireless diagnostic (read-only via cloud API; 2.4 GHz saturation hypothesis identified). |
|
||||
| 2026-05-18 | Billing review. 39.5 hrs remaining before session. 7 hrs billed separately. |
|
||||
| 2026-05-20 | Canva email delivery resolved (canva.com domains added to EOP). |
|
||||
| 2026-05-21 | Crystal Rodriguez folder redirect confirmed working. Lauren Hasselman + Crystal Rodriguez domain join attempted -- passwords didn't work initially. |
|
||||
| 2026-05-22 | Ashley Jensen domain-joined. RECEPTIONIST-PC domain-joined. GPO ILT fixes (FrontDesk printer + R: drive). cascadesDS auth failure diagnosed (workgroup collision) and deferred. |
|
||||
| 2026-05-14 | Entra Connect exited staging mode -- actively syncing. CA pilot re-pointed to SG-Caregivers. |
|
||||
| 2026-05-21 | Crystal Rodriguez folder redirect confirmed working. |
|
||||
| 2026-05-22 | Ashley Jensen domain-joined. RECEPTIONIST-PC domain-joined. GPO ILT fixes. cascadesDS auth failure diagnosed (workgroup collision) and deferred. |
|
||||
| 2026-05-23 | Lauren Hasselman folder redirect complete. Megan Hiatt (Marketing) confirmed in AD, domain join pending. |
|
||||
| 2026-05-26 | Access control vendor meeting onsite (ticket #32324). 0.5h Howard + 0.5h Mike billed against prepaid block. Block at 28.0h. |
|
||||
| 2026-06-03 | ALIS AADSTS65001 diagnosed and resolved: granted tenant-wide admin consent on ALIS SP `e1cae4ad`. Caregiver device allow-list CA policy created in report-only (`CSC - Caregivers: allow-listed devices only (REPORT-ONLY)`, id `1b7fd025`). |
|
||||
| 2026-06-04 | Three same-day tickets: #32381 Tamra scanner (0.5h onsite), #32382 Megan file access (1.5h onsite), #32383 Chris Knight bill.com/BOK email delivery (1.5h remote). Root cause sender-side. EXO access token auth method documented. |
|
||||
| 2026-06-05 | NURSESTATION-PC localadmin login-screen issue (`SpecialAccounts\UserList` hide) -- removed via RMM. Vault hygiene: `sysadmin@` GA password vaulted; voice MFA scoped group created; `alternateMobile` updated to +1 520-585-1310 (Howard). Caregiver test rig built. Hybrid Entra Join enabled; NURSESTATION re-domain-joined + hybrid-registered (new deviceId `d3bf931f`). Caregiver access model proven end-to-end: pilot.test + NURSESTATION, ALIS via silent SSO. GPOs deployed: `CSC - Caregiver Workstation` validated; `CSC - Caregiver Device Lockdown` deployed to `OU=Caregiver Devices`. Ticket #32303 billed 7.0h, invoice #67782 ($0.00 prepaid). |
|
||||
| 2026-06-08 | **Chris Knight workstation setup (onsite).** AD account finished (OU=Administrative, home folder, SG-FolderRedirect, mail set). Machine DESKTOP-N5G1ROO domain-joined + GuruRMM-enrolled (`205025ee`), Office installed. **MAJOR: root-caused why folder redirection failed on every machine** -- FR GPO targets were in misnamed `fdeploy1.ini`; Windows reads `fdeploy.ini` (absent) -> empty path -> silent no-op. Fixed by writing correct `fdeploy.ini` to GPO `{512B43A4}` + version bump 917506->983042. Native FR now works for new users. **ASSISTNURSE-PC reinstalled (Win10->Win11).** |
|
||||
| 2026-06-08 | **Edge UNC download bug diagnosed (no fix applied).** Ashley Jensen + Lois Lane on Edge 149.0.4022.52 cannot open Office files from Edge download panel when Downloads is UNC-redirected. Root cause: Chromium 149 regression (issue 519243472) in `LaunchShellExecuteViaExplorer`. Fix path decision left to Howard. |
|
||||
| 2026-06-09 | **Accounting scan-to-folder built + billing reconciliation.** Created `D:\Shares\Accounting` + `\Scans` on CS-SERVER; shared as `\\CS-SERVER\AcctDept`; new vaulted AD service account `svc-scan`; Brother MFC-L8900CDW Scan-to-Network profile configured (NTLMv2; test scan confirmed). Found pfSense blocks main-LAN->VLAN-20. Persistent drive maps set for Chris (Y:), Zachary (Y:), Lauren (X:). Reconciled crashed-session billing; live prepay confirmed 57.75h. |
|
||||
| 2026-06-10 | **Meredith Kuhn locked Word doc -- stale owner files on cascadesDS.** Five orphaned `~$` files dated 2024 in `\\cascadesds\Public\Company Web Docs\Staff Trainings\` caused false lock messages. Diagnosed and deleted via RMM in Meredith's `user_session` on ASSISTMAN-PC. Ticket #32403, 0.5h remote, block 56.75->56.25. |
|
||||
| 2026-06-12 | **Created shared mailboxes grievances@ + Surveys@ and delegated to Meredith & Ashley.** Both SharedMailbox type (cloud-only, no license). FullAccess + SendAs granted. Work via ComputerGuru Exchange Operator cert auth (EXO module v3.10.0 installed on Howard-Home). All 8 permission grants verified. Ticket #32417, 0.5h remote, block 56.25->55.75; Invoiced. |
|
||||
| 2026-06-15 | **Wireless RF full audit -- controller access gained.** Mike vaulted `infrastructure/uos-server-ssh-key` + `clients/cascades-tucson/unifi-ap-ssh` + `infrastructure/uos-server-network-api-rw`. `unifi-wifi` skill used end-to-end. Live audit confirmed 77 U7-Pro APs, ~574->587 clients, 2.4 GHz saturation as primary pain band (avg retry ~10-11%, cu_total 69-94%, catastrophic neighbor density). `live-stats.sh` accuracy bugs found and fixed mid-session (15-AP head cap, wrong satisfaction/retry fields). DFS concern corrected: retry DFS 8.4% ~= non-DFS 9.0% -- no throughput penalty; mid-session misdiagnosis withdrawn. 6 GHz (1 client) identified as largest untapped capacity. Tuning plan staged; no live changes applied. |
|
||||
| 2026-06-15 | **CS-SERVER slowness root-caused to degraded RAID-1; backup started; pfSense OpenVPN password reset.** Dell OMSA: PD 0:0:3 (320 GB WD SATA) Critical/Removed, Virtual Disk2 (C: mirror) Degraded -> C: on a single 320 GB Hitachi 5400 RPM spindle (root cause of slowness). Mike installed MSP360/CloudBerry cloud backup on CS-SERVER (closes HIPAA backup gap). Reset Howard's lost pfSense OpenVPN password via Diagnostics PHP-exec from CS-SERVER (local_user_set_password() -> AUTHOK); vaulted at `clients/cascades-tucson/pfsense-openvpn-howard`. |
|
||||
| 2026-06-16 | **Voice VLAN plan for Vertical phones (PLANNED, not executed).** Diagnosed split voice gear: Poly phones (22, WiFi/CSCNet/VLAN 20), AudioCodes (8, wired USW-16-PoE/Default LAN), Vertical desktop (wired, static, no ACG login). CSCNet confirmed as shared PPSK SSID (not simple staff/VLAN-20). GuruRMM recon: desktop RDP-only (not a PBX); CS-QB SMB-only/no SIP; phones likely cloud PBX. Designed VLAN 30 VOICE (10.0.30.0/24, isolated, internet-only egress); wrote cutover runbook (`docs/network/voice-vlan-cutover.md`); vendor email sent. Awaiting Richard's confirm + window. |
|
||||
| 2026-06-16 | **pfSense confirmed as pfSense Plus 25.07-RELEASE; health verified; home-LAN shadow resolved.** Howard-Home renumbered from 192.168.0.0/24 to 10.137.42.0/24 (removed collision with Cascades 192.168.0.0/24). pfSense now reachable from Howard-Home over the site VPN. SSH health check: DHCP not exhausted, DNS up, WAN stable, states 28-31k/790k, load 0.6 -- gateway ruled out as WiFi factor. `pfsense-ssh.sh` backend built and validated live (SSH, no RESTAPI package needed). |
|
||||
| 2026-06-16 | **Floor-4 2.4 GHz power-down pilot applied (first production RF change).** 14/15 Floor-4 radios set to 6 dBm (from ~23); avg retry 13.2->9.5% (~28% fewer retransmits); clients retained, no coverage loss. AP 445 lagged (left alone, harmless). AP-hang recovery procedure learned: `device-control poe-cycle` (NOT force-provision -- took 445 offline; removed from the tool). `dfs-check.sh` confirmed ZERO real radar events fleet-wide (DFS empirically clean). `unifi-wifi` skill feature-complete (WiFi monitor/tune/apply + switch/gateway/pfSense-SSH + multi-client + channel-plan + cron health). |
|
||||
| 2026-06-17 | **KPI dashboard scoping for Ashley Jensen (advisory; no infra touched).** Reframed her Power BI Gateway question (gateway is on-prem-only, not a SaaS connector). Catalogued the 9 reporting systems (ALIS/QuickBooks/Bill.com/Relias/You've Got Leads/TELS/Focus HR/Helpany/POS). Recommended Phase 1 (exports->SharePoint->Power BI Pro) -> Phase 2 (Power Automate for Bill.com/QBO), leveraging existing M365 Business Premium. Wrote internal scoping note + client-facing one-pager (with cost line) under `docs/proposals/`. Parked pending Ashley's KPIs + freshness + POS/Focus-HR specifics. |
|
||||
| 2026-06-17 | **Voice VLAN 30 built + verified; Vertical desktop + initial Poly phones migrated.** Richard Turner confirmed window; VLAN 30 pfSense interface (igc1.30, 10.0.30.0/24) + isolation rules built (clone of GUEST VLAN, Protocol=Any; verified via `pfctl -sr`). UniFi VOICE network + CSCNet voice PPSK created (vaulted). Vertical desktop migrated (port-16 bounce via controller API with CSRF token; re-DHCP'd to 10.0.30.201). Key learnings: desktop is DHCP (not static), Vertical uses LogMeIn (not pfSense OpenVPN), re-VLAN wired port requires link bounce. |
|
||||
| 2026-06-17 | **Poly phone drops root-caused and closed; whole-network smoothing plan built (dry-run only).** Phone drops = intentional pfSense reboot on 2026-06-16 22:38 MST (transient, one-time, resolved). DHCP server healthy (1241 ACK/1 NAK/0 no-free-leases, read directly from plain-text log). Per-room /28 HIPAA segmentation confirmed intentional + healthy. Produced prioritized smoothing plan (Phase 0 + Phase 1 radio_table PUT). Nothing applied. Hard rule established: no Cascades prod-infra changes without discussing + explicit per-change go. |
|
||||
| 2026-06-17 | **CS-SERVER drive review (advisory).** Confirmed surviving drive: Hitachi HTS545032B9A300 (0:0:2, 320 GB SATA, 5400 RPM). Recommended replacement drives: 2x 480 GB enterprise SATA SSD (e.g. Solidigm D3-S4520 or Samsung PM893), hot-swap on R610 backplane. No server-side commands run; gated on backup verification. |
|
||||
| 2026-06-17 | **Power outage -- full site down + recovery.** All 77 APs + 12 switches disconnected. Root cause: pfSense on UPS surge-only side (no battery) -> unclean shutdown -> ZFS OK, but duplicate dhcpd + 2nd-floor switch one-way L2 forwarding. Howard: killed duplicate dhcpd + clean restart remotely. Mike: moved pfSense to battery outlets, restored config from on-box auto-backup (VLAN30 intact), reset+re-adopted Switch 2nd Floor #2 (USL24PB), rebooted Cox modem. Network fully restored. Separately: 5GHz auto-channel applied (co-channel 25->30, worse). pfSense config vaulted. Pre-existing plaintext Synology signin credential found in vault history (commit 1fbc0e1) -- encrypted go-forward; needs rotation. Incident report: `clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md`. |
|
||||
| 2026-06-18 | **Power outage follow-ups: OpenVPN flapping root-caused; kitchen printer casualty resolved.** OpenVPN disconnect/reconnect cycle = configured `--inactive` idle timeout (~300s) on the pfSense server, not a fault. Fix proposed (raise/disable); not applied. Kitchen thermal printer (iPad POS) would not print post-outage -- booted during DHCP-down window, cached disconnected state; fixed by power-cycle. DHCP straggler sweep: 13/13 active senders completing, 0 stuck. |
|
||||
| 2026-06-18 | **Synology Drive sync architecture diagnosed; Team Folder migration plan produced.** Current Drive sync is Sync-user My Drive only (not the real shared folders). Real NAS shares (Server 1.9 G, Management 5.5 G, Public ~50 G, SalesDept ~23 G) are not mirrored. Plan: Team Folder Download-only tasks into `D:\Shares\_SynMigration\<share>` staging; pilot on `/volume1/Server`. No changes made. |
|
||||
| 2026-06-18 | **DESKTOP-TRCIEJA (Lupe Sanchez) performance diagnosed; replace-not-remediate decision.** Root causes: (a) EOL hardware -- Gateway ZX6971 AIO, Intel i3-2120 (2011, 2C/4T), 8 GB RAM, Win11 unsupported; (b) dual real-time AV -- ACG Bitdefender (keep) + leftover Datto stack (Datto RMM/CentraStage + Datto EDR/Infocyte + bundled DattoAV) both scanning every file on a 2-core CPU under memory pressure. OneDrive ruled out (desktop is local). Howard decided: no remediation; order replacement. Another instance of the fleet-wide leftover-Datto-stack cleanup. |
|
||||
| 2026-06-18 | **Voice VLAN 30: all 22 Poly phones migrated; network-logging spec written.** Completed the Poly cutover live -- all 22 WiFi phones re-keyed to the voice PPSK onto `10.0.30.202-.223` (per-phone location inventory in `docs/network/voice-phone-inventory.md`); first phone (Lauren Hasselman) dial-tone + outbound call verified. Vertical desktop fixed via port-16 bounce (controller API + CSRF) -> `10.0.30.201`. AudioCodes (8, wired) still pending (flip + PoE power-cycle). Separately, found the UniFi controller retains **ZERO** client events for Cascades (drop/kick history not captured) -> wrote a network-logging spec (`docs/network/network-logging-plan.md`): Synology Log Center on-site collector, pfSense+UniFi syslog sources, client snapshotter. Plan only -- build later. |
|
||||
| 2026-06-18 | **Voice VLAN 30 cutover COMPLETE (8 AudioCodes added); voice-quality diagnosed; holistic all-device optimization master plan built.** AudioCodes finished -- they wouldn't re-DHCP via PoE/controller bounce (externally powered, PoE off); Howard physically power-cycled all 8 -> VOICE leases `.224-.231` (31 devices total on VLAN 30). Diagnosed the dropped-calls complaints: **the VLAN move does NOT fix call quality -- it's RF on the Poly WiFi phones** (wired AudioCodes clean). 14 Poly flagged; worst Lauren `.202` (2.4GHz/50% retry -> locked to AP 103) + Shelby `.218` (2.4GHz/53%, MemCare/deferred); coverage gaps rooms 515/210/204; found 6 unmigrated Poly stragglers (fleet is 28, not 22). Built `network-optimization-master-plan.md` (open-relief-valves-before-constraining sequence: QoS -> 6 GHz on CSCNet + 2.4 Low->Medium -> 5 GHz 40 MHz/non-DFS/relieve AP 103 -> fine-tune -> physical) with interdependency map + data-driven gate framework, floors 1-4 only. Designed Phase 1 voice QoS (`phase1-voice-qos-design.md`: pfSense HFSC + UniFi WMM, match `10.0.30.0/24`, phones mark DSCP EF; measured WAN1 up ~522 Mbps -> QoS is insurance, RF is the substance). Rigorous DFS re-verification (0 genuine radar/~1-day window) -> **decision: NON-DFS only**. **Decision: no dedicated voice SSID** (3-SSID cap, CSC ENT still 131 clients, QoS is SSID-independent). 6 GHz root-caused dark: CSCNet not broadcasting 6g. NO live network changes applied (per-change-go rule). |
|
||||
| 2026-06-19 | **FIRST PRODUCTION RF OPTIMIZATION applied (autonomous 2 AM window) -- 2.4 power fix + data-driven 5 GHz DFS plan; 5 GHz retry HALVED.** Howard pre-authorized an autonomous 2 AM run. Applied + validated + KEPT: (1) **2.4 power Low/full -> MEDIUM on 47 radios** (over-thinning fix floors 1-4 + MemCare 5/6 off full power; 24 disabled stayed disabled; per-AP targeting since `--zone` re-enables disabled), non-regressive. (2) **CSCNet BSS-transition ON.** 6 GHz attempted but **BLOCKED -- `Wpa3MandatoryFor6GHzBand`** (CSCNet is WPA2/PPSK; converting the 427-client SSID is a supervised decision, deferred to Howard). A first blind non-DFS 5 GHz reshuffle (3a/3b) was tried, did NOT validate (flat retry, voice scattered to 2.4), and was ROLLED BACK. **Howard's correction: scan FIRST, decide from data.** Completed the full channel survey (74/74) -> proved **DFS channels here are 4-5x cleaner (2-3% busy) than non-DFS (ch149=12%, ch157=28%)**; the non-DFS-only decision was reversed. Built a **data-driven clean-DFS plan** (8 clean DFS 40MHz channels, per-AP cleanest + neighbor graph-color + local-search -> 0 co-channel), applied to 72 non-mesh APs (mesh excluded), nudged voice back to 5 GHz. **Result: 5 GHz retry 8.7 -> 3.8 avg (median 8.2 -> 2.1), satisfaction median 99, voice 31/31, all 72 APs holding DFS, 0 radar vacates.** Also disabled the 3 AM AP auto-upgrade (left OFF). **Skill hardened:** added `survey-report.py` (fleet channel-congestion analysis) + made `channel-plan.sh` palette data-driven (`--channels`/`--dfs`, load-balance + local-search) -- killed the non-DFS bias that caused the first failed attempt. |
|
||||
| 2026-06-19 | **Voice VLAN migration COMPLETE (29/29 Poly) + band-selection diagnosis + Vertical 5 GHz handoff.** Howard walked the building and re-keyed all remaining Poly handsets to the voice PPSK -- the 6 stragglers found 2026-06-18 + 2 added onsite: Zachary Nelson .232, Recreation room .233, Movie Theater .234, Library .235, Bistro .236, John Trozzi rm422 .237, Kitchen server. Full named 38-device roster in `voice-phone-inventory.md` (29 Poly + 8 AudioCodes + Vertical desktop). Per-phone re-look (goal = clean calls, not fleet averages): most phones fine on the clean 5 GHz (Lauren .202 went 2.4/50% -> 5GHz/12%), but several stuck on 2.4 despite -50 to -60 dBm signal at 36-96% retry -- a **band-selection problem, not RF**; controller band-steering (already ON) isn't holding the Poly OUI on 5 GHz. Fix is phone-side: **5 GHz-only lock via Vertical** -- letter sent to Richard Turner (`docs/network/2026-06-19-vertical-5ghz-lock-request.md`), awaiting their response = the last voice item. Also: confirmed (data) the 2.4 channel re-plan is NOT a lever (every 2.4 channel 84-91% busy externally); GOTCHA logged: verify VLAN via the client `vlan` field, not the controller's cached IP (Kitchen-server phone read stale). Self-check GREEN (pulled b668430 baseline fixes; installed dev-alerts post-commit hook). |
|
||||
| 2026-05-26 | Access control vendor meeting onsite (ticket #32324). 0.5h Howard + 0.5h Mike billed. Block at 28.0h. |
|
||||
| 2026-06-03 | ALIS AADSTS65001 diagnosed and resolved: granted tenant-wide admin consent on ALIS SP `e1cae4ad`. Caregiver device allow-list CA policy created in report-only (`1b7fd025`). |
|
||||
| 2026-06-04 | Three same-day tickets: #32381 Tamra scanner (0.5h onsite), #32382 Megan file access (1.5h onsite), #32383 Chris Knight bill.com/BOK email delivery (1.5h remote). Root cause sender-side. |
|
||||
| 2026-06-05 | NURSESTATION-PC localadmin login-screen issue resolved. Caregiver test rig built. Hybrid Entra Join + GPOs deployed: `CSC - Caregiver Workstation` validated; `CSC - Caregiver Device Lockdown` deployed to `OU=Caregiver Devices`. Ticket #32303 billed 7.0h, invoice #67782 ($0.00 prepaid). |
|
||||
| 2026-06-08 | **Chris Knight workstation setup (onsite).** DESKTOP-N5G1ROO domain-joined + GuruRMM-enrolled. **MAJOR: root-caused native Folder Redirection failure** -- FR GPO targets were in misnamed `fdeploy1.ini`; fixed by writing correct `fdeploy.ini` + version bump. **ASSISTNURSE-PC reinstalled (Win10->Win11).** Edge UNC download bug diagnosed (no fix applied). |
|
||||
| 2026-06-09 | **Accounting scan-to-folder built.** `D:\Shares\Accounting` on CS-SERVER; shared as `\\CS-SERVER\AcctDept`; `svc-scan` service account vaulted; Brother MFC-L8900CDW Scan-to-Network configured (NTLMv2, confirmed). Persistent drive maps set (Chris Y:, Zachary Y:, Lauren X:). |
|
||||
| 2026-06-10 | **Meredith Kuhn locked Word doc -- stale owner files on cascadesDS.** Five orphaned `~$` files deleted via RMM in Meredith's user session. Ticket #32403, 0.5h remote, block 56.75->56.25. |
|
||||
| 2026-06-12 | **Created shared mailboxes grievances@ + Surveys@ and delegated to Meredith & Ashley.** All 8 permission grants verified. Ticket #32417, 0.5h remote, block 56.25->55.75. |
|
||||
| 2026-06-15 | **Wireless RF full audit -- controller access gained.** Mike vaulted SSH key + RW admin + AP SSH. Live audit confirmed 77 U7-Pro APs, ~574->587 clients, 2.4 GHz saturation as primary pain band. |
|
||||
| 2026-06-15 | **CS-SERVER slowness root-caused to degraded RAID-1; cloud backup started; pfSense OpenVPN password reset.** PD 0:0:3 (320 GB WD SATA) Critical/Removed; C: on single 320 GB Hitachi 5400 RPM spindle. MSP360/CloudBerry cloud backup installed on CS-SERVER (closes HIPAA backup gap). |
|
||||
| 2026-06-16 | **Voice VLAN plan for Vertical phones (PLANNED, not executed).** Designed VLAN 30 VOICE (10.0.30.0/24, isolated, internet-only egress); cutover runbook written. Floor-4 2.4 GHz power-down pilot applied (first production RF change): 14/15 radios to 6 dBm, retry 13.2->9.5%. `dfs-check.sh` confirmed ZERO real radar events fleet-wide. `unifi-wifi` skill feature-complete. |
|
||||
| 2026-06-16 | **pfSense confirmed as pfSense Plus 25.07-RELEASE; health verified; Howard-Home LAN renumbered** (192.168.0.0/24 -> 10.137.42.0/24; removed collision with Cascades). `pfsense-ssh.sh` built and validated. |
|
||||
| 2026-06-17 | **Voice VLAN 30 built + verified; Vertical desktop + initial Poly phones migrated.** Richard Turner confirmed window; pfSense igc1.30 interface + isolation rules built. Vertical desktop migrated (port-16 bounce via controller API + CSRF); key learnings: desktop is DHCP, Vertical uses LogMeIn. |
|
||||
| 2026-06-17 | **Power outage -- full site down + recovery.** pfSense on UPS surge-only side -> unclean shutdown -> duplicate dhcpd + 2nd-floor switch one-way L2. Howard killed duplicate dhcpd; Mike moved pfSense to battery, restored on-box config, reset+re-adopted Switch 2nd Floor #2, rebooted Cox modem. 5GHz auto-channel applied (co-channel 25->30, worse). pfSense config vaulted. Pre-existing plaintext Synology signin credential found (vault history commit 1fbc0e1). |
|
||||
| 2026-06-17 | **KPI dashboard scoping (advisory).** 9 reporting systems catalogued. Recommended Phase 1 (exports->SharePoint->Power BI Pro). Proposals drafted. Parked pending Ashley's KPIs. |
|
||||
| 2026-06-18 | **Voice VLAN 30 cutover COMPLETE (8 AudioCodes added; 22 Poly done).** AudioCodes required physical power-cycle (externally powered, PoE bounce was no-op). Per-phone diagnosis: dropped calls are an RF (band-selection) problem, not a VLAN problem. 6 GHz root-caused dark (CSCNet not broadcasting 6g). Holistic optimization master plan built. |
|
||||
| 2026-06-18 | **DESKTOP-TRCIEJA (Lupe Sanchez) perf diagnosed; replace decision.** Root causes: EOL hardware (i3-2120) + dual real-time AV (Bitdefender + leftover Datto stack). |
|
||||
| 2026-06-18 | **Synology Drive sync architecture diagnosed.** Current scope: Sync-user My Drive only; real shared folders NOT mirrored. Team Folder migration plan produced. |
|
||||
| 2026-06-18 | **Power outage follow-ups: OpenVPN flapping root-caused (`--inactive` timeout, not a fault); kitchen printer straggler resolved by power-cycle.** |
|
||||
| 2026-06-19 | **PRODUCTION RF OPTIMIZATION APPLIED (autonomous 2 AM window) -- 5 GHz retry HALVED.** 2.4 power -> MEDIUM on 47 radios (over-thinning fix + MemCare off full power; per-AP targeting). CSCNet BSS-transition ON. 6 GHz attempted but BLOCKED (`Wpa3MandatoryFor6GHzBand`). Blind non-DFS 5 GHz reshuffle tried, failed, rolled back. Howard's correction: scan FIRST, decide from data. Full channel survey (74/74 APs) proved DFS channels here 4-5x cleaner (2-3%) than non-DFS (ch149=12%, ch157=28%). Data-driven clean-DFS plan (8 DFS 40MHz channels, per-AP cleanest + neighbor graph-color, 0 co-channel) applied to 72 non-mesh APs. **Result: 5 GHz retry 8.7->3.8 avg (median 8.2->2.1), satisfaction median 99, all 72 APs holding DFS, 0 radar vacates.** `survey-report.py` added; `channel-plan.sh` made data-driven. |
|
||||
| 2026-06-19 | **Voice VLAN migration COMPLETE (29/29 Poly) + band-selection diagnosis + Vertical 5 GHz handoff.** Howard walked the building, re-keyed all remaining Poly handsets to voice PPSK. Per-phone re-look: most phones on clean 5 GHz (Lauren .202: 2.4/50% -> 5GHz/12%), but several stuck on 2.4 despite -50 to -60 dBm signal -- controller band-steering not holding Poly OUI on 5 GHz. Phone-side fix: **5 GHz-only lock request sent to Richard Turner (Vertical)**, awaiting response = the last voice item. Kitchen server phone bad (pulled by John); Bistro phone relocated to Kitchen; Bistro now has no phone (replacement pending). Billed ticket #32444 (7h: 4 onsite + 3 remote), block 55.75->48.75. |
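The clean-DFS plan in the 2026-06-19 RF entry is described as "per-AP cleanest + neighbor graph-color, 0 co-channel". A minimal sketch of that idea, assuming a greedy coloring pass, might look like the following. It is not the logic of `channel-plan.sh` or `survey-report.py`, which is not shown here; the AP names, channel numbers, utilization figures, and the function names `plan_channels` and `co_channel_pairs` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def plan_channels(
    utilization: Dict[str, Dict[int, float]],
    neighbors: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Greedy coloring: each AP takes its cleanest surveyed channel
    that no already-assigned neighbor is using."""
    plan: Dict[str, int] = {}
    # Place the most-constrained APs first (most neighbors), ties by name.
    order = sorted(utilization, key=lambda ap: (-len(neighbors.get(ap, set())), ap))
    for ap in order:
        taken = {plan[n] for n in neighbors.get(ap, set()) if n in plan}
        free = [ch for ch in utilization[ap] if ch not in taken]
        # If neighbors hold every channel, co-channel is unavoidable:
        # fall back to the full list and still pick the cleanest.
        candidates = free or list(utilization[ap])
        plan[ap] = min(candidates, key=lambda ch: (utilization[ap][ch], ch))
    return plan


def co_channel_pairs(
    plan: Dict[str, int], neighbors: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Neighbor pairs that ended up on the same channel."""
    pairs = set()
    for ap, peers in neighbors.items():
        for peer in peers:
            if ap in plan and peer in plan and plan[ap] == plan[peer]:
                pairs.add(tuple(sorted((ap, peer))))
    return sorted(pairs)


# Hypothetical survey data: percent channel utilization seen by each AP.
utilization = {
    "ap-a": {52: 2.0, 60: 3.0, 100: 5.0},
    "ap-b": {52: 2.5, 60: 2.0, 100: 4.0},
    "ap-c": {52: 2.0, 60: 2.2, 100: 3.0},
}
# Hypothetical neighbor graph: all three APs hear each other.
neighbors = {
    "ap-a": {"ap-b", "ap-c"},
    "ap-b": {"ap-a", "ap-c"},
    "ap-c": {"ap-a", "ap-b"},
}

plan = plan_channels(utilization, neighbors)
print(plan)
print(co_channel_pairs(plan, neighbors))
```

In this toy case each AP lands on a different channel, so the co-channel list is empty. A greedy pass does not guarantee zero co-channel pairs in general; that depends on having enough channels relative to the densest neighbor cluster.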
|
||||
|
||||
---
|
||||
|
||||
## Compilation Notes
|
||||
|
||||
**Session logs read:** all prior sessions + 2026-06-17/18 logs: voice VLAN 30 build + Poly cutover, Poly phone-drop root cause + wireless smoothing plan, power-outage recovery + 5GHz option analysis, CS-SERVER drive review, KPI dashboard scoping, power-outage follow-up (OpenVPN + printer), Synology Drive sync diagnosis, DESKTOP-TRCIEJA (Lupe Sanchez) perf diagnosis. Date range: 2026-03-06 through 2026-06-18.
|
||||
|
||||
**Full recompile addendum (2026-06-18 -- recovered-docs fold-in):** folded in the 4 docs restored after the repo-rewrite (master plan, voice QoS design, voice-quality diagnostic, RF/voice-optimization session log). Key corrections + additions:

- **Voice VLAN 30 cutover is COMPLETE** (8 AudioCodes `.224-.231` added -- prior compile had them 0/8 pending).
- AudioCodes physical-power-cycle gotcha.
- Poly fleet is 28 (6 stragglers off VOICE).
- Voice quality is an RF problem (per-phone diagnostic).
- 6 GHz dark because CSCNet isn't broadcasting 6g.
- AP 103 5 GHz saturation.
- Measured WAN1 upload ~522 Mbps (QoS = insurance).
- New Patterns subsections (Voice QoS design, Network Optimization Master Plan, Decisions-resolved-2026-06-18: non-DFS-only + no-voice-SSID).
- Active Work + History + HIPAA reconciled to the complete cutover.
|
||||
|
||||
**Prior compile (2026-06-18, refresh + initial):**
|
||||
- Voice VLAN 30: status updated to 22/22 Poly + desktop DONE; AudioCodes were 0/8 pending at that point (now complete -- see addendum). PPSK vaulted. Wired-port link-bounce pattern documented.
|
||||
- Power outage (2026-06-17): full incident documented. pfSense UPS placement rectified. Duplicate dhcpd, 2nd-floor switch L2 failure, Cox modem reboot step. Post-outage straggler pattern (power-cycle) documented. pfSense config vaulted. Synology signin credential exposure flagged (vault commit 1fbc0e1).
|
||||
- Wireless: Phase A extended overnight 2026-06-17 and over-thinned (retry 17->23.4%, satisfaction 39->30). 5 GHz auto-channel made co-channel overlap worse. Both corrective plans staged (Low->Medium, Option B) but not applied. Phone drop mystery closed (intentional pfSense reboot).
|
||||
- DESKTOP-TRCIEJA (Lupe Sanchez): added to key contacts and migration table; EOL hardware + dual-AV root cause; replace decision.
|
||||
- pfSense patterns: plain-text logs (not clog), --inactive OpenVPN timeout, dirty-boot/duplicate-dhcpd recovery, post-outage stragglers.
|
||||
- Synology Drive sync architecture documented (current scope is Sync-user My Drive only; Team Folder migration plan for department shares).
|
||||
- Active Work: updated with all new non-Syncro follow-ups (Lupe replacement, Synology credential rotation, AudioCodes cutover, Phase 0+1 wireless, OpenVPN fix, AutoConfigBackup, Team Folder migration, outage stragglers).
|
||||
- Sources: 7 new session-log paths appended (2026-06-17-poly-phone-drops, 2026-06-17-power-outage, 2026-06-17-cs-server-drive-review, 2026-06-17-voice-vlan30-build, 2026-06-18-outage-followup-openvpn-printer, 2026-06-18-synology-drive-sync, 2026-06-18-lupesanchez-perf-diag); voice-phone-inventory.md added.
|
||||
- Billing: hours 55.75 (unchanged; no draws from 2026-06-17/18 sessions); date updated to 2026-06-18.
|
||||
**2026-06-20 recompile (GURU-5070/claude-main) changes vs. prior (2026-06-19, HOWARD-HOME):**
|
||||
- Billing updated: 48.75 hrs as of 2026-06-20 (Syncro authoritative); ticket #32444 (7h) reflected in block balance and ticket list.
|
||||
- Infrastructure > Network > Wireless RF section updated: replaced stale "OVER-THINNED (as of 2026-06-17)" and "NOT applied (pending go-ahead)" narrative with the actual applied 2026-06-19 state (2.4 Medium, 5 GHz clean DFS 40MHz, results).
|
||||
- Patterns > Wireless: replaced stale "Remediation status (as of 2026-06-17 -- OVER-THINNED)" block with "APPLIED 2026-06-19" block; removed Phase C disable list (advisory, superseded by current state); removed stale "non-DFS only recommended" text from 5 GHz line.
|
||||
- Active Work: removed stale "Wireless RF Phase 0 + Phase 1 (pending go-ahead)" item (executed); updated master plan item (P2b and P3 done, remaining P1/P4/P5 and 6GHz deferred); added new RF follow-ups (re-enable auto-upgrade, DFS radar monitor, MemCare min-RSSI, 6GHz deferred/Howard decision).
|
||||
- All other sections preserved verbatim from prior compile.
|
||||
|
||||
**Client folder:** `clients/cascades-tucson/` (NOT `clients/cascades/` -- that directory does not exist).
|
||||
|
||||
**Open items flagged as unverified:**
|
||||
- Break-glass accounts + YubiKeys -- confirmed not created as of 2026-05-27; YubiKey arrival unconfirmed
|
||||
- Audit retention infra -- approved 2026-04-29, not yet built
|
||||
- dunedolly21@gmail.com guest invite -- confirm with Lauren
|
||||
- Windows MDM auto-enroll scope -- confirm in portal (Entra -> Devices -> Mobility -> Microsoft Intune -> MDM user scope)
|
||||
- #32370 -- verify/likely closed; Syncro live 2026-06-18 shows 0 open tickets
|
||||
- Edge UNC download bug fix path -- no fix applied as of 2026-06-08; decision pending Howard
|
||||
- ALIS BAA with Medtelligent -- not yet verified; confirm with Meredith (also: does ALIS offer a built-in analytics / data feed? relevant to the KPI dashboard)
|
||||
- KPI dashboard (Ashley Jensen) -- parked; need day-one KPIs, data-freshness need, POS product + Focus HR plan before scoping a build
|
||||
- JD Martin (jd.martin@cascadestucson.com) -- confirmed Syncro contact; role not yet documented
|
||||
- CS-SERVER cloud backup: verify first full completes, confirm image-based / bare-metal + system-state, set retention; only then proceed with RAID remediation
|
||||
- NURSESTATION-PC: verify `CSC - Caregiver Device Lockdown` GPO activated (requires reboot; verify lock@3min, 90s warning, sign-out@15min, never-sleep)
|
||||
- Wireless RF: Phase 0 + Phase 1 (Low->Medium + Option B) pending scope go-ahead from Howard; windowed evening session needed
|
||||
- Christine (room 515, Poly phone 10.0.30.220) -- last name noted as "~Nyuda -- VERIFY"
|
||||
- UPS coverage: confirm all core infra + PoE switches are battery-backed (pfSense rectified; others not confirmed)
|
||||
- Netgate AutoConfigBackup: not yet enabled
|
||||
- Synology signin credential rotation: exposed in vault history commit 1fbc0e1; encrypted go-forward but must rotate
|
||||
|
||||
**Resolved since last compile (2026-06-17 -> 2026-06-18):**
|
||||
- Poly phone drops: closed (intentional 2026-06-16 pfSense reboot; transient)
|
||||
- Voice VLAN 30: cutover COMPLETE -- 8 AudioCodes (`.224-.231`) + 22 Poly + Vertical desktop = 31 devices on VOICE (6 Poly stragglers remain off VOICE)
|
||||
- Voice-quality root cause: identified as RF on the WiFi Poly phones (not the VLAN move); per-phone diagnostic produced
|
||||
- 6 GHz dark: root-caused (CSCNet `wlan_bands=[2g,5g]` -- not broadcasting 6g)
|
||||
- 5 GHz DFS question: RESOLVED -- non-DFS only (resilience near Davis-Monthan/TUS)
|
||||
- Dedicated voice SSID question: RESOLVED -- no (shared CSCNet; QoS is SSID-independent)
|
||||
- pfSense 25.07 log format: documented (plain text, not clog)
|
||||
- pfSense config backup: vaulted post-restore (2026-06-17)
|
||||
- pfSense on battery-backed UPS: rectified (Mike, 2026-06-17)
|
||||
- Kitchen printer / post-outage straggler: resolved by power-cycle (2026-06-18)
|
||||
- Synology Drive sync architecture: diagnosed (2026-06-18; Team Folder plan produced)
|
||||
- DESKTOP-TRCIEJA root cause: identified (dual AV + EOL hardware); decision made (replace)
|
||||
---
|
||||
|
||||
## Backlinks
|
||||
|
||||
|
||||
@@ -2,8 +2,8 @@
|
||||
type: client
|
||||
name: dataforth
|
||||
display_name: Dataforth Corporation
|
||||
last_compiled: 2026-06-04
|
||||
compiled_by: DESKTOP-0O8A1RL/claude-main
|
||||
last_compiled: 2026-06-20
|
||||
compiled_by: GURU-5070/claude-main
|
||||
sources:
|
||||
- clients/dataforth/docs/overview.md
|
||||
- clients/dataforth/docs/active-directory.md
|
||||
@@ -13,6 +13,25 @@ sources:
|
||||
- clients/dataforth/docs/SYNC_SCRIPT_UPDATE_SUMMARY.md
|
||||
- clients/dataforth/docs/incident-2026-03-27-abuse-report-virtuo.md
|
||||
- clients/dataforth/docs/incident-2026-03-27-abuse-report-connectwise.md
|
||||
- clients/dataforth/docs/cloud/m365.md
|
||||
- clients/dataforth/docs/issues/log.md
|
||||
- clients/dataforth/docs/network/topology.md
|
||||
- clients/dataforth/docs/network/vlans.md
|
||||
- clients/dataforth/docs/network/firewall.md
|
||||
- clients/dataforth/docs/rmm/rmm.md
|
||||
- clients/dataforth/docs/security/antivirus.md
|
||||
- clients/dataforth/docs/security/backup.md
|
||||
- clients/dataforth/docs/servers/ad1.md
|
||||
- clients/dataforth/docs/servers/ad2.md
|
||||
- clients/dataforth/docs/servers/d2testnas.md
|
||||
- clients/dataforth/docs/servers/df-hyperv-b.md
|
||||
- clients/dataforth/docs/servers/files-d1.md
|
||||
- clients/dataforth/docs/servers/sage-sql.md
|
||||
- clients/dataforth/docs/projects/shares-permissions/roadmap.md
|
||||
- clients/dataforth/docs/projects/shares-permissions/current-state-2026-06-10.md
|
||||
- clients/dataforth/docs/projects/shares-permissions/acl-audit-detail-2026-06-10.md
|
||||
- clients/dataforth/docs/projects/shares-permissions/discovery-email-draft.md
|
||||
- clients/dataforth/docs/aoi-xp-vlan-backup-runbook.md
|
||||
- clients/dataforth/session-logs/2026-03-23-galactic-advisors-report.md
|
||||
- clients/dataforth/session-logs/2026-03-27-security-incident-mfa-datasheets.md
|
||||
- clients/dataforth/session-logs/SESSION-SUMMARY.md
|
||||
@@ -25,11 +44,20 @@ sources:
|
||||
- clients/dataforth/session-logs/2026-05-04-lobby-phone-vlan-fix.md
|
||||
- clients/dataforth/session-logs/2026-05-06-session.md
|
||||
- clients/dataforth/session-logs/2026-05-12-session.md
|
||||
- clients/dataforth/session-logs/2026-06-01-aoi-xp-vlan-share.md
|
||||
- clients/dataforth/session-logs/2026-06-01-cbell-m365-bobbi-outlook.md
|
||||
- clients/dataforth/session-logs/2026-06-02-session.md
|
||||
- clients/dataforth/session-logs/2026-06-04-session.md
|
||||
- clients/dataforth/session-logs/project_ad2_context.md
|
||||
- clients/dataforth/session-logs/project_pipeline_rebuilt.md
|
||||
- clients/dataforth/session-logs/project_test_datasheet_pipeline.md
|
||||
- clients/dataforth/session-logs/project_new_product_lines.md
|
||||
- clients/dataforth/migration-gap-diff-RESUME.md
|
||||
- clients/dataforth/CLAUDE.dataforth.md
|
||||
- projects/dataforth-dos/CONTEXT.md
|
||||
- session-logs/2026-06-05-session.md
|
||||
- session-logs/2026-06/2026-06-09-mike-dataforth-freepbx-safesite-forensics.md
|
||||
- session-logs/2026-06/2026-06-18-mike-testdatadb-render-and-security-app.md
|
||||
- .claude/memory/project_dataforth_incident_2026-03-27.md
|
||||
- .claude/memory/project_datasheet_pipeline.md
|
||||
- .claude/memory/project_neptune_sbr_email_routing.md
|
||||
@@ -37,12 +65,11 @@ sources:
|
||||
- .claude/memory/reference_neptune_access_d2testnas.md
|
||||
- .claude/memory/feedback_d2testnas_ssh.md
|
||||
- .claude/memory/infra_office_network.md
|
||||
- clients/dataforth/session-logs/2026-06-01-aoi-xp-vlan-share.md
|
||||
- clients/dataforth/docs/aoi-xp-vlan-backup-runbook.md
|
||||
- clients/dataforth/session-logs/2026-06-01-cbell-m365-bobbi-outlook.md
|
||||
- clients/dataforth/session-logs/2026-06-02-session.md
|
||||
- clients/dataforth/session-logs/2026-06-04-session.md
|
||||
- clients/dataforth/migration-gap-diff-RESUME.md
|
||||
- .claude/memory/project_dataforth.md
|
||||
- .claude/memory/project_dataforth_history.md
|
||||
- .claude/memory/project_ad2_dataforth_fork.md
|
||||
- .claude/memory/ad2-ssh-mtu-blackhole.md
|
||||
- .claude/memory/ad2-comms-via-sync-only.md
|
||||
backlinks:
|
||||
- projects/dataforth-dos
|
||||
- systems/jupiter
|
||||
@@ -50,7 +77,7 @@ backlinks:
|
||||
|
||||
# Dataforth Corporation
|
||||
|
||||
Signal conditioning / data acquisition manufacturer in Tucson, AZ. Long-standing ACG client. Active managed relationship — monthly prepaid block. Notable for 64 MS-DOS 6.22 test stations, a major security incident in March 2026, an ongoing test datasheet pipeline modernization project, and an incomplete 2025 post-ransomware recovery restore that silently dropped files across multiple shares (active audit underway).
|
||||
Signal conditioning / data acquisition manufacturer in Tucson, AZ. Long-standing ACG client. Active managed relationship — monthly prepaid block. Notable for 64 MS-DOS 6.22 test stations, a major security incident in March 2026, an ongoing test datasheet pipeline modernization project, an incomplete 2025 post-ransomware recovery restore that silently dropped files across multiple shares (active audit underway), and a new shares/permissions remediation project (Phase 1 pending client input as of 2026-06-19).
|
||||
|
||||
---
|
||||
|
||||
@@ -76,8 +103,10 @@ Signal conditioning / data acquisition manufacturer in Tucson, AZ. Long-standing
|
||||
|
||||
- **External distributor:** Ginger (gy@quatronix-cn.com) — Quatronix China; receives datasheets
|
||||
- **Billing rate:** Prepaid block; all invoices show $0.00 — hours drawn from block
|
||||
- **Hours remaining:** 34.5 hrs as of 2026-06-04 (after 1.0 hr billed for SP1366 file recovery, ticket #32385). Always live-check Syncro before billing — `GET /customers/578095`.
|
||||
- **Hours remaining:** 31.5 hrs as of 2026-06-19 (live-check Syncro before billing — `GET /customers/578095`)
|
||||
- **Syncro customer ID:** 578095
|
||||
- **Syncro managed assets:** 50
|
||||
- **Open Syncro tickets:** 0 as of 2026-06-19
|
||||
- **Invoice CC:** jantar@dataforth.com
|
||||
|
||||
---
|
||||
@@ -88,18 +117,18 @@ Signal conditioning / data acquisition manufacturer in Tucson, AZ. Long-standing
|
||||
|
||||
| Host | IP | Role | OS | Notes |
|
||||
|---|---|---|---|---|
|
||||
| AD1 | 192.168.0.27 | Primary DC, DNS, FSMO roles, Engineering share | Windows Server 2016 | C:\ at **90%** capacity (C:\Engineering = 787 GB) — critical risk. FSMO roles (assumed all). GuruRMM agent `bf7bc5ee-4167-4a62-912a-c88b11a5943d`. Only `Image2025` backup plan — Files plan pending. |
|
||||
| AD2 | 192.168.0.6 | Secondary DC, TestDataDB service host, NAS mirror, WebShare | Windows Server 2022 | Hosts testdatadb Node.js service on :3000. Wiped by crypto attack 2025 — rebuilt. Windows Firewall disabled (all profiles). Shares: `C:\Shares\{c-drive,e-drive,webshare}`. Old `D:\c-drive` data volume is GONE — D: is now a mounted Windows install ISO. MSP360 agent at `C:\Program Files\Arizona Computer Guru\Online Backup\cbb.exe`; storage account `ACG-Dataforth`. GuruRMM agent `cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`. No shadow copies. |
|
||||
| FILES-D1 | — | File server | — | Shares: `E:\Shares\{sales,archive}`. GuruRMM agent `8566a19d-49a9-4f8b-9c6c-012cc934484b`. **NOTE: `staff` share is missing** on FILES-D1 — separate issue. |
|
||||
| SAGE-SQL | 192.168.0.153 | Sage ERP (S:), RDS Session Host/Connection Broker/Web Access | Windows Server | RDS licensing grace period was expired (reset 2026-05-06). TSGateway disabled (server not externally exposed). New self-signed RDS cert installed. Bitdefender GravityZone managed AV. Share: `C:\sage`. GuruRMM agent `120ba7bf-8544-48a0-98a1-40ed5cdd3e1f`. |
|
||||
| 3CX | 192.168.0.125 | Phone system | — | Last logon Oct 2025 — possibly inactive |
|
||||
| DF-HYPERV-B | — | Hyper-V hypervisor | — | GuruRMM enrolled (agent ID — see GuruRMM fleet below) |
|
||||
| AD1 | 192.168.0.27 | Primary DC, DNS, FSMO roles, Engineering share | Windows Server 2016 | C:\ at **90%** capacity (C:\Engineering = 787 GB) — critical risk. FSMO roles (assumed all). GuruRMM agent `bf7bc5ee-4167-4a62-912a-c88b11a5943d`. Image plan (`Image2025`) + Files plan (NBF, daily 2 AM, 180-day retention — created 2026-06-05). |
|
||||
| AD2 | 192.168.0.6 | Secondary DC, TestDataDB service host, NAS mirror, WebShare | Windows Server 2022 | Hosts testdatadb Node.js service on :3000. Wiped by crypto attack 2025 — rebuilt. Windows Firewall disabled (all profiles). Shares: `C:\Shares\{c-drive,e-drive,webshare,test}`. Old `D:\c-drive` data volume is GONE — D: is now a mounted Windows install ISO. MSP360 agent at `C:\Program Files\Arizona Computer Guru\Online Backup\cbb.exe`; storage account `ACG-Dataforth`. GuruRMM agent `cfa93bb6-0cdc-4d4e-a29e-1609cda6f047`. No shadow copies. Runs ClaudeTools on `ad2` branch (coord-API isolated; comms via git sync only). |
|
||||
| FILES-D1 | 192.168.0.189 | File server | Windows Server 2016 | Shares: `E:\Shares\{sales,archive}`. GuruRMM agent `8566a19d-49a9-4f8b-9c6c-012cc934484b`. **NOTE: `staff` share is missing** on FILES-D1 — separate issue. |
|
||||
| SAGE-SQL | 192.168.0.153 | Sage ERP (S:), RDS Session Host/Connection Broker/Web Access | Windows Server 2016 | RDS licensing grace period was expired (reset 2026-05-06). TSGateway disabled (server not externally exposed). New self-signed RDS cert installed. Bitdefender GravityZone managed AV. Share: `C:\sage`. GuruRMM agent `120ba7bf-8544-48a0-98a1-40ed5cdd3e1f`. |
|
||||
| 3CX | 192.168.0.125 | Phone system (possibly inactive) | — | Last logon Oct 2025. Production phones live on VLAN 100 under the Sangoma/FreePBX PBX — 3CX role likely superseded. |
|
||||
| DF-HYPERV-B | 192.168.0.123 | Hyper-V hypervisor | Windows Server 2025 | GuruRMM enrolled. Newest server in environment. VM inventory not captured. |
|
||||
| DF-SVR-D2-Sync | — | (role TBD) | — | GuruRMM enrolled |
|
||||
| eng-dev-server | — | Engineering dev server | — | GuruRMM enrolled |
|
||||
| D2TESTNAS | 192.168.0.9 | SMB1 bridge for DOS test stations + AOI XP backup; Neptune Exchange physically colocated | Debian 13 (trixie), Samba 4.22.6 | **Repurposed Netgear ReadyNAS** (earlier "CachyOS"/"Netgear ReadyNAS" records were stale). SMB1 enabled globally (CORE..SMB3, NTLMv1) — required for DOS 6.22 stations. rsync daemon on port 873 (module `test`, user `rsync`, hosts allow 192.168.0.0/24 + 172.16.0.0/12). SSH: `root@192.168.0.9`. Tailscale route for 172.16.0.0/22. **Shares:** `test`/`datasheets`/`snapshots` (guest; now `hosts deny 192.168.1.175`), `aoibackup` (XP-only — see Access). |
|
||||
| ENG-DEV-SERVER | 192.168.0.126 | Engineering dev server | Windows 11 Pro | GuruRMM enrolled |
|
||||
| D2TESTNAS | 192.168.0.9 | SMB1 bridge for DOS test stations + AOI XP backup; Neptune Exchange colocation routing | Debian 13 (trixie), Samba 4.22.6 | **Repurposed Netgear ReadyNAS.** SMB1 enabled globally (CORE..SMB3, NTLMv1) — required for DOS 6.22 stations. rsync daemon on port 873 (module `test`, user `rsync`, hosts allow 192.168.0.0/24 + 172.16.0.0/12). SSH: `root@192.168.0.9`. Tailscale route for 172.16.0.0/22. **Shares:** `test`/`datasheets`/`snapshots` (guest; `hosts deny 192.168.1.175`), `aoibackup` (XP-only — see Access). Acts as jump host for UDM SSH (D2TESTNAS direct-tcpip channel to 192.168.0.254). |
|
||||
| ESXi hosts | 192.168.0.122, 192.168.0.124 | VMware ESXi hypervisors | ESXi | — |
|
||||
| UDM Firewall | 192.168.0.254 | Perimeter firewall/router | UniFi OS | MAC d0:21:f9:6c:11:02. Also responds on 192.168.0.1. SSH key: `~/.ssh/id_ed25519_udm`. C2 IPs blocked via iptables (NOT permanent — need to add to UniFi UI). |
|
||||
| PBX (3CX/Sangoma) | 192.168.100.2 (also .196) | VoIP PBX — production phones on 192.168.100.0/24 | — | TFTP provisioning for Cisco SPA502G phones. Access via SSH: `sangoma@192.168.100.2`. Vault: `clients/dataforth/pbx.sops.yaml` |
|
||||
| UDM Firewall | 192.168.0.254 | Perimeter firewall/router | UniFi OS 5.1.15 | MAC d0:21:f9:6c:11:02. Also responds on 192.168.0.1. SSH: `azcomputerguru@192.168.0.254`, root SSH key added 2026-06-08, 2FA push required. Vault: `clients/dataforth/udm.sops.yaml`. C2 IPs blocked via iptables (NOT permanent — need to add to UniFi UI). Boot scripts in `/data/on_boot.d/`: `10-neptune-snat.sh` (Neptune outbound SNAT), `30-freepbx-sip-forward.sh` (SIP DNAT, WAN UDP 5060 source-locked to 66.7.123.0/24 → 192.168.100.2; SIP-only — do NOT add RTP forward). |
|
||||
| PBX (Sangoma FreePBX) | 192.168.100.2 | VoIP PBX — production phones on 192.168.100.0/24 | Sangoma FreePBX 17 / Asterisk 22.5.2 | FirstDigital PJSIP trunk; SBC 66.7.123.215:5060 (Sonus), match 66.7.123.0/24; IP-auth (no registration). `qualify_frequency=0` (FD SBC ignores OPTIONS — do NOT revert). TFTP provisioning for Cisco SPA502G phones. SSH: `sangoma@192.168.100.2`. Vault: `clients/dataforth/pbx.sops.yaml`. [WARNING] Re-apply `PJSip.class.php` line-504 patch after any `fwconsole ma updateall`. |
|
||||
|
||||
**Neptune Exchange (ACG infrastructure, physically at Dataforth D2):**
|
||||
- `neptune.acghosting.com` | internal `172.16.3.11` | external inbound `67.206.163.124` / outbound `67.206.163.122`
|
||||
@@ -142,11 +171,14 @@ Signal conditioning / data acquisition manufacturer in Tucson, AZ. Long-standing
|
||||
- **M365 licenses:** 50x Business Premium (39 used), 19x Exchange Online Plan 1 (5 used), 5x SPB (4 used)
|
||||
- **SMTP settings:** smtp.office365.com, port 587, STARTTLS — use `sysadmin@dataforth.com`
|
||||
- **SMTP AUTH status:** Tenant-level not disabled; per-mailbox varies. `calibration@dataforth.com` had SmtpClientAuthentication=true re-enabled 2026-04-23. `sysadmin@dataforth.com` SMTP AUTH is blocked by Exchange Online default — testdatadb uses Graph API for email (Mail.Send permission granted to Claude-Code-M365 app 2026-05-12).
|
||||
- **Mail security stack (layered):**
|
||||
1. **INKY PhishFence** — active transport rule `B859327F-3FBD-4BE7-A47A-97D02F1558A7` fires first (StopProcessingRules=true). Use inbox rules for per-user mail routing, NOT transport rules.
|
||||
2. **Mailprotector CloudFilter** — outbound delivery gateway (`dataforth-com.outbound.emailservice.io`, 52.3.213.180). Active outbound connector "Outbound-Mailprotector" (recipientDomains `*`). Mail may be held here. If a message shows "Delivered" in Dataforth outbound trace but never arrives, check Mailprotector (/mailprotector skill). Discovered 2026-06-05 when ghaubner email was held by "INKY - Annotation - Recipient Not Group Member" transport rule.
|
||||
- **DKIM:** Both selector1 and selector2 published. Rotated 2026-05-12; cutover to selector2 on 2026-05-16.
|
||||
- `selector1._domainkey.dataforth.com` → selector1-dataforth-com._domainkey.dataforthcom.onmicrosoft.com
|
||||
- `selector2._domainkey.dataforth.com` → selector2-dataforth-com._domainkey.dataforthcom.onmicrosoft.com
|
||||
- **DNS Host:** ntirety.com — Dataforth's public DNS zone managed through ntirety's portal (not a standard registrar). DNS change requests go to ntirety, not a domain control panel. Joel Lohr's account retained to receive ntirety.com infrastructure notifications (inbox rule → mike@azcomputerguru.com).
|
||||
- **INKY PhishFence:** Active transport rule `B859327F-3FBD-4BE7-A47A-97D02F1558A7` fires first and calls StopProcessingRules=true — blocks all subsequent custom transport rules. Use inbox rules for per-user mail routing.
|
||||
- **AutoForwarding blocked by default** (tenant outbound spam policy). If per-user forwarding needed, create scoped HostedOutboundSpamFilterPolicy for that sender with AutoForwardingMode=On.
|
||||
- **MFA:** 3 Conditional Access policies created 2026-03-27 (initially report-only; enforced 2026-04-04):
|
||||
- "ACG - Require MFA for All Users" — skip from office IP 67.206.163.122
|
||||
- "ACG - Block Foreign Sign-Ins" — US-only; MFA-Travel-Bypass group for exceptions
|
||||
@@ -159,17 +191,17 @@ Signal conditioning / data acquisition manufacturer in Tucson, AZ. Long-standing
|
||||
|
||||
- **Domain:** intranet.dataforth.com | Forest/Domain Level: Windows Server 2016
|
||||
- **ISP:** fdtnet.net | Public IP: 67.206.163.122 (outbound), 67.206.163.124 (Neptune inbound)
|
||||
- **Firewall/Router:** UniFi Dream Machine at 192.168.0.254 (also 192.168.0.1)
|
||||
- **Firewall/Router:** UniFi Dream Machine Pro at 192.168.0.254 (also 192.168.0.1), UniFi OS 5.1.15
|
||||
- **Network:** Flat (no VLANs on main LAN — 192.168.0.0/24). Voice/PBX VLAN: 192.168.100.0/24 — production phones live here. **VLAN 2 "mydata" (192.168.1.0/24)** = SMT production-line network (gateway 192.168.1.1); members on the *D2-SMT Switch* (USW Enterprise 8) + *D2-Breakroom* port 12. Supersedes the earlier note that 192.168.1.0/24 was an unused UDM default voice VLAN — it is in active use by SMT. Inter-VLAN routing from mydata → main LAN is currently OPEN.
|
||||
- **mydata members (2026-06-01):** WinXPBE-724667 (AOI XP, .175), goldstar19, DESKTOP-FT0T4MK, My9-PC, + 3 unnamed industrial/SMT devices (MAC 00:90:fb:80:f0:c6, 00:80:79:05:23:f2, 00:80:79:04:47:e7).
|
||||
- **VPN:** FortiClient required for remote access to 192.168.0.x. VPN can drop mid-session — save work frequently.
|
||||
- **VPN:** OpenVPN for ACG remote access. Client subnet 192.168.6.x (GURU-5070 gets 192.168.6.2). [WARNING] GURU-5070 OpenVPN adapter "Local Area Connection" (ifIndex 12) MTU must be set to 1400 — default 1500 causes PMTU blackhole (tunnel path MTU ~1424; bulk SSH/SCP silently drops). Verify/re-apply: `Set-NetIPInterface -InterfaceIndex 12 -AddressFamily IPv4 -NlMtuBytes 1400`. Permanent fix: add `mssfix 1360` server-side on the Dataforth OpenVPN server.
|
||||
- **Drive mappings (GPO):** B: (\\ad1\itsvc), Q: (\\ad2\c-drive), S: (\\SAGE-SQL\sage), T: (\\ad2\e-drive), W: (\\files-d1\sales), X: (\\ad2\webshare), Y: (\\files-d1\archive). DOS test stations: T: (\\D2TESTNAS\test), X: (\\D2TESTNAS\datasheets)
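The MTU figures in the VPN note can be sanity-checked with simple header arithmetic. The sketch below assumes plain IPv4 and TCP headers with no options (20 bytes each) and uses the approximate 1424-byte path MTU quoted in the note; the function names are illustrative only.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload per packet for a given IPv4 interface MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def exceeds_path(adapter_mtu: int, path_mtu: int) -> bool:
    """True when full-size packets from the adapter are larger than the path allows."""
    return adapter_mtu > path_mtu


PATH_MTU = 1424  # approximate tunnel path MTU from the VPN note

for mtu in (1500, 1400):
    print(mtu, tcp_mss(mtu), exceeds_path(mtu, PATH_MTU))
```

At the default 1500 the adapter can emit packets larger than the roughly 1424-byte path, which is consistent with the blackhole symptom; at 1400 full-size packets fit. The 1360 result coincides with the `mssfix 1360` value in the note, but OpenVPN defines `mssfix` in its own terms, so treat the match as suggestive and confirm against the OpenVPN manual before relying on it.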
|
||||
|
||||
### GuruRMM Enrollment
|
||||
|
||||
- **Site name:** Dataforth D1 | Site ID: `3a2f6866-26cd-452c-9806-a8df21475c3c`
|
||||
- **Site API key:** vault `clients/dataforth/...` [check vault for current entry]
|
||||
- **Fleet size:** 45 agents total (40 online) as of 2026-06-04 — grew from 13 enrolled agents
|
||||
- **Fleet size:** 45 agents enrolled as of 2026-06-04; Syncro managed count 50 as of 2026-06-19
|
||||
- **[WARNING] GuruRMM enrollment workaround:** WebSocket auth in `ws/mod.rs` does not validate `enrolled_agents.agent_key_hash`. New agent installs must overwrite registry AgentKey with the site API key (not the enrollment AgentKey) and restart service. See Gitea issue #8.
|
||||
|
||||
**Known enrolled agents:**
|
||||
@@ -184,33 +216,34 @@ Signal conditioning / data acquisition manufacturer in Tucson, AZ. Long-standing
|
||||
| SAGE-SQL | `120ba7bf-8544-48a0-98a1-40ed5cdd3e1f` | Enrolled 2026-06-04 |
|
||||
| DF-HYPERV-B | (see RMM dashboard) | Enrolled 2026-06-04 |
|
||||
| DF-SVR-D2-Sync | (see RMM dashboard) | Enrolled 2026-06-04 |
|
||||
| eng-dev-server | (see RMM dashboard) | Enrolled 2026-06-04 |
|
||||
| ENG-DEV-SERVER | (see RMM dashboard) | Enrolled 2026-06-04 |
|
||||
| (37 additional agents) | — | Mix of workstations; full list in GuruRMM dashboard |
|
||||
|
||||
### Backup Architecture
|
||||
|
||||
- **MSP360 ("ACG-Online Backup", `cbb.exe`):** Backup provider. Storage account: `ACG-Dataforth` (account ID `0b49ca5e-...`).
|
||||
- **AD2:** Two plans — `AD2 Image` (image plan, bunch `35a5c3d2`, running daily), `Files` plan (180-day retention, NBF, daily 2 AM, covers `C:\Shares` tree; GFS off, synthetic full, compression, fast-NTFS). No shadow copies on AD2.
|
||||
- **AD1:** Only `Image2025` image plan. **Files plan PENDING** — command prepared (`addBackupPlan -n "Files" -a "ACG-Dataforth" -nbf ... -d "C:\Engineering" -d "C:\Shares\ITSvc" ... -purge "180d"`); awaiting Mike's "run AD1" signal.
|
||||
- **AD1:** `Image2025` image plan + **Files plan created 2026-06-05** (NBF, daily 2 AM, 180-day retention, `ACG-Dataforth`, covers `C:\Engineering` + `C:\Shares\ITSvc`; initial run at 2:00 AM, not manually triggered). Both image and file plans now in place, matching AD2.
|
||||
- **Pre-attack backup (offline, not MSP360):** HGHAUBNER `D:` drive holds a full pre-attack snapshot of all 7 mapped DF shares, captured before the 2025 ransomware event. This is the only recovery source predating the attack. Accessible via GuruRMM `user_session` on HGHAUBNER. Cross-machine writes use existing GPO-mapped drives only (fresh UNC blocked by WTS-impersonation — see Patterns).
|
||||
- **Historical file-level backup:** NBF bunch `faad5a67` ("Backup plan on 8/29/2025") in `ACG-Dataforth` storage contains restore points 8/29–9/29/2025, archived at old physical path `D:\c-drive\...` (pre-migration layout). Used successfully 2026-06-04 to confirm SP1366 file contents (HGHAUBNER backup chosen for actual restore — no B2 egress).
|
||||
- **WizTree backup CSV (2026-06-04):** Full-drive WizTree export of HGHAUBNER's `D:` stored at AD2 `C:\ClaudeTools\clients\dataforth\WizTree_20260604184904.zip` (sensitive — kept OFF shares). ~8.7M files / 5.7 TB across 7 shares documented. Working copy also at GURU-5070 `C:\Users\guru\AppData\Local\Temp\wiztree.zip` (delete after diff).
|
||||
- **Historical file-level backup:** NBF bunch `faad5a67` ("Backup plan on 8/29/2025") in `ACG-Dataforth` storage contains restore points 8/29–9/29/2025, archived at old physical path `D:\c-drive\...` (pre-migration layout). Used successfully 2026-06-04 to confirm SP1366 file contents.
|
||||
- **WizTree backup CSV (2026-06-04):** Full-drive WizTree export of HGHAUBNER's `D:` stored at AD2 `C:\ClaudeTools\clients\dataforth\WizTree_20260604184904.zip` (sensitive — kept OFF shares). ~8.7M files / 5.7 TB across 7 shares documented.
|
||||
|
||||
### Key Applications
|
||||
|
||||
| Application | Host | URL/Port | Notes |
|
||||
|---|---|---|---|
|
||||
| TestDataDB | AD2 | http://192.168.0.6:3000 | Node.js + Express, PostgreSQL 18, 469K records. Internal LAN only. |
|
||||
| TestDataDB | AD2 | http://192.168.0.6:3000 | Node.js + Express, PostgreSQL 18, 469K records. Internal LAN only. Redesigned UI deployed 2026-06-18 (cert-fit, publish chips, push toasts, full-screen results). |
|
||||
| Sage ERP | SAGE-SQL | \\SAGE-SQL\sage (S:) | RDS-served RemoteApp |
|
||||
| GageTrak | DF-GAGETRAK (192.168.0.102) | — | Calibration tracking. Sends email via calibration@dataforth.com (SMTP). GuruRMM enrolled. |
|
||||
| Dataforth Product API | Hoffman's servers | https://www.dataforth.com/api/v1/TestReportDataFiles | OAuth2 client_credentials. Vault: `clients/dataforth/api-oauth.sops.yaml` |
|
||||
| Dataforth Product API | Hoffman's servers | https://www.dataforth.com/api/v1/TestReportDataFiles | OAuth2 client_credentials. Vault: `clients/dataforth/api-oauth.sops.yaml`. Used actively to recover DSCA33/45 and 8B/5B/SCM spec templates. |
|
||||
| QuickBASIC 4.5 ATE | 64 DOS stations | T:\ (\\D2TESTNAS\test) | Automated test equipment programs. 1,470+ product model specs. |
|
||||
| Power Monitor SPA | Georg's dev / TBD | — | Vanilla-JS SPA for Dataforth power meters (built by Georg/Antigravity AI). Demo at PWM.dataforth.com proposed; gateway architecture designed. Parked pending Mike↔Georg conversation. `clients/dataforth/power-monitor-demo/` |
|
||||
|
||||
---
|
||||
|
||||
## Syncro Asset Inventory (2026-06-02 Reconciliation)
|
||||
|
||||
Pulled full Syncro asset list for customer_id `578095`: **78 assets** across 2 pages.
|
||||
Pulled full Syncro asset list for customer_id `578095`: **78 assets** across 2 pages. Syncro currently shows 50 managed assets (2026-06-19 live data); reconciliation/cleanup ongoing.
|
||||
|
||||
### Reconciliation Result
|
||||
|
||||
@@ -241,7 +274,7 @@ Syncro asset IDs: 23845, 149614, 9708445, 9357407, 9276901, 9212922, 9078651, 88
|
||||
|
||||
### Root Cause — Fleet-wide Syncro Agent Break ~2025-10-06
|
||||
|
||||
57 of 78 assets show `updated_at` frozen at or before 2025-10-06, while the remaining 21 show recent check-ins. This is a hard cutoff, not gradual attrition — indicating a fleet-wide Syncro agent failure around that date. The machines stayed online (visible in ScreenConnect); only the Syncro agent stopped reporting. Root cause not yet investigated. Flag for Dan Center / Winter when replying.
|
||||
57 of 78 assets show `updated_at` frozen at or before 2025-10-06, while the remaining 21 show recent check-ins. This is a hard cutoff, not gradual attrition — indicating a fleet-wide Syncro agent failure around that date. The machines stayed online (visible in ScreenConnect); only the Syncro agent stopped reporting. Root cause not yet investigated.
|
||||
|
||||
### Pending Actions (Coord todo tree, parent `103c48ad-7b31-4967-9388-065a91888e7c`, assigned to Howard)
|
||||
|
||||
@@ -276,11 +309,11 @@ Syncro asset IDs: 23845, 149614, 9708445, 9357407, 9276901, 9212922, 9078651, 88
|
||||
## Access
|
||||
|
||||
### Domain / Server Access
|
||||
- **AD2 SSH:** `ssh sysadmin@192.168.0.6` (port 22) — vault: `clients/dataforth/ad2.sops.yaml` → `credentials.password` — NOTE: stale backslash escape in vault entry; strip with `sed 's/\\//g'`
|
||||
- **AD2 SSH:** `ssh sysadmin@192.168.0.6` (port 22) — vault: `clients/dataforth/ad2.sops.yaml` → `credentials.password` — NOTE: stale backslash escape in vault entry; strip with `sed 's/\\//g'`. MTU-sensitive: GURU-5070 OpenVPN adapter ifIndex 12 must be MTU 1400 for reliable bulk transfers.
|
||||
- **AD1 SSH:** `ssh sysadmin@192.168.0.27` — vault: `clients/dataforth/ad1.sops.yaml`
|
||||
- **D2TESTNAS SSH:** `ssh root@192.168.0.9` — vault: `clients/dataforth/d2testnas.sops.yaml`. Use root, NOT sysadmin (sysadmin SSH fails on D2TESTNAS). SSH key from acg-guru-5070 authorized. (Password auth works for root; UDM does NOT — UDM is publickey/keyboard-interactive only, 2FA push, key `id_ed25519_udm`.)
|
||||
- **D2TESTNAS `aoibackup` share (AOI XP backup):** `\\192.168.0.9\aoibackup` — Samba user `admin` (password matches the XP's local login), `hosts allow = 192.168.1.175` only, `browseable = no`. Other NAS shares (`test`/`datasheets`/`snapshots`) explicitly deny 192.168.1.175. Creds in vault: `clients/dataforth/d2testnas.sops.yaml → credentials.smb.aoi-user` / `.aoi-password` / `.aoi-share`.
|
||||
- **UDM SSH:** `ssh root@192.168.0.254` — SSH key `~/.ssh/id_ed25519_udm` (generated 2026-03-27)
|
||||
- **D2TESTNAS SSH:** `ssh root@192.168.0.9` — vault: `clients/dataforth/d2testnas.sops.yaml`. Use root, NOT sysadmin (sysadmin SSH fails on D2TESTNAS). SSH key from acg-guru-5070 authorized.
|
||||
- **D2TESTNAS `aoibackup` share (AOI XP backup):** `\\192.168.0.9\aoibackup` — Samba user `admin` (password matches the XP's local login), `hosts allow = 192.168.1.175` only, `browseable = no`. Other NAS shares explicitly deny 192.168.1.175. Creds in vault: `clients/dataforth/d2testnas.sops.yaml → credentials.smb.aoi-user` / `.aoi-password` / `.aoi-share`.
|
||||
- **UDM SSH:** `ssh azcomputerguru@192.168.0.254` (2FA push) or `ssh root@192.168.0.254` (root SSH key installed 2026-06-08). Jump via D2TESTNAS: paramiko `direct-tcpip` channel or ProxyJump. Vault: `clients/dataforth/udm.sops.yaml` (corrected 2026-06-09).
|
||||
- **SAGE-SQL SSH:** `ssh sysadmin@192.168.0.153` — SSH key (`C:\ProgramData\ssh\administrators_authorized_keys` on SAGE-SQL)
|
||||
- **All server passwords:** vault (individual vault entries per server — `clients/dataforth/<host>.sops.yaml`)
|
||||
- **WinRM (AD2/AD1):** port 5985 — pywinrm with NTLM, user `INTRANET\sysadmin`
|
||||
@@ -304,6 +337,7 @@ Syncro asset IDs: 23845, 149614, 9708445, 9357407, 9276901, 9212922, 9078651, 88
|
||||
- Grant: `client_credentials`, Client ID: `dataforth.onprem.sync`, Scope: `dataforth.web`
|
||||
- Token TTL: 1 hour
|
||||
- Swagger: `https://www.dataforth.com/swagger/index.html`
|
||||
- Endpoints: `GET /api/v1/TestReportDataFiles/{serial}` (per-model cert), `/bulk`, `/stats`
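
The grant details above can be sketched as a token request body plus a freshness check against the one-hour TTL. This is a minimal illustration, not the pipeline's actual client: `TOKEN_URL` is a hypothetical placeholder (the token endpoint is not recorded here), and sending the client secret in the form body rather than via HTTP Basic auth is an assumption about what the server accepts.

```python
from urllib.parse import urlencode

# HYPOTHETICAL placeholder: the real token endpoint is not stated in this wiki.
TOKEN_URL = "https://example.invalid/oauth/token"

CLIENT_ID = "dataforth.onprem.sync"   # from the notes above
SCOPE = "dataforth.web"               # from the notes above
TOKEN_TTL_SECONDS = 3600              # "Token TTL: 1 hour"


def build_token_form(client_secret: str) -> str:
    """Return the form-encoded body for a client_credentials token request."""
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": client_secret,  # read from the vault, never hard-code
        "scope": SCOPE,
    })


def token_is_fresh(issued_at: float, now: float, margin: float = 60.0) -> bool:
    """True while the token is younger than its TTL minus a safety margin."""
    return (now - issued_at) < (TOKEN_TTL_SECONDS - margin)


if __name__ == "__main__":
    print(build_token_form("<secret-from-vault>"))
```

Refreshing slightly before expiry (the `margin` argument) avoids a bulk pull failing mid-run when the token ages out.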
|
||||
|
||||
### ESXi / Hypervisors
|
||||
- ESXi-122: 192.168.0.122 — vault: `clients/dataforth/esxi-122.sops.yaml`
|
||||
@@ -311,6 +345,7 @@ Syncro asset IDs: 23845, 149614, 9708445, 9357407, 9276901, 9212922, 9078651, 88
|
||||
|
||||
### PBX
|
||||
- Vault: `clients/dataforth/pbx.sops.yaml`
|
||||
- SSH: `sangoma@192.168.100.2`
|
||||
|
||||
---
|
||||
|
||||
@@ -330,13 +365,18 @@ Syncro asset IDs: 23845, 149614, 9708445, 9357407, 9276901, 9212922, 9078651, 88
|
||||
- **GPO cert distribution:** Not completed (AD2 SYSVOL write blocked from non-domain workstation). Pending.
|
||||
- **Bitdefender GravityZone:** Managed AV on SAGE-SQL. Can block PowerShell execution — may need temporary disable for admin work.
|
||||
|
||||
### Voice / Phones
|
||||
### Voice / Phones / FreePBX
|
||||
- **Production phones VLAN:** 192.168.100.0/24. PBX at .196 / .2. All production phones live here.
|
||||
- **Unifi default voice VLAN (192.168.1.0/24):** NOT used for production — phones landing here cannot reach PBX. Switch port misconfiguration symptom: phone shows wrong date/time (NTP failure) and no dial tone.
|
||||
- **D1-Server-Room port 1:** Controls lobby drop → must stay on VLAN 100. Reverted to default once before (2026-05-04 incident).
|
||||
- **FirstDigital trunk (`qualify_frequency=0`):** FD's Sonus SBC ignores SIP OPTIONS keepalives. Setting `qualify_frequency=0` in the `pjsip` DB (id=1) prevents the trunk from going Unavailable. **Do NOT revert to a non-zero value.** (The total phone outage on 2026-06-08 was caused by the FD SBC not answering OPTIONS, which made the trunk go Unavailable and blocked all INVITEs.)
|
||||
- **PJSip.class.php line 504 patch must be re-applied** after any `fwconsole ma updateall`. It is wiped by FreePBX updates. Backup before each update (`PJSip.class.php.bak.<timestamp>`).
|
||||
- **Do NOT port-forward the RTP range (10000-20000)** on the UDM for this trunk. A static RTP DNAT creates a conntrack collision with the PBX's outbound RTP — inbound works but outbound audio dies. SIP 5060 forward only (source-locked to 66.7.123.0/24). Current on_boot.d script (`30-freepbx-sip-forward.sh`) is SIP-only, correct.
|
||||
- **Inbound SIP relies on `/data/on_boot.d/30-freepbx-sip-forward.sh`** — not a persistent UniFi UI rule. Must survive UDM reboot via the script. Recommend Mike add a UI port-forward as a belt-and-suspenders measure.
|
||||
|
||||
### Exchange Online / Email
|
||||
- **INKY PhishFence StopProcessingRules:** Kills all subsequent transport rules. Use inbox rules for per-mailbox forwarding, NOT transport rules.
|
||||
- **Mailprotector CloudFilter:** Outbound delivery goes through Mailprotector. If a message is "Delivered" per Dataforth's outbound trace but never arrives, check Mailprotector (`/mailprotector skill`, `py mp.py messages ...`) — it may be held. The INKY "Annotation - Recipient Not Group Member" transport rule can route mail to Mailprotector's hold queue.
|
||||
- **AutoForwarding blocked by default** (tenant outbound spam policy). If per-user forwarding needed, create scoped HostedOutboundSpamFilterPolicy for that sender with AutoForwardingMode=On.
|
||||
- **Get-MessageTrace deprecated Sept 2025:** Use Get-MessageTraceV2 and Get-MessageTraceDetailV2 in Exchange PowerShell.
|
||||
|
||||
@@ -349,12 +389,28 @@ Syncro asset IDs: 23845, 149614, 9708445, 9357407, 9276901, 9212922, 9078651, 88
|
||||
- **Workaround that works:** Run on the SOURCE machine in `user_session` and write to an **existing GPO-mapped drive** (e.g. Q: → `\\ad2\c-drive`). The existing mapping survives impersonation; fresh UNC does not.
|
||||
- **Proven 2026-06-04 on HGHAUBNER:** local `D:\DF C-Drive` read + `Q:` write succeeded; AD2-side `user_session` copy and SSH-from-AD2 both failed.
|
||||
|
||||
### AD2 SSH / VPN MTU
|
||||
- **PMTU blackhole on GURU-5070 → AD2 SSH:** GURU-5070's OpenVPN adapter "Local Area Connection" (ifIndex 12, IP 192.168.6.2) defaults to MTU 1500. Tunnel path MTU is ~1424 (confirmed with DF-bit pings). Over-MTU bulk TCP segments (SSH transfers, SCP) are silently dropped. Small interactive commands pass, creating a false appearance of "flaky VPN" or "SSH ban."
|
||||
- **Fix (applied 2026-06-18):** `Set-NetIPInterface -InterfaceIndex 12 -AddressFamily IPv4 -NlMtuBytes 1400` on GURU-5070 via SYSTEM RMM agent. Registry-persistent but may reset on OpenVPN reconnect — verify with `Get-NetIPInterface -InterfaceIndex 12`.
|
||||
- **Durable fix:** server-side `mssfix 1360` on the Dataforth OpenVPN server (or `push "tun-mtu 1400"`) — would auto-clamp all fleet clients, not just GURU-5070.
|
||||
- **AD2 is NOT the target for SSH diagnosis** when SSH is the failing channel — use RMM instead.
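
The numbers in this section can be checked with a few lines of arithmetic. The sketch below assumes minimum-size IPv4 and TCP headers (20 bytes each, no options) and uses the ~1424 path MTU quoted above. It shows why a 1500-byte interface MTU produces packets the tunnel cannot carry while 1400 fits, and that the TCP MSS for a 1400-byte MTU works out to 1360. That this is how the proposed `mssfix` value was derived is an assumption, since OpenVPN defines `mssfix` in terms of the encapsulated packet size.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options (assumption)
TCP_HEADER_BYTES = 20   # minimum TCP header, no options (assumption)

PATH_MTU = 1424         # approximate tunnel path MTU from the notes above


def tcp_mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def fits_path(mtu: int, path_mtu: int = PATH_MTU) -> bool:
    """True if a full-size packet at this interface MTU fits the path."""
    return mtu <= path_mtu


if __name__ == "__main__":
    for mtu in (1500, 1400):
        print(f"MTU {mtu}: MSS {tcp_mss_for_mtu(mtu)}, fits path: {fits_path(mtu)}")
```

Small interactive SSH traffic never fills a 1500-byte packet, which is why only bulk transfers stalled before the fix.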
|
||||
|
||||
### AD2 Branch / Coordination
|
||||
- **AD2 operates on the `ad2` git branch.** Fork is rebased from main + thin Dataforth-specific commits. Do NOT edit shared fleet files on `ad2` — conflicts on every sync. Dataforth context lives in `clients/dataforth/CLAUDE.dataforth.md`.
|
||||
- **AD2 is coord-API isolated:** 172.16.3.30 is unreachable from Dataforth LAN. Coord messages, locks, and todos NEVER reach AD2. All inter-session coordination goes through git sync: committed handoff docs + `## Note for <user>` blocks. Do NOT use the coord skill for AD2.
|
||||
- **sync.sh on AD2:** not fork-aware on the push step (always tries `main`); force-push manually: `git push --force-with-lease origin ad2` after rebasing.
|
||||
|
||||
### Post-Ransomware Recovery Restore (2025) — Incomplete File Migration
|
||||
- **The 10/1/2025 recovery restore was incomplete.** The `Restore plan 10/1/2025` (~3.4M files) migrated each share from the old `D:\<share>` layout to the current `C:\Shares\...` layout on AD2 and dropped files in the process. Proven case: SP1366 MAQ20 Communications Module — each `PRINTOUTS FOR MANUFACTURING` folder for revisions E–H received only one file (the drill panel) when the backup contained ~6 files per revision. The 9/29/2025 file-level backup confirms the files existed before the restore.
|
||||
- **Scope unknown.** Other folders across the 7 shares may have similar gaps. A full migration-gap audit is underway (WizTree both sides — see Active Work). The audit is **review-only** — no automatic restore, because some deletions were intentional and the HGHAUBNER backup is additive-only (includes Georg's personal files alongside corporate data).
|
||||
- **Backup-side CSV** for diffing stored at AD2 `C:\ClaudeTools\clients\dataforth\WizTree_20260604184904.zip` (sensitive file list — keep off shares and off any publicly accessible directory).
|
||||
- **AD2 D: drive is gone.** The old `D:\c-drive` data volume was repurposed as a mounted Windows install ISO during the rebuild. All share data now lives under `C:\Shares`. The historical file-level backup (bunch `faad5a67`) archived the data under `D:\c-drive\...` (pre-migration path) — reconcile paths accordingly.
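
Reconciling the two layouts before diffing can be sketched as below. This is an illustrative outline only, not the audit's actual tooling: the `D:\c-drive` to `C:\Shares\c-drive` mapping is an assumption based on the paths quoted in this section, the function names are hypothetical, and the inputs are plain lists of paths rather than parsed WizTree CSV rows. Output is a review list only, in line with the no-auto-restore rule.

```python
from pathlib import PureWindowsPath

# ASSUMED mapping from the pre-migration layout to the current one.
OLD_ROOT = PureWindowsPath(r"D:\c-drive")
NEW_ROOT = PureWindowsPath(r"C:\Shares\c-drive")


def to_current_layout(backup_path: str) -> str:
    """Rewrite a backup-side path into the current layout, lower-cased."""
    path = PureWindowsPath(backup_path)
    try:
        relative = path.relative_to(OLD_ROOT)
    except ValueError:
        # Not under the old root: leave the path as-is, only normalize case.
        return str(path).lower()
    return str(NEW_ROOT / relative).lower()


def missing_from_live(backup_paths, live_paths):
    """Return backup files (in current-layout form) absent from the live side."""
    live = {str(PureWindowsPath(p)).lower() for p in live_paths}
    candidates = {to_current_layout(p) for p in backup_paths}
    return sorted(candidates - live)


if __name__ == "__main__":
    backup = [r"D:\c-drive\DOCUMENT\a.pdf", r"D:\c-drive\DOCUMENT\b.pdf"]
    live = [r"C:\Shares\c-drive\DOCUMENT\a.pdf"]
    for path in missing_from_live(backup, live):
        print(path)
```

Lower-casing both sides reflects that Windows paths compare case-insensitively. Each hit still needs human review, because some deletions were intentional.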
|
||||
|
||||
### Shares ACL State — All Open to All Staff
|
||||
- **All 8 business shares grant access to every employee** via `Everyone`/`Domain Users` (FullControl on 4 shares, Modify on 3). No department-based security groups exist. Sensitive data — Payroll, OSHA records, Purchase Orders, Accounting/QuickBooks, Sage financials — is fully readable and writable by all domain users.
|
||||
- **Remediation project in progress** (Shares & Permissions, started 2026-06-10). Phase 0 (discovery) complete. Phase 1 (client input/department matrix) pending email to Dan Center. Do not apply ACL changes until after client sign-off on the target model. Details: `clients/dataforth/docs/projects/shares-permissions/`.
|
||||
- **Special shares excluded from remediation:** `test` (DOS/SMB1 guest — leave open); `webshare` (preserve `svc_testdatadb:Full`); `ITSvc` (Domain Computers needs Read).
|
||||
|
||||
### Security
|
||||
- **C2 IP blocks are iptables only** — do not survive UDM reboot. Must add to permanent UniFi block list via UI. C2 IPs: 80.76.49.18, 45.88.91.99 (AS399486 Virtuo, Montreal).
|
||||
- **AD1 disk 90% full** — C:\Engineering = 787 GB of 1023 GB. Risk of replication failures.
|
||||
@@ -376,26 +432,40 @@ Syncro asset IDs: 23845, 149614, 9708445, 9357407, 9276901, 9212922, 9078651, 88
|
||||
|
||||
## Active Work
|
||||
|
||||
As of 2026-06-04:
|
||||
As of 2026-06-19 (no open Syncro tickets):
|
||||
|
||||
- **Migration-gap audit (in progress):** WizTree CSV of HGHAUBNER's pre-attack backup captured (AD2 `C:\ClaudeTools\clients\dataforth\WizTree_20260604184904.zip`). Next: WizTree runs on live servers (AD2, FILES-D1, SAGE-SQL, AD1) tomorrow (2026-06-05); diff CSV-to-CSV per share → `clients/dataforth/migration-gap-catalog-2026-06-04.md`. Full plan in `clients/dataforth/migration-gap-diff-RESUME.md`. RMM agent IDs for the 4 servers are documented there. No auto-restore — review-only catalog.
|
||||
- **Shares & Permissions project (Phase 1 — BLOCKING, pending client input):** Phase 0 (discovery) completed 2026-06-10 — read-only ACL audit confirmed all 8 business shares open to all employees; Domain Users has FullControl on 4 shares. Discovery email to Dan Center drafted (`clients/dataforth/docs/projects/shares-permissions/discovery-email-draft.md`); not yet sent. Phase 1 blocked on client responses: department list, access matrix, sensitive-data rules, staff rosters. Full roadmap: `clients/dataforth/docs/projects/shares-permissions/roadmap.md`.
|
||||
|
||||
- **AD1 Files backup (command ready, not run):** `addBackupPlan` command prepared for AD1 (NBF, daily 2 AM, 180-day retention, `ACG-Dataforth`, covers `C:\Engineering` + `C:\Shares\ITSvc`). Awaiting Mike's explicit "run AD1" approval — production DC. Full command in `clients/dataforth/migration-gap-diff-RESUME.md`.
|
||||
- **8B/5B/SCM render completion (parked with AD2):** Root-caused a `parseRawData` bug (PASS/FAIL line consumed as step-response for families that omit `"0","0",v` line). 136 8B/5B/SCM templates mined from Hoffman API (2026-06-18). Completion — wiring templates into the live renderer with correct slotmaps, QB rounding, and frequency/AAC accuracy — handed to AD2 (its now-proven machinery from DSCA33/45 work). Sync handoff at `projects/dataforth-dos/8B5BSCM-RENDER-VERIFY-2026-06-18.md`. ~9,624 records remain unpublished; this is a render-coverage gap (null renders correctly skipped), not a backlog.
|
||||
|
||||
- **SP1366 MAQ20 file recovery (RESOLVED 2026-06-04):** 19/20 missing manufacturing print PDFs restored for revisions E–H to AD2 `C:\Shares\c-drive\DOCUMENT\DESIGN\SP\SP1366 MAQ20 Communications Module\{E,F,G,H}\PCB1366 REV <rev> PRINTOUTS FOR MANUFACTURING`. Syncro ticket #32385 billed 1.0 hr remote (prepaid, $0), resolved + invoiced. REV F `TOP PASTE LAYER` confirmed absent from both independent backups — not restored.
|
||||
- **Migration-gap audit (parked):** WizTree CSV of HGHAUBNER's pre-attack backup captured (AD2 `C:\ClaudeTools\clients\dataforth\WizTree_20260604184904.zip`). WizTree runs on live servers deferred — no diff yet. Plan: run WizTree on AD2, FILES-D1, SAGE-SQL, AD1 → diff CSV-to-CSV per share → `clients/dataforth/migration-gap-catalog-2026-06-04.md`. Full plan in `clients/dataforth/migration-gap-diff-RESUME.md`. No auto-restore — review-only catalog.
|
||||
|
||||
- **Syncro asset cleanup (2026-06-02):** 78-asset reconciliation complete. 28 confirmed-dead assets pending GUI deletion; 21 alive-but-broken machines need Syncro agent reinstall; 9 servers in VERIFY bucket. Move to metered billing once clean. Reply to Winter pending. Coord todo tree assigned to Howard (parent `103c48ad-7b31-4967-9388-065a91888e7c`). See [Syncro Asset Inventory](#syncro-asset-inventory-2026-06-02-reconciliation) above.
|
||||
- **Syncro asset cleanup (with Howard):** 78-asset reconciliation complete. 28 confirmed-dead assets pending GUI deletion; 21 alive-but-broken machines need Syncro agent reinstall; 9 servers in VERIFY bucket. Move to metered billing once clean. Coord todo tree assigned to Howard (parent `103c48ad-7b31-4967-9388-065a91888e7c`). See [Syncro Asset Inventory](#syncro-asset-inventory-2026-06-02-reconciliation) above.
|
||||
|
||||
- **AOI XP backup + isolation (2026-06-01):** AOI optical-inspection XP PC moved to VLAN 2 (mydata/SMT) @ 192.168.1.175; locked-down SMB1 share `aoibackup` on D2TESTNAS (XP-only, user `admin`). Other NAS shares now deny the XP. Mike OK'd full SMT visibility ("it's part of SMT"). **Optional EOL hardening pending:** block XP → company LAN (except NAS 192.168.0.9) + Internet on the UDM, scoped to .175 (won't affect other SMT devices). Todo `37543f7f`.
|
||||
- **AOI XP backup + isolation (ongoing):** AOI optical-inspection XP PC on VLAN 2 (mydata/SMT) @ 192.168.1.175; locked-down SMB1 share `aoibackup` on D2TESTNAS (XP-only, user `admin`). Other NAS shares now deny the XP. **Optional EOL hardening pending:** block XP → company LAN (except NAS 192.168.0.9) + Internet on the UDM, scoped to .175. Todo `37543f7f`.
|
||||
|
||||
- **AD2 Claude capability updates (parked):** AD2 runs its own Claude from `C:\ClaudeTools`. Needs: (a) syncro + coord commands, (b) DF wiki read-write, (c) Dataforth client data access. Determine if remote is shared Gitea (git pull sufficient) or diverged clone. See resume doc.
|
||||
- **AD2 Claude capability updates (parked):** AD2 runs its own Claude from `C:\ClaudeTools` on the `ad2` branch. Needs: (a) syncro + coord commands, (b) DF wiki read-write, (c) Dataforth client data access. Python 3.12.8 and identity.json installed 2026-06-17. Coord API unreachable from Dataforth LAN — comms via git sync only.
|
||||
|
||||
- **Power Monitor SPA demo (parked):** Georg Haubner developed a vanilla-JS power-meter SPA (AI-built, `clients/dataforth/ExternalCodeReview.zip`). ACG designed a gateway architecture for a gated demo at `PWM.dataforth.com` (inbound tunnel, no meter publicly exposed, magic-link auth). Spec at `clients/dataforth/power-monitor-demo/GATEWAY-SPEC.md`. Parked pending Mike↔Georg conversation.
|
||||
|
||||
- **Test Datasheet Pipeline:**
|
||||
- Production pipeline healthy. 469K records, DSCA33/45 recovery complete (1,452 new certs published 2026-06-18 via Hoffman API). Daily task runs 02:30 AM.
|
||||
- Email notifications deployed (Graph API via `sysadmin@dataforth.com`).
|
||||
- 8B/5B/SCM render gap — parked with AD2 (see above).
|
||||
- 2 niche DSCA models (DSCA33-1948, DSCA45-1746) and their 8B equivalents have no Hoffman original — no template, cannot auto-publish.
|
||||
- DKIM: cutover to selector2 on 2026-05-16 — no action needed; verify signing after that date.
|
||||
|
||||
- **GAGEtrak email (ticket #32142):** calibration@ SMTP re-enabled 2026-04-23. GAGEtrak configured (smtp.office365.com:587, calibration@dataforth.com). Kevin Wackerly verifying schedule — expected Monday run appears to run Tuesday.
|
||||
|
||||
- **Test Datasheet Pipeline:** Production pipeline healthy. 469K records, 458.5K live on website. Daily task runs 02:30 AM. Email notification deployed but pending SMTP AUTH fix — sysadmin SMTP AUTH disabled in Exchange Online. See `projects/dataforth-dos/CONTEXT.md`.
|
||||
- **GAGEtrak email (ticket #32142):** calibration@ SMTP re-enabled 2026-04-23. GAGEtrak configured (smtp.office365.com:587, calibration@dataforth.com). Kevin Wackerly verifying schedule on DF-GAGETRAK — expected Monday run appears to run Tuesday.
|
||||
- **DKIM rotation:** Automatic cutover to selector2 on 2026-05-16 — no action needed; verify signing after that date.
|
||||
- **jlohr forwarding:** ntirety.com inbox rule active as of 2026-05-12; confirmed delivering to mike@azcomputerguru.com. Defunct transport rule pending cleanup.
|
||||
|
||||
- **RDS / SAGE-SQL:** RDS grace period reset. GPO cert distribution pending. RDS CALs purchase needed long-term.
|
||||
- **MFA enforcement ongoing** — 19 users were still not enrolled as of April 4 enforcement date; current count unverified.
|
||||
|
||||
- **MFA enforcement ongoing** — 19 users were not enrolled as of April 4 enforcement date; current enrollment count unverified.
|
||||
|
||||
- **C2 IP blocks need permanence:** Iptables rules on UDM (80.76.49.18, 45.88.91.99) need to be added to permanent UniFi UI block list.
|
||||
|
||||
- **UDM inbound SIP port-forward:** Recommended to add matching rule in UniFi UI (current on_boot.d script covers reboots; UI rule is belt-and-suspenders).
|
||||
|
||||
---
|
||||
|
||||
@@ -424,10 +494,17 @@ As of 2026-06-04:
|
||||
| 2026-05-04 | Howard onsite — lobby phone offline (VLAN misconfiguration on D1-Server-Room port 1 → fixed to VLAN 100). |
|
||||
| 2026-05-06 | SAGE-SQL RDS issues resolved — grace period reset, SSL cert replaced, TSGateway disabled, RemoteApp permission prompts fixed. |
|
||||
| 2026-05-12 | Pipeline audit + email notifications implemented (Graph API). jlohr forwarding configured (ntirety.com → mike@). DKIM keys rotated. |
|
||||
| 2026-06-01 | AOI optical-inspection XP PC isolated onto VLAN 2 (mydata/SMT) @ 192.168.1.175; `aoibackup` SMB1 share created on D2TESTNAS locked to the XP only; other NAS shares set to deny the XP. D2TESTNAS confirmed Debian 13 / Samba 4.22.6 (repurposed Netgear ReadyNAS); vault + wiki OS corrected. Mike: AOI may see all of SMT; optional company-LAN/Internet block for the XP still pending. |
|
||||
| 2026-06-01 | Chauncey Bell (cbell) M365 verified — active mailbox, licensed Microsoft 365 Business Standard (full Office + Exchange); AD password reset on AD2 (synced user, OU=Azure_Users), signed into Office. Bobbi's Outlook printing fixed by switching to Outlook (Classic). Ticket #32364 (0.5 hr onsite). |
|
||||
| 2026-06-02 | Syncro asset reconciliation (78 assets): 20 keep / 21 save+flag / 28 remove / 9 verify. Root cause identified: fleet-wide Syncro agent break ~2025-10-06 silenced ~half the fleet while boxes stayed online (visible in ScreenConnect). Dataforth confirmed phasing off Bitdefender (only 4 of 57 GravityZone endpoints actively managed; 53 in Deleted folder). GUI delete list and 5-step todo tree handed to Howard. Move to metered billing pending cleanup. ScreenConnect API auth pattern documented (CTRLAuthHeader raw secret + Origin). |
|
||||
| 2026-06-04 | SP1366 MAQ20 manufacturing print recovery — 19/20 PDFs for revisions E–H restored to AD2 from HGHAUBNER's pre-attack backup (D:\DF C-Drive) via GuruRMM user_session + GPO-mapped Q: drive. Root cause of loss: incomplete 10/1/2025 recovery restore. MSP360 file backup (`faad5a67`) independently cross-validated (both sources agree: 19/20 present). Syncro #32385, 1.0 hr remote, prepaid $0, resolved. GuruRMM fleet grew 13 → 45 agents (AD1, FILES-D1, SAGE-SQL, DF-HYPERV-B, DF-SVR-D2-Sync, eng-dev-server, + many workstations enrolled). WizTree backup-side CSV captured for migration-gap diff; diff deferred to 2026-06-05. AD1 Files backup command prepared (not run). |
|
||||
| 2026-06-01 | AOI optical-inspection XP PC isolated onto VLAN 2 (mydata/SMT) @ 192.168.1.175; `aoibackup` SMB1 share created on D2TESTNAS locked to the XP only; other NAS shares set to deny the XP. D2TESTNAS confirmed Debian 13 / Samba 4.22.6 (repurposed Netgear ReadyNAS); vault + wiki OS corrected. |
|
||||
| 2026-06-01 | Chauncey Bell (cbell) M365 verified — active mailbox, licensed M365 Business Standard; AD password reset on AD2 (synced user, OU=Azure_Users), signed into Office. Bobbi's Outlook printing fixed. Ticket #32364 (0.5 hr onsite). |
|
||||
| 2026-06-02 | Syncro asset reconciliation (78 assets): 20 keep / 21 save+flag / 28 remove / 9 verify. Root cause identified: fleet-wide Syncro agent break ~2025-10-06 silenced ~half the fleet while boxes stayed online (visible in ScreenConnect). Dataforth confirmed phasing off Bitdefender. Cleanup list handed to Howard. |
|
||||
| 2026-06-04 | SP1366 MAQ20 manufacturing print recovery — 19/20 PDFs for revisions E–H restored to AD2 from HGHAUBNER's pre-attack backup via GuruRMM user_session + GPO-mapped Q: drive. Root cause of loss: incomplete 10/1/2025 recovery restore. Syncro #32385, 1.0 hr remote, prepaid $0, resolved. GuruRMM fleet grew 13 → 45 agents. WizTree backup-side CSV captured for migration-gap diff (deferred). |
|
||||
| 2026-06-05 | AD1 Files backup plan created via GuruRMM remote command (cbb.exe, NBF, 180-day retention, daily 2 AM, covers C:\Engineering + C:\Shares\ITSvc). AD1 now has both image and file plans matching AD2. |
|
||||
| 2026-06-05 | **Mailprotector CloudFilter discovered** as Dataforth's outbound delivery layer (atop INKY + Exchange Online). Email from Georg Haubner was held by Mailprotector due to INKY "Annotation" transport rule. Released manually. New `/mailprotector` skill built and committed. |
|
||||
| 2026-06-05 | Georg Haubner's Power Monitor SPA analyzed (vanilla-JS, AI-built). Gateway architecture designed for PWM.dataforth.com demo. Parked pending Mike↔Georg conversation. |
|
||||
| 2026-06-08–09 | **Total Dataforth phone outage.** Outbound failed (FirstDigital SBC ignoring OPTIONS → trunk Unavailable); inbound never worked (no SIP port-forward existed). Fixed: `qualify_frequency=0` in pjsip DB; `PJSip.class.php` line 504 re-patched; `/data/on_boot.d/30-freepbx-sip-forward.sh` added (SIP-only DNAT, source-locked 66.7.123.0/24). Two-way audio verified. UDM vault corrected. Syncro #32392, 1.0 hr emergency (×1.5 rate) remote, prepaid. |
|
||||
| 2026-06-10 | **Shares & Permissions Phase 0 complete.** Read-only ACL audit of all 8 business shares: all grant Domain Users/Everyone Full or Modify; no department security groups exist; Payroll/OSHA/PO/accounting data open to all employees. Phase 1 (client input) pending discovery email to Dan Center. |
|
||||
| 2026-06-17 | AD2 identity.json + Python 3.12.8 installed. `CLAUDE.dataforth.md` created as the AD2 context file (Dataforth context relocated out of in-line `.claude/CLAUDE.md` edits to keep the fork clean). |
|
||||
| 2026-06-18 | **DSCA33/45 certs recovered via Hoffman API** — 56 model templates mined, 1,452 new DSCA33/45 certs published on AD2 (0 overwrites). Root-caused `parseRawData` bug affecting 8B/5B/SCM families. 136 8B/5B/SCM templates mined from Hoffman and handed to AD2 for wiring. TestDataDB UI redesigned and deployed on AD2 (cert-fit, publish chips, push toasts, full-screen inspector). AD2 SSH PMTU blackhole diagnosed (GURU-5070 adapter MTU 1500 vs tunnel ~1424) and fixed (MTU 1400). Syncro #32441. |
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Wiki Index
|
||||
|
||||
Last updated: 2026-06-19
|
||||
Last updated: 2026-06-20
|
||||
Compiled by: GURU-5070/claude-main
|
||||
|
||||
This wiki is LLM-maintained. Do not edit articles manually — run `/wiki-compile` to update.
|
||||
@@ -18,8 +18,8 @@ Run `/wiki-lint` to check for stale entries and broken backlinks.
|
||||
|
||||
| Article | Summary | Last Compiled |
|
||||
|---|---|---|
|
||||
| [Cascades of Tucson](clients/cascades-tucson.md) | Prepaid block $175/hr, **55.75 hrs remaining** (live 2026-06-18); senior living; active domain migration + HIPAA compliance project; single DC on aging R610 hardware; caregiver restricted-access model PROVEN 2026-06-05: Hybrid Entra Join + CA allow-list + ALIS SSO validated on NURSESTATION-PC/pilot.test; GPO `CSC - Caregiver Workstation` (shortcuts + printers) built + validated; GPO `CSC - Caregiver Device Lockdown` deployed (HIPAA auto-logoff, activates on reboot); INTUNE_A PendingInput tenant-wide (MS case open; GPO path used instead); folder-redirection root cause fixed 2026-06-08 (fdeploy.ini); shared mailboxes grievances@/Surveys@ created + delegated 2026-06-12 (#32417); Monday cutover to real caregivers pending; #32383 (bill.com/BOK chris.knight) Resolved; UniFi wifi RF (77 U7-Pro APs/~587 clients via UOS controller): 2.4GHz over-coverage = primary pain; pfSense ruled out as cause; Floor-4 power-down pilot applied 2026-06-16 (retry 13.2->9.5%); coverage-thin disable plan + 2.4 remediation runbook staged; DFS empirically clean; 6GHz untapped; CS-SERVER OS RAID-1 degraded 2026-06-15 (data-loss risk; cloud backup now started); Voice VLAN (VLAN 30) consolidation planned 2026-06-16 for Vertical phones + remote desktop (CSCNet confirmed a shared PPSK SSID); KPI dashboard for Ashley Jensen scoped 2026-06-17 (Power BI + SharePoint phased plan, parked); Voice VLAN 30 built + 22/22 Poly cut over 2026-06-17 (AudioCodes 0/8 pending); building power outage 2026-06-17 (pfSense on UPS surge-only side) full site down + recovered; DESKTOP-TRCIEJA (Lupe Sanchez) slow Excel diagnosed 2026-06-18 = EOL i3-2120 hardware + dual real-time AV (leftover Datto stack) -> replace machine; network-logging spec written 2026-06-18 (on-site Synology Log Center; UniFi retains 0 client events -- drop/kick history not captured); **Voice VLAN migration COMPLETE 2026-06-19** (38 devices: 29 Poly + 8 AudioCodes + desktop; awaiting Vertical to set Poly 5GHz-only). **RF optimized 2026-06-19** (2.4 power Low/full->Medium + 5GHz moved to clean DFS channels via data-driven scan -> 5GHz retry halved; 6GHz blocked by WPA3); Syncro 0 open tickets | 2026-06-19 |
|
||||
| [Dataforth Corporation](clients/dataforth.md) | Prepaid block ~$2,099/mo, 34.5 hrs remaining; signal conditioning manufacturer; 64 DOS test stations; 2025 crypto attack recovery + incomplete restore (files dropped across shares — migration-gap audit in progress); 2026-03-27 phishing incident + MFA rollout; active test datasheet pipeline project; Neptune Exchange colocated at D2; 2026-06-04 SP1366 file recovery (19/20 PDFs restored from HGHAUBNER pre-attack backup); GuruRMM fleet 13→45 agents; 2026-06-02 Syncro asset reconciliation (78→20 keep/21 flag/28 remove/9 verify); fleet-wide Syncro agent break ~2025-10-06; Bitdefender phase-off in progress | 2026-06-04 |
|
||||
| [Cascades of Tucson](clients/cascades-tucson.md) | Prepaid block $175/hr, **48.75 hrs remaining** (live 2026-06-20); senior living; active domain migration + HIPAA caregiver-lockdown project (GPOs deployed; Entra Hybrid Join + CA allow-list + ALIS SSO model proven); single DC (CS-SERVER) on aging R610, OS RAID-1 degraded 2026-06-15 (data-loss risk; cloud backup started); **Voice VLAN 30 migration COMPLETE 2026-06-19** (~38 devices: 29 Poly + 8 AudioCodes + desktop; awaiting Vertical to set Poly 5GHz-only); **UniFi RF optimized 2026-06-19** (77 U7-Pro APs/~587 clients: 2.4GHz power->Medium on 47 radios + 5GHz clean-DFS 40MHz channel plan -> 5GHz retry halved; 6GHz blocked by WPA3 on PPSK SSID); Syncro 0 open tickets | 2026-06-20 |
|
||||
| [Dataforth Corporation](clients/dataforth.md) | Prepaid block ~$2,099/mo, **31.5 hrs remaining** (live 2026-06-20); signal-conditioning manufacturer; 64 DOS test stations; 2025 ransomware recovery + incomplete file restore (migration-gap audit); 2026-03 phishing + MFA rollout; test-datasheet pipeline (DSCA cert publish via Hoffman API + testdatadb UI on AD2); mail stack INKY->Mailprotector CloudFilter->EXO; FreePBX 17 outage fixed 2026-06-08/09 (qualify_frequency=0; no RTP-forward); shares-ACL project (all open to staff); Syncro asset reconciliation 2026-06-02; GuruRMM fleet ~45; Bitdefender phase-off | 2026-06-20 |
|
||||
| [Instrumental Music Center](clients/instrumental-music-center.md) | Prepaid block $175/hr, 12.5 hrs remaining; music retail/repair; AIMsi POS on SQL Server 2019; phantom DC causing slow logons; GuruRMM enrolled (IMC1) | 2026-05-24 |
|
||||
| [Jimmy Company](clients/jimmy.md) | Break-fix, $150/hr; single aging workstation BLASTER2 (Win10 22H2 EOL, i5-3470/3.8GB — replace); backups the recurring theme (QuickBooks data); onboarded to GuruRMM 2026-06-19 (RDP NLA + Kaseya removal + cleanup); MSP360 local backup drive full, 90-day retention set, space reclaim pending in console (cloud B2 healthy) | 2026-06-19 |
|
||||
| [Valley Wide Plastering](clients/valleywide.md) | Prepaid block, 10 hrs remaining; plastering/stucco contractor; HP DL360 Gen10 + XenServer; VB6 app modernization project; RDWeb brute-force incident; 11 Yealink phones pending | 2026-06-14 |
|
||||
|
||||