diff --git a/session-logs/2026-06/2026-06-23-mike-22nd-st-office-network-investigation.md b/session-logs/2026-06/2026-06-23-mike-22nd-st-office-network-investigation.md
new file mode 100644
index 00000000..7dc00e7a
--- /dev/null
+++ b/session-logs/2026-06/2026-06-23-mike-22nd-st-office-network-investigation.md
@@ -0,0 +1,115 @@
+# 22nd St office "slow internet" investigation (dead PoE switch + AP, Winter fix) + Factorio mod design
+
+## User
+- **User:** Mike Swanson (mike)
+- **Machine:** GURU-5070
+- **Role:** admin
+
+## Session Summary
+
+Two threads. **(1) ACG 22nd St office network troubleshooting** — reported "internet slower than
+normal," with Winter's phone showing Wi-Fi "no internet" and her wired PC getting partial page
+loads. Worked it top-down. The office pfSense (172.16.0.1, SSH :2248, pfSense 2.8.1) and its single
+Cox fiber WAN tested fully healthy: 0% loss, 14 ms, ~570 Mbps down / ~385 up single-stream, 1G
+full-duplex WAN NIC with 0 errors, full-1500 MTU passes, DNS reliable, CPU idle, DHCP pool
+(172.16.1.1-254, 2 h leases, ~28 active devices) nowhere near exhaustion. Confirmed it is a flat
+/22 (172.16.0.0/22) with no VLANs — the `192.168.206.x` seen on Winter's Wi-Fi was stale junk, not
+a real guest VLAN.
+
+Pivoted to Winter's PC (DESKTOP-U303G5J, added to GuruRMM mid-session): wired path was clean (1G
+FD, 0 loss, 476 Mbps). Her **partial page loads were caused by a broken Wi-Fi adapter** — APIPA
+169.254 (no DHCP) with a **stale static DNS 192.168.206.48** that bled into Windows' multi-adapter
+DNS resolution and caused intermittent lookup timeouts. **Disabled the Wi-Fi adapter via RMM**
+(she's wired) — DNS now clean, resolved.
+
+Then the office Wi-Fi via the UniFi controller (UOS 172.16.3.29; site "22nd St"). gw-audit showed
+disconnected devices; ping + controller stat confirmed the **root cause: the US-8-60W PoE switch
+(172.16.1.8) is DOWN**, which took the **U7-Pro AP (172.16.1.34) offline** — office running on 1 of
+2 APs → Wi-Fi unusable in that AP's area (Winter's phone). Also found the **USW Pro 24 Fiber Port
+24 linked at only 10 Mbps** (1G-capable; far end non-UniFi, no LLDP) = a throttled segment, and the
+**USW-Lite-8-PoE (172.16.1.12)** up but unmanaged. **Remote fixes:** re-adopted the .12 switch.
+Created Syncro ticket **#32454** (Arizona Computer Guru, cust 15353550) with the full findings +
+on-site dispatch note (power-cycle the .8 switch, check the port-24 SFP/fiber). The dead AC switch +
+the SFP need on-site hands.
+
+**(2) Factorio mod — design discussion** (personal project). Researched Factorio 2.0 mod anatomy
+(Grok live web; Gemini CLI auth is broken on this box). Mike wants a **deterministic, cost-based
+quality system** replacing vanilla's RNG quality: a Refinery machine converts N lower-quality items
+into 1 higher-quality item (no chance), keeping vanilla's quality tiers + perks untouched, with an
+escalating compounding cost (≈3/4/5/6 per tier step). Captured the full design + the 2.0 mod-anatomy
+reference under `projects/factorio-quality-mod/` (committed earlier this session, 30841fbf).
+
+## Key Decisions
+- **pfSense exonerated by measurement, not assumption** — ran loss/latency/throughput/MTU/DNS/DHCP
+  before concluding; ruled out WAN, DHCP exhaustion, and MTU before pivoting downstream.
+- **Diagnose from the user's actual machine** — adding Winter's PC to RMM let me prove the wired path
+  was clean and isolate the dead Wi-Fi adapter as the partial-load cause.
+- **Disabled (not reconfigured) Winter's Wi-Fi adapter** — she's wired; killing the broken adapter
+  removes the bad-DNS interference cleanly and reversibly.
+- **On-site dispatch via a Syncro ticket** for the physical items (dead AC PoE switch, 10M SFP) —
+  can't power-cycle a fully-offline AC switch or reseat an SFP remotely.
+- **Factorio mod = full replacement of RNG quality**, keep vanilla tiers/perks, deterministic
+  Refinery, escalating compounding cost (Mike, 2026-06-23).
+
+## Problems Encountered
+- **MSYS path conversion** mangled `/bin/sh` to a Windows path when passed to ssh.exe (Git-Bash
+  gotcha) — fixed with `MSYS_NO_PATHCONV=1` + a non-slash remote command (`sh -s`).
+- **Gemini CLI auth broken** (`throwIneligibleOrProjectIdError` / `_doSetupUser`) — empty responses;
+  needs interactive `gemini` re-login. Logged to errorlog; fell back to Grok for research.
+- **Old mongo shell rejected `else if`** in the uos-mongo query — rewrote with separate `if` blocks.
+- **First post-reboot verify (CCroom1New, earlier)** read stale uptime in the reboot grace window —
+  re-verified after the box was down/up.
+- **VWP-QBS firewall correction (carryover):** logged that the disabled firewall is intentional for
+  testing — leave it (don't re-flag).
+
+## Configuration Changes
+- **DESKTOP-U303G5J (Winter):** disabled the `Wi-Fi` net adapter via RMM (`Disable-NetAdapter`);
+  Ethernet-only, DNS now 172.16.0.1/8.8.8.8/1.1.1.1.
+- **UniFi 22nd St:** re-adopted USW-Lite-8-PoE 172.16.1.12 (device-control adopt, rc:ok).
+- **Syncro #32454** created (Arizona Computer Guru 15353550) — findings + on-site dispatch comment.
+- **New project** `projects/factorio-quality-mod/` — `DESIGN.md` + `factorio-mod-anatomy.md` (committed 30841fbf).
+- No pfSense changes (read-only diagnostics only).
+
+## Credentials & Secrets
+- None created. Read-only use of: office pfSense SSH key (`C:/Users/guru/.ssh/id_ed25519`, admin@172.16.0.1:2248);
+  `infrastructure/uos-server-ssh-key` (uos-mongo), `infrastructure/uos-server-network-api-rw`
+  (controller stat read), Syncro mike key, GuruRMM admin. pfSense admin pw vault
+  `infrastructure/pfsense-firewall` (unused — key auth worked).
+
+## Infrastructure & Servers
+- **Office pfSense** 172.16.0.1 (SSH :2248, web :4433), pfSense 2.8.1, single Cox fiber WAN
+  (igc0 98.181.90.163/31, gw .162), flat LAN igc2 172.16.0.0/22, DHCP kea 172.16.1.1-254 (2 h).
+  Tailscale 100.119.153.74. Stale dead `OptGW` (184.187.220.90) still monitored (false alarms).
+- **UniFi UOS** 172.16.3.29 (controller :11443; mongo via uos-server-ssh-key). Site "22nd St" =
+  `5f493a90c9e77c010bbb134c` (short `1p7jvx8r`).
+  - U7-Pro AP `172.16.1.34` (mac 28:70:4e:d5:36:cf) — **DOWN**.
+  - UAPA6A9 AP `172.16.1.63` — up.
+  - US-8-60W switch `172.16.1.8` (mac f4:e2:c6:e4:3e:dd) — **DOWN** (AC-powered).
+  - USW Pro 24 `172.16.1.11` — up; **Fiber Port 24 @ 10M** (10G SFP+ 25/26 unused).
+  - USW-Lite-8-PoE `172.16.1.12` — re-adopted.
+  - DMarc USMINI `172.16.1.15` — up.
+- **Winter PC** DESKTOP-U303G5J 172.16.1.158, GuruRMM agent `52c90de1-6d58-4654-a6ed-b779b8ad93fc`.
+- Syncro: Arizona Computer Guru cust **15353550** (7437 E 22nd St).
+
+## Commands & Outputs
+- pfSense diag over SSH (key auth): `MSYS_NO_PATHCONV=1 ssh ... -p 2248 admin@172.16.0.1 sh -s < --apply`.
+- Research: `ask-grok.sh text --prompt-file ...` (Gemini failed auth).
+
+## Pending / Incomplete Tasks
+- **#32454 on-site (HIGH):** power-cycle/check US-8-60W (172.16.1.8) → restores U7-Pro AP + office Wi-Fi.
+- **#32454 on-site:** inspect/reseat Fiber Port 24 SFP+fiber on USW Pro 24 (10M→1G); trace what it feeds.
+- Optional: confirm the "USW Pro Max 16 PoE" (controller-offline/never-onboarded) is actually in service.
+- VWP-QBS firewall stays OFF intentionally (do not re-enable) until VWP testing done.
+- **Gemini CLI:** interactive re-login needed on GURU-5070 to restore that research path.
+- Office pfSense cruft: remove the dead `OptGW` gateway (stops false down-alarms) — needs go (single point of failure).
+- **Factorio mod:** next = feasibility check (deterministic quality-up recipe; neutralize quality-module RNG). See `projects/factorio-quality-mod/DESIGN.md`.
+
+## Reference Information
+- Ticket #32454 (id 112996018) https://computerguru.syncromsp.com/tickets/112996018 (comment 420428386).
+- #dev-alerts/#bot-alerts posts this session: Winter Wi-Fi, ticket #32454.
+- Factorio project: `projects/factorio-quality-mod/{DESIGN.md,factorio-mod-anatomy.md}` (commit 30841fbf).
+- Related earlier logs: clients/valleywide/.../2026-06-23-mike-vwp-smb1-orders-xp-g-drive.md; session-logs/2026-06/2026-06-23-mike-vwp-qbs-firewall-ccroom1new-uac.md.