sync: Auto-sync from ACG-M-L5090 at 2026-03-10 19:11:00

Synced files:
- Quote wizard frontend (all components, hooks, types, config)
- API updates (config, models, routers, schemas, services)
- Client work (bg-builders, gurushow)
- Scripts (BGB Lesley termination, CIPP, Datto, migration)
- Temp files (Bardach contacts, VWP investigation, misc)
- Credentials and session logs
- Email service, PHP API, session logs

Machine: ACG-M-L5090
Timestamp: 2026-03-10 19:11:00

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 19:59:08 -07:00
parent a1a19f8c00
commit fa15b03180
169 changed files with 879909 additions and 1243 deletions

@@ -0,0 +1,158 @@
# Session Log: 2026-02-25
## Session Summary
Continued diagnostics on Peaceful Spirit Country Club UCG Ultra speed issues. Performed SSH-based monitoring, identified ECM crash-loop patterns, rebooted gateway, and ran 15-minute stability monitoring. Gateway fully exonerated -- issue confirmed as Cox plant-side.
---
## Peaceful Spirit Country Club - UCG Ultra Continued Diagnostics
### Pre-Reboot Findings (via SSH)
Connected via VPN to 192.168.0.10 after fixing SSH key authentication (the key had to be added to `/root/.ssh/authorized_keys` directly -- the GUI-added key still prompted for a password).
**ECM crash-loop confirmed ongoing:**
- ECM was NOT loaded (`lsmod | grep ecm` = empty)
- Cycle pattern from dmesg: runs 2-6 minutes, crashes, stays down 15-39 minutes
- Last cycle before reboot: init at 89499s, exit at 89638s (~2 min run), then never reloaded
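The run/down cycle above can be pulled out of dmesg mechanically. A minimal sketch, assuming each event appears as a line with a bracketed kernel timestamp; the `ECM exit` message text and the sample lines are assumptions (only the 89499s and 89638s timestamps come from this session), so adjust the patterns to the real output:

```python
import re

# Assumed dmesg line shapes -- the patterns below are illustrative only.
INIT_RE = re.compile(r"^\[\s*(\d+(?:\.\d+)?)\]\s.*ECM init\b(?! complete)")
EXIT_RE = re.compile(r"^\[\s*(\d+(?:\.\d+)?)\]\s.*ECM exit\b")


def ecm_cycles(dmesg_text):
    """Return (init_ts, exit_ts, run_seconds) for each completed ECM run."""
    cycles = []
    started = None
    for line in dmesg_text.splitlines():
        m = INIT_RE.match(line)
        if m:
            started = float(m.group(1))
            continue
        m = EXIT_RE.match(line)
        if m and started is not None:
            ended = float(m.group(1))
            cycles.append((started, ended, ended - started))
            started = None
    return cycles


# Sample built from the last cycle's timestamps; message text is assumed.
sample = """\
[89499.000000] ECM init
[89499.000000] ECM init complete
[89638.000000] ECM exit
"""
for init_ts, exit_ts, run in ecm_cycles(sample):
    print(f"init={init_ts:.0f}s exit={exit_ts:.0f}s run={run:.0f}s")
```

A run with an init but no matching exit is treated as still running and is not reported.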
**Other findings:**
- Load average: 1.26 (elevated, CPU handling all forwarding without ECM)
- Memory: 1169 MB / 2947 MB (40%), 65 MB swap used
- IDS/IPS: confirmed OFF (no suricata process)
- eth4 RX: 4 errors, 4 CRC errors (physical layer corruption from modem)
- WAN link flap: eth4 went down for 6 seconds at 76591s (modem sync loss)
- QUIC reassembly failures: multiple bursts, including triple failure at 96270s
- WireGuard tunnel: down (VPN was hung, had to be restarted on our side)
### Reboot and Hardware Acceleration
User rebooted UCG Ultra. Initial post-reboot check (7 min uptime):
- ECM was NOT loaded -- initially suspected PCIe probe failure (`qcom-pcie: probe of 20000000.pcie failed with error -110`)
- Actual cause: **Hardware Acceleration was disabled in UI settings**
- User re-enabled Hardware Acceleration
- ECM loaded immediately: `ECM init` at 669s, `ECM init complete` at 669s
### 15-Minute Stability Monitoring
Ran an automated check every 60 seconds for 15 minutes (08:24 - 08:39).
**Results:**
- ECM: STABLE for entire 15 minutes -- zero crashes, zero restarts
- RX errors: 0 across all 15 checks
- CRC errors: 0 across all 15 checks
- Drops: 0 both directions
- QUIC failures: 0
- Link flaps: 0
- dmesg: clean -- only the initial ECM init message
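A rough sketch of the kind of polling loop used for this check. The actual script was not saved in this log, so `check` is a hypothetical callable and the counter names are assumptions; only the 15 samples at 60-second spacing come from the session:

```python
import time


def monitor(check, samples=15, interval_s=60, sleep=time.sleep):
    """Call check() `samples` times, `interval_s` apart; return failing samples.

    `check` is a hypothetical callable returning a dict of counters, e.g.
    {"ecm_loaded": True, "rx_errors": 0, "crc_errors": 0, "drops": 0}.
    A sample is healthy when ECM is loaded and every other counter is zero.
    """
    failures = []
    for i in range(samples):
        result = check()
        healthy = result.get("ecm_loaded", False) and all(
            value == 0 for key, value in result.items() if key != "ecm_loaded"
        )
        if not healthy:
            failures.append((i, result))
        if i < samples - 1:
            sleep(interval_s)
    return failures


def fake_check():
    # Stand-in for the real SSH-based probe (lsmod, interface counters, dmesg).
    return {"ecm_loaded": True, "rx_errors": 0, "crc_errors": 0, "drops": 0}


# Pass a no-op sleep so the demo finishes immediately.
print(monitor(fake_check, sleep=lambda s: None))
```

Injecting `sleep` keeps the loop testable without waiting the full 15 minutes.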
**Load trend:**
| Time | Load (1m) | Load (5m) | Load (15m) |
|------|-----------|-----------|------------|
| 08:24 | 1.53 | 1.43 | 0.92 |
| 08:28 | 1.33 | 1.43 | 1.04 |
| 08:32 | 1.74 | 1.57 | 1.19 |
| 08:36 | 2.12 | 1.72 | 1.33 |
| 08:38 | 2.32 | 1.80 | 1.38 |
| 08:39 | 1.74 | 1.73 | 1.38 |
Load persistently above 1.0 -- likely WireGuard VPN crypto (can't be offloaded to ECM).
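The three load columns match the first three fields of `/proc/loadavg` on Linux. A small sketch for collecting them per sample; the load values in the sample string are the 08:39 row, while the trailing process fields are made up:

```python
def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


# Loads from the 08:39 row; "2/345 6789" is placeholder process info.
sample = "1.74 1.73 1.38 2/345 6789"
print(parse_loadavg(sample))
```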
### Configuration Changes Made
1. **IDS/IPS:** Disabled (was on High) -- done earlier on 2026-02-25
2. **Hardware Acceleration:** Re-enabled after reboot
3. **MSS Clamping:** Changed from Custom 1452 to Auto
   - iptables now shows `clamp to PMTU` on tun1 only (correct behavior)
   - No MSS rules on eth4/WAN (confirmed -- MSS setting never affected WAN traffic)
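For reference, the arithmetic behind an MSS value. This is general IPv4/TCP header math, not something measured in this session: the MSS is the MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header (no options), so the old custom value of 1452 corresponds to a 1492-byte MTU.

```python
IPV4_HEADER = 20  # bytes, without options
TCP_HEADER = 20   # bytes, without options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one IPv4 packet of size `mtu`."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(mss_for_mtu(1500))  # standard Ethernet MTU
print(mss_for_mtu(1492))  # MTU that yields the old custom value of 1452
```

With `clamp to PMTU`, the MSS is derived per path in the same way instead of being pinned to one value.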
### Speed Test Results
- Post-reboot with ECM running: **29/28 Mbps** (300/30 provisioned)
- Upload hitting near-provisioned speed (28 of 30)
- Download at ~10% of provisioned (29 of 300)
- Occasionally approaches provisioned download speed (200-278 Mbps seen previously)
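The percentages above are simple ratios against the provisioned rate. A quick sketch of the calculation; the function name is illustrative:

```python
def percent_of_provisioned(measured_mbps, provisioned_mbps):
    """Measured throughput as a percentage of the provisioned rate."""
    return round(100 * measured_mbps / provisioned_mbps, 1)


print(percent_of_provisioned(29, 300))  # download
print(percent_of_provisioned(28, 30))   # upload
```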
### Final Status Check (08:41, 33 min uptime)
- ECM: loaded, stable
- Load: 1.25 (trending down)
- Memory: 981 MB / 2947 MB (33%), 2 MB swap
- eth4: 0 errors, 0 CRC, 0 drops
- dmesg: clean since boot
- MSS: Auto, clamp to PMTU on tun1 only
### Sequential Thinking Re-Evaluation
Performed a full sequential thinking analysis (8 steps), re-evaluating all evidence:
**Two overlapping problems identified:**
**Problem 1 - Cox Plant (Primary):**
- Speed decays from 200+ to 70 Mbps under sustained load = marginal DOCSIS channels de-bonding
- 50% packet loss at all packet sizes = not MTU or gateway related
- Download degraded, upload stable = downstream RF path
- New modem, same symptoms = rules out CPE
- Persists with all gateway configurations tested
- Occasionally hits provisioned speed = CMTS config is correct, channels are marginal
**Problem 2 - Gateway ECM (Secondary, resolved):**
- ECM crash-loop amplified plant symptoms (caused <1 Mbps drops)
- Resolved by: disabling IDS/IPS, rebooting, re-enabling HW acceleration
- 15-minute monitoring confirms stable operation
### Summary Prepared for Cox Tech
> **Site:** Peaceful Spirit Country Club
> **Circuit:** 300/30 Mbps | **IP:** 98.190.129.150
> **Modem:** New (replaced prior day) - same symptoms
>
> Download speeds start at 200+ Mbps, then decay to 29-70 Mbps under load. Intermittent drops to <1 Mbps. 50% packet loss at all packet sizes. Upload stable at 28-29 Mbps. Modem intermittently achieves full provisioned speed, proving CMTS config is correct.
>
> Customer gateway fully eliminated: 15 minutes of monitoring show zero errors at every layer, hardware offload stable, zero CRC errors.
>
> Pattern consistent with marginal downstream DOCSIS channels bonding/de-bonding as signal conditions fluctuate.
>
> Tech should check: downstream signal levels/SNR, uncorrectable codewords, T3/T4 timeouts, tap/drop/connectors for corrosion, amplifier health, node health.
---
## SSH Access Reference
- **Host:** 192.168.0.10 (via VPN) or 98.190.129.150 (WAN)
- **User:** root
- **Key:** `~/.ssh/ucg_peaceful_spirit` (ed25519)
- **Public key:** `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBw+BK25MXpm91XBtDsSp7K0nTcKwFDLFZDx7tAO/N8 claude@claudetools`
- **Auth method:** Key added to `/root/.ssh/authorized_keys` (NOT via UniFi GUI)
- **Note:** GUI-added keys require password; direct authorized_keys works with key-only
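A hedged sketch of adding a key to `authorized_keys` directly, as was done by hand here. It is not the procedure actually run; the demo writes to a temporary directory and uses a placeholder key string. The 0700/0600 permissions follow the usual OpenSSH expectation that the directory and file are not writable by others.

```python
import os
import tempfile
from pathlib import Path


def add_authorized_key(public_key, ssh_dir):
    """Append `public_key` to ssh_dir/authorized_keys unless already present."""
    ssh_dir = Path(ssh_dir)
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    keys_file = ssh_dir / "authorized_keys"
    text = keys_file.read_text() if keys_file.exists() else ""
    key = public_key.strip()
    if key in (line.strip() for line in text.splitlines()):
        return False  # already installed, nothing to do
    with keys_file.open("a") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")  # avoid gluing the key onto a previous line
        fh.write(key + "\n")
    os.chmod(keys_file, 0o600)
    return True


# Demo against a throwaway directory; the key below is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    placeholder = "ssh-ed25519 AAAAexample user@host"
    print(add_authorized_key(placeholder, Path(tmp) / ".ssh"))
    print(add_authorized_key(placeholder, Path(tmp) / ".ssh"))
```

The second call returns `False` because the key is already present, so re-running the step is harmless.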
### Current UCG Config (post-changes)
- Hardware Acceleration: ON
- IDS/IPS: Disabled
- MSS Clamping: Auto (clamp to PMTU on VPN tunnels)
- Jumbo Frames: OFF
- SNMP: OFF
- ARP Cache: Min DHCP lease
- Auto Firewall State Timeouts: ON
- Global NAT: Auto
- Connection Tracking: FTP, H.323, GRE, PPTP, TFTP
---
## Pending Tasks
### Peaceful Spirit
- [ ] Cox tech visit -- confirm plant-side fix resolves speed issues
- [ ] After Cox fix: re-test speeds to verify 300 Mbps sustained
- [ ] Consider re-enabling IDS/IPS at Medium/Low after Cox plant is fixed
- [ ] Monitor ECM stability over coming days
- [ ] Investigate persistent high load (1.2-2.3) -- likely WireGuard related
### From Previous Session (2026-02-24)
- [ ] Yealink: Get IP Discovery Tool from distributor for serial extraction
- [ ] Yealink: Test browser-based scanner (tools/yealink-serial-scanner.html)
- [ ] Yealink: Onboard remaining phones into YMCS
- [ ] Yealink: Build OIT VoIP templates when ready for migration
- [ ] Clean up tools/test-yealink.ps1