sync: Auto-sync from ACG-M-L5090 at 2026-03-10 19:11:00
Synced files:
- Quote wizard frontend (all components, hooks, types, config)
- API updates (config, models, routers, schemas, services)
- Client work (bg-builders, gurushow)
- Scripts (BGB Lesley termination, CIPP, Datto, migration)
- Temp files (Bardach contacts, VWP investigation, misc)
- Credentials and session logs
- Email service, PHP API, session logs

Machine: ACG-M-L5090
Timestamp: 2026-03-10 19:11:00

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
158
session-logs/2026-02-25-session.md
Normal file
@@ -0,0 +1,158 @@
# Session Log: 2026-02-25

## Session Summary

Continued diagnostics on Peaceful Spirit Country Club UCG Ultra speed issues. Performed SSH-based monitoring, identified ECM crash-loop patterns, rebooted gateway, and ran 15-minute stability monitoring. Gateway fully exonerated -- issue confirmed as Cox plant-side.

---

## Peaceful Spirit Country Club - UCG Ultra Continued Diagnostics

### Pre-Reboot Findings (via SSH)

Connected via VPN to 192.168.0.10 after fixing SSH key auth (had to add the key to `/root/.ssh/authorized_keys` directly -- the GUI-added key still prompted for a password).

**ECM crash-loop confirmed ongoing:**

- ECM was NOT loaded (`lsmod | grep ecm` = empty)
- Cycle pattern from dmesg: runs 2-6 minutes, crashes, stays down 15-39 minutes
- Last cycle before reboot: init at 89499s, exit at 89638s (~2 min run), then never reloaded
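
The run and down durations in the cycle pattern above can be derived from dmesg timestamps. A minimal sketch, assuming each kernel log line starts with a `[seconds.micros]` timestamp and that ECM start/stop lines contain the literal markers `ECM init` and `ECM exit`. The `ECM exit` marker text is an assumption (only `ECM init` and `ECM init complete` are quoted in this log), and `ecm_cycles` / `down_gaps` are hypothetical helper names.

```python
import re

# Assumed dmesg line shape: "[89499.123456] ECM init" / "[89638.654321] ECM exit".
# The marker strings are assumptions; adjust them to the real kernel messages.
LINE_RE = re.compile(r"^\[\s*(\d+(?:\.\d+)?)\]\s+(.*)$")


def ecm_cycles(dmesg_text, init_marker="ECM init", exit_marker="ECM exit"):
    """Return (init_s, exit_s, run_s) tuples for each completed ECM run."""
    cycles = []
    started = None
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        ts, msg = float(match.group(1)), match.group(2)
        if exit_marker in msg and started is not None:
            cycles.append((started, ts, ts - started))
            started = None
        elif init_marker in msg and started is None:
            # "ECM init complete" also contains the init marker; it is
            # skipped because a run is already open at that point.
            started = ts
    return cycles


def down_gaps(cycles):
    """Seconds between one run's exit and the next run's init."""
    return [nxt[0] - cur[1] for cur, nxt in zip(cycles, cycles[1:])]
```

Fed the last cycle from this session (init at 89499s, exit at 89638s), `ecm_cycles` reports a 139-second run, which matches the "~2 min run" noted above.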

**Other findings:**

- Load average: 1.26 (elevated, CPU handling all forwarding without ECM)
- Memory: 1169 MB / 2947 MB (40%), 65 MB swap used
- IDS/IPS: confirmed OFF (no suricata process)
- eth4 RX: 4 errors, 4 CRC errors (physical layer corruption from modem)
- WAN link flap: eth4 went down for 6 seconds at 76591s (modem sync loss)
- QUIC reassembly failures: multiple bursts, including triple failure at 96270s
- WireGuard tunnel: down (VPN was hung, had to be restarted on our side)

### Reboot and Hardware Acceleration

User rebooted UCG Ultra. Initial post-reboot check (7 min uptime):
- ECM was NOT loaded -- initially suspected PCIe probe failure (`qcom-pcie: probe of 20000000.pcie failed with error -110`)
- Actual cause: **Hardware Acceleration was disabled in UI settings**
- User re-enabled Hardware Acceleration
- ECM loaded immediately: `ECM init` at 669s, `ECM init complete` at 669s
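
The "is ECM loaded" check used throughout this session (`lsmod | grep ecm`) can be scripted. A minimal sketch, assuming the module is listed under the exact name `ecm` (the log only shows a substring grep, so the exact name is an assumption); `module_loaded` and `lsmod_text` are hypothetical helper names.

```python
import subprocess


def lsmod_text():
    """Return raw `lsmod` output (Linux only)."""
    return subprocess.run(["lsmod"], capture_output=True, text=True).stdout


def module_loaded(lsmod_output, name="ecm"):
    """True if `name` appears as a module name (first column) in lsmod output."""
    for line in lsmod_output.splitlines()[1:]:  # skip the "Module Size Used by" header
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False
```

Matching on the first column is stricter than `grep ecm`, which would also match a module that merely lists `ecm` in its "Used by" column.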

### 15-Minute Stability Monitoring

Ran automated check every 60 seconds for 15 minutes (08:24 - 08:39).

**Results:**

- ECM: STABLE for entire 15 minutes -- zero crashes, zero restarts
- RX errors: 0 across all 15 checks
- CRC errors: 0 across all 15 checks
- Drops: 0 both directions
- QUIC failures: 0
- Link flaps: 0
- dmesg: clean -- only the initial ECM init message
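
The script actually used for the 60-second checks is not recorded in this log. One way such a check could be structured is sketched below, assuming the counters are read from the standard Linux sysfs statistics files for `eth4`; `read_counters`, `deltas`, and `monitor` are hypothetical names, and the sampler is injectable so the loop can be exercised without the gateway.

```python
import time
from pathlib import Path

STATS_DIR = "/sys/class/net/{iface}/statistics"
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_dropped", "tx_dropped")


def read_counters(iface="eth4"):
    """Read cumulative interface counters from Linux sysfs."""
    base = Path(STATS_DIR.format(iface=iface))
    return {name: int((base / name).read_text()) for name in COUNTERS}


def deltas(before, after):
    """Per-counter increase between two samples."""
    return {key: after[key] - before[key] for key in before}


def monitor(sample, checks=15, interval_s=60, sleep=time.sleep):
    """Take a baseline, then `checks` samples; return per-interval deltas."""
    results = []
    prev = sample()
    for _ in range(checks):
        sleep(interval_s)
        cur = sample()
        results.append(deltas(prev, cur))
        prev = cur
    return results
```

On the gateway this would be called as `monitor(read_counters)`; a clean run like the one above would return 15 dictionaries with every value at 0.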

**Load trend:**

| Time | Load (1m) | Load (5m) | Load (15m) |
|------|-----------|-----------|------------|
| 08:24 | 1.53 | 1.43 | 0.92 |
| 08:28 | 1.33 | 1.43 | 1.04 |
| 08:32 | 1.74 | 1.57 | 1.19 |
| 08:36 | 2.12 | 1.72 | 1.33 |
| 08:38 | 2.32 | 1.80 | 1.38 |
| 08:39 | 1.74 | 1.73 | 1.38 |

1-minute load persistently above 1.0 -- likely WireGuard VPN crypto (can't be offloaded to ECM).

### Configuration Changes Made

1. **IDS/IPS:** Disabled (was on High) -- done earlier on 2026-02-25
2. **Hardware Acceleration:** Re-enabled after reboot
3. **MSS Clamping:** Changed from Custom 1452 to Auto
   - iptables now shows `clamp to PMTU` on tun1 only (correct behavior)
   - No MSS rules on eth4/WAN (confirmed -- MSS setting never affected WAN traffic)
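
For context on the MSS change above: TCP MSS is the path MTU minus the IP and TCP headers, so the old custom value of 1452 corresponds to a 1492-byte MTU over IPv4. A small worked sketch, assuming headers without options; the 1420-byte tunnel MTU in the usage note is illustrative only, since the actual MTU of tun1 is not recorded in this log.

```python
IPV4_HEADER = 20  # bytes, no IP options
IPV6_HEADER = 40  # bytes, fixed header only
TCP_HEADER = 20   # bytes, no TCP options


def mss_for_mtu(mtu, ipv6=False):
    """Largest TCP payload that fits in one packet of size `mtu`."""
    return mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - TCP_HEADER


def mtu_for_mss(mss, ipv6=False):
    """Inverse of mss_for_mtu: the MTU a given MSS implies."""
    return mss + (IPV6_HEADER if ipv6 else IPV4_HEADER) + TCP_HEADER
```

`mtu_for_mss(1452)` gives 1492, and `mss_for_mtu(1500)` gives 1460 for a standard Ethernet WAN. With "clamp to PMTU" the kernel derives the value per path instead of using a fixed number, so a hypothetical 1420-byte tunnel would clamp to `mss_for_mtu(1420)`, i.e. 1380.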

### Speed Test Results

- Post-reboot with ECM running: **29/28 Mbps** down/up (300/30 provisioned)
- Upload hitting near-provisioned speed (28 of 30)
- Download at ~10% of provisioned (29 of 300)
- Occasionally achieves near-provisioned download speeds (200-278 Mbps seen previously)
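
The percentages behind these results are simple ratios against the provisioned 300/30 Mbps tier. A minimal sketch; `pct_of_provisioned` is a hypothetical helper name.

```python
def pct_of_provisioned(measured_mbps, provisioned_mbps):
    """Measured throughput as a percentage of the provisioned rate."""
    return round(100.0 * measured_mbps / provisioned_mbps, 1)


download_pct = pct_of_provisioned(29, 300)    # 9.7
upload_pct = pct_of_provisioned(28, 30)       # 93.3
best_seen_pct = pct_of_provisioned(278, 300)  # 92.7
```

Upload sits above 90% of its tier while download is under 10%, which is the asymmetry the later analysis attributes to the downstream path.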

### Final Status Check (08:41, 33 min uptime)

- ECM: loaded, stable
- Load: 1.25 (trending down)
- Memory: 981 MB / 2947 MB (33%), 2 MB swap
- eth4: 0 errors, 0 CRC, 0 drops
- dmesg: clean since boot
- MSS: Auto, clamp to PMTU on tun1 only

### Sequential Thinking Re-Evaluation

Performed full sequential thinking analysis (8 steps) re-evaluating all evidence:

**Two overlapping problems identified:**

**Problem 1 - Cox Plant (Primary):**

- Speed decays from 200+ to 70 Mbps under sustained load = marginal DOCSIS channels de-bonding
- 50% packet loss at all packet sizes = not MTU or gateway related
- Download degraded, upload stable = downstream RF path
- New modem, same symptoms = rules out CPE
- Persists with all gateway configurations tested
- Occasionally hits provisioned speed = CMTS config is correct, channels are marginal

**Problem 2 - Gateway ECM (Secondary, resolved):**

- ECM crash-loop amplified plant symptoms (caused <1 Mbps drops)
- Resolved by: disabling IDS/IPS, rebooting, re-enabling HW acceleration
- 15-minute monitoring confirms stable operation
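
The "packet loss at all packet sizes" argument under Problem 1 rests on a size sweep: an MTU problem drops only large packets, while uniform loss points elsewhere. The exact commands used are not recorded here, so the following is a sketch under assumptions: Linux iputils `ping` (with `-M do` to set Don't Fragment), a caller-supplied target host, and hypothetical helper names throughout.

```python
import re
import subprocess

LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output):
    """Extract the loss percentage from ping's summary line, or None."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_cmd(host, payload_bytes, count=20):
    """Build an iputils ping command; -M do forbids fragmentation."""
    return ["ping", "-c", str(count), "-s", str(payload_bytes), "-M", "do", host]


def sweep(host, sizes=(64, 512, 1000, 1400, 1472), run=None):
    """Return {payload_size: loss_percent}; `run` is injectable for testing."""
    run = run or (lambda cmd: subprocess.run(cmd, capture_output=True, text=True).stdout)
    return {size: parse_loss(run(ping_cmd(host, size))) for size in sizes}


def looks_mtu_related(loss_by_size, threshold=10.0):
    """Heuristic: clean at the smallest size but lossy at the largest."""
    sizes = sorted(loss_by_size)
    small, large = loss_by_size[sizes[0]], loss_by_size[sizes[-1]]
    if small is None or large is None:
        return False
    return small < threshold <= large
```

A 1472-byte payload plus 28 bytes of IPv4 and ICMP headers fills a 1500-byte MTU exactly. A sweep that returns roughly 50% at every size, as observed here, makes `looks_mtu_related` return `False`.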

### Summary Prepared for Cox Tech

> **Site:** Peaceful Spirit Country Club
> **Circuit:** 300/30 Mbps | **IP:** 98.190.129.150
> **Modem:** New (replaced prior day) - same symptoms
>
> Download speeds start at 200+ then decay to 29-70 Mbps under load. Intermittent drops to <1 Mbps. 50% packet loss at all sizes. Upload stable at 28-29 Mbps. Modem intermittently achieves full provisioned speed, proving CMTS config is correct.
>
> Customer gateway fully eliminated: 15 min monitoring shows zero errors at every layer, hardware offload stable, zero CRC errors.
>
> Pattern consistent with marginal downstream DOCSIS channels bonding/de-bonding as signal conditions fluctuate.
>
> Tech should check: downstream signal levels/SNR, uncorrectable codewords, T3/T4 timeouts, tap/drop/connectors for corrosion, amplifier health, node health.

---

## SSH Access Reference

- **Host:** 192.168.0.10 (via VPN) or 98.190.129.150 (WAN)
- **User:** root
- **Key:** `~/.ssh/ucg_peaceful_spirit` (ed25519)
- **Public key:** `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBw+BK25MXpm91XBtDsSp7K0nTcKwFDLFZDx7tAO/N8 claude@claudetools`
- **Auth method:** Key added to `/root/.ssh/authorized_keys` (NOT via UniFi GUI)
- **Note:** GUI-added keys require password; direct authorized_keys works with key-only
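
For scripted checks, the access details above translate into an `ssh` invocation along these lines. A minimal sketch; `ssh_argv` is a hypothetical helper, and the two `-o` options are standard OpenSSH client settings chosen here as an assumption about what is useful: `IdentitiesOnly=yes` offers only the named key, and `BatchMode=yes` makes the client fail instead of prompting for a password.

```python
import os
import subprocess


def ssh_argv(host, remote_cmd, user="root", key_path="~/.ssh/ucg_peaceful_spirit"):
    """Build an argv list for a non-interactive, key-only SSH command."""
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),
        "-o", "IdentitiesOnly=yes",  # offer only the key given with -i
        "-o", "BatchMode=yes",       # never fall back to a password prompt
        f"{user}@{host}",
        remote_cmd,
    ]


def run_remote(host, remote_cmd):
    """Run a command on the gateway and return its stdout."""
    result = subprocess.run(ssh_argv(host, remote_cmd), capture_output=True, text=True)
    return result.stdout
```

For example, `run_remote("192.168.0.10", "lsmod | grep ecm")` would repeat the ECM check from this session. With `BatchMode=yes`, a key that the gateway does not accept produces an immediate error rather than a hung password prompt.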

### Current UCG Config (post-changes)

- Hardware Acceleration: ON
- IDS/IPS: Disabled
- MSS Clamping: Auto (clamp to PMTU on VPN tunnels)
- Jumbo Frames: OFF
- SNMP: OFF
- ARP Cache: Min DHCP lease
- Auto Firewall State Timeouts: ON
- Global NAT: Auto
- Connection Tracking: FTP, H.323, GRE, PPTP, TFTP

---

## Pending Tasks

### Peaceful Spirit
- [ ] Cox tech visit -- confirm plant-side fix resolves speed issues
- [ ] After Cox fix: re-test speeds to verify 300 Mbps sustained
- [ ] Consider re-enabling IDS/IPS at Medium/Low after Cox plant is fixed
- [ ] Monitor ECM stability over coming days
- [ ] Investigate persistent high load (1.2-2.3) -- likely WireGuard related

### From Previous Session (2026-02-24)
- [ ] Yealink: Get IP Discovery Tool from distributor for serial extraction
- [ ] Yealink: Test browser-based scanner (tools/yealink-serial-scanner.html)
- [ ] Yealink: Onboard remaining phones into YMCS
- [ ] Yealink: Build OIT VoIP templates when ready for migration
- [ ] Clean up tools/test-yealink.ps1