claudetools/session-logs/2026-02-25-session.md
Mike Swanson fa15b03180 sync: Auto-sync from ACG-M-L5090 at 2026-03-10 19:11:00
Synced files:
- Quote wizard frontend (all components, hooks, types, config)
- API updates (config, models, routers, schemas, services)
- Client work (bg-builders, gurushow)
- Scripts (BGB Lesley termination, CIPP, Datto, migration)
- Temp files (Bardach contacts, VWP investigation, misc)
- Credentials and session logs
- Email service, PHP API, session logs

Machine: ACG-M-L5090
Timestamp: 2026-03-10 19:11:00

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 19:59:08 -07:00


Session Log: 2026-02-25

Session Summary

Continued diagnostics on Peaceful Spirit Country Club UCG Ultra speed issues. Performed SSH-based monitoring, identified ECM crash-loop patterns, rebooted gateway, and ran 15-minute stability monitoring. Gateway fully exonerated -- issue confirmed as Cox plant-side.


Peaceful Spirit Country Club - UCG Ultra Continued Diagnostics

Pre-Reboot Findings (via SSH)

Connected via VPN to 192.168.0.10 after fixing SSH key access (the key had to be added to /root/.ssh/authorized_keys directly; the GUI-added key still prompted for a password).

ECM crash-loop confirmed ongoing:

  • ECM was NOT loaded (lsmod | grep ecm = empty)
  • Cycle pattern from dmesg: runs 2-6 minutes, crashes, stays down 15-39 minutes
  • Last cycle before reboot: init at 89499s, exit at 89638s (~2 min run), then never reloaded
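The run/down pattern above comes from pairing ECM start and stop messages in dmesg. Below is a minimal sketch of that pairing. It assumes (hypothetically) that the lines read "ECM init", "ECM init complete" and "ECM exit" after the standard [seconds] timestamp. The exact message text on the UCG Ultra was not captured here, so the patterns may need adjusting:

```python
import re

# Standard dmesg prefix: "[ 89499.123456] message"
TS_RE = re.compile(r"^\[\s*(\d+(?:\.\d+)?)\]\s*(.*)$")
# Hypothetical message patterns (assumed, not copied from the gateway);
# "ECM init complete" is excluded so each start is counted once.
INIT_RE = re.compile(r"\bECM init\b(?! complete)")
EXIT_RE = re.compile(r"\bECM exit\b")


def ecm_cycles(dmesg_text):
    """Return (run_seconds, down_seconds) lists parsed from dmesg output.

    run_seconds: time from each init to the following exit.
    down_seconds: time from each exit to the next init.
    """
    runs, downs = [], []
    started = stopped = None
    for line in dmesg_text.splitlines():
        m = TS_RE.match(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if INIT_RE.search(msg):
            if stopped is not None:
                downs.append(ts - stopped)
                stopped = None
            started = ts
        elif EXIT_RE.search(msg) and started is not None:
            runs.append(ts - started)
            started, stopped = None, ts
    return runs, downs
```

For the last cycle before the reboot, this yields a 139 s run (89638 - 89499), the ~2 min noted above. An exit with no following init, as in that cycle, adds no downtime entry.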

Other findings:

  • Load average: 1.26 (elevated, CPU handling all forwarding without ECM)
  • Memory: 1169 MB / 2947 MB (40%), 65 MB swap used
  • IDS/IPS: confirmed OFF (no suricata process)
  • eth4 RX: 4 errors, 4 CRC errors (physical layer corruption from modem)
  • WAN link flap: eth4 went down for 6 seconds at 76591s (modem sync loss)
  • QUIC reassembly failures: multiple bursts, including triple failure at 96270s
  • WireGuard tunnel: down (VPN was hung, had to be restarted on our side)

Reboot and Hardware Acceleration

User rebooted UCG Ultra. Initial post-reboot check (7 min uptime):

  • ECM was NOT loaded -- initially suspected a PCIe probe failure (qcom-pcie: probe of 20000000.pcie failed with error -110, i.e. -ETIMEDOUT)
  • Actual cause: Hardware Acceleration was disabled in UI settings
  • User re-enabled Hardware Acceleration
  • ECM loaded immediately: ECM init at 669s, ECM init complete at 669s
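The lsmod | grep ecm check used before and after the reboot can also be done by reading /proc/modules, which is the file lsmod itself reads. This is a minimal sketch. It assumes the module name contains "ecm", since the exact module name on the UCG Ultra is not recorded here:

```python
def ecm_modules(proc_modules_text, needle="ecm"):
    """Names of loaded kernel modules whose name contains `needle`.

    /proc/modules has one module per line, name first, e.g.
    "ecm 409600 0 - Live 0x0". Unlike `grep ecm`, only the name column
    is checked, so a module that merely lists ECM among its users
    (the fourth column) is not counted.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and needle in fields[0]:
            names.append(fields[0])
    return names
```

On the gateway, an empty list from ecm_modules(open("/proc/modules").read()) corresponds to the "ECM was NOT loaded" state above.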

15-Minute Stability Monitoring

Ran automated check every 60 seconds for 15 minutes (08:24 - 08:39).

Results:

  • ECM: STABLE for entire 15 minutes -- zero crashes, zero restarts
  • RX errors: 0 across all 15 checks
  • CRC errors: 0 across all 15 checks
  • Drops: 0 both directions
  • QUIC failures: 0
  • Link flaps: 0
  • dmesg: clean -- only the initial ECM init message
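The per-minute interface check can be sketched as follows. This is an illustrative reconstruction, not the script that was actually run. It reads the standard Linux sysfs counters under /sys/class/net/<iface>/statistics and reports every check in which a counter rose since the previous check. The ECM and dmesg parts of the check are not covered, and some drivers may not populate rx_crc_errors at all:

```python
import time
from pathlib import Path

# Standard per-interface counters exposed by Linux under sysfs.
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_dropped", "tx_dropped")


def read_counters(iface, root="/sys/class/net"):
    """Snapshot the counters for one interface from sysfs."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def deltas(before, after):
    """Per-counter increase between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def monitor(iface="eth4", interval=60, checks=15, read=read_counters, sleep=time.sleep):
    """Take `checks` samples `interval` seconds apart.

    Returns (check_number, deltas) for every check where any counter
    increased since the previous sample; an empty list means a clean run.
    """
    bad = []
    prev = read(iface)
    for i in range(1, checks + 1):
        sleep(interval)
        cur = read(iface)
        d = deltas(prev, cur)
        if any(d.values()):
            bad.append((i, d))
        prev = cur
    return bad
```

The `read` and `sleep` parameters exist only so the loop can be exercised without a gateway. A run like the one above would return an empty list after 15 checks.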

Load trend:

Time    Load (1m)   Load (5m)   Load (15m)
08:24   1.53        1.43        0.92
08:28   1.33        1.43        1.04
08:32   1.74        1.57        1.19
08:36   2.12        1.72        1.33
08:38   2.32        1.80        1.38
08:39   1.74        1.73        1.38

The 1- and 5-minute load averages stayed above 1.0 for the whole window. The likely cause is WireGuard VPN crypto, which can't be offloaded to ECM.
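The "above 1.0" reading can be checked directly against the samples. This minimal sketch parses /proc/loadavg (the source of these three figures) and takes the minimum of each column from the table:

```python
def parse_loadavg(text):
    """Parse /proc/loadavg, e.g. "1.26 1.18 0.97 2/160 4321" -> (1.26, 1.18, 0.97)."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


# (time, 1m, 5m, 15m) samples copied from the table above
SAMPLES = [
    ("08:24", 1.53, 1.43, 0.92),
    ("08:28", 1.33, 1.43, 1.04),
    ("08:32", 1.74, 1.57, 1.19),
    ("08:36", 2.12, 1.72, 1.33),
    ("08:38", 2.32, 1.80, 1.38),
    ("08:39", 1.74, 1.73, 1.38),
]


def column_minimums(samples):
    """Lowest 1m, 5m and 15m load seen across the samples."""
    return tuple(min(row[i] for row in samples) for i in (1, 2, 3))
```

The minimums are 1.33 (1m), 1.43 (5m) and 0.92 (15m). The 1- and 5-minute averages never dropped below 1.0, and the 15-minute average crossed 1.0 after the 08:24 sample.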

Configuration Changes Made

  1. IDS/IPS: Disabled (was on High) -- done earlier on 2026-02-25
  2. Hardware Acceleration: Re-enabled after reboot
  3. MSS Clamping: Changed from Custom 1452 to Auto
    • iptables now shows clamp to PMTU on tun1 only (correct behavior)
    • No MSS rules on eth4/WAN (confirmed -- MSS setting never affected WAN traffic)

Speed Test Results

  • Post-reboot with ECM running: 29 Mbps down / 28 Mbps up (300/30 provisioned)
  • Upload hitting near-provisioned speed (28 of 30)
  • Download at ~10% of provisioned (29 of 300)
  • Has occasionally approached provisioned speed (200-278 Mbps seen previously)
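The percentages above are measured throughput divided by the provisioned rate. A trivial helper keeps these comparisons consistent from test to test:

```python
def pct_of_provisioned(measured_mbps, provisioned_mbps):
    """Measured throughput as a percentage of the provisioned rate, to 1 decimal place."""
    return round(100 * measured_mbps / provisioned_mbps, 1)
```

This gives 9.7% for the 29 Mbps download, 93.3% for the 28 Mbps upload, and 92.7% for the best previously seen download of 278 Mbps.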

Final Status Check (08:41, 33 min uptime)

  • ECM: loaded, stable
  • Load: 1.25 (trending down)
  • Memory: 981 MB / 2947 MB (33%), 2 MB swap
  • eth4: 0 errors, 0 CRC, 0 drops
  • dmesg: clean since boot
  • MSS: Auto, clamp to PMTU on tun1 only

Sequential Thinking Re-Evaluation

Performed full sequential thinking analysis (8 steps) re-evaluating all evidence:

Two overlapping problems identified:

Problem 1 - Cox Plant (Primary):

  • Speed decays from 200+ to 70 Mbps under sustained load = marginal DOCSIS channels de-bonding
  • 50% packet loss at all packet sizes = not MTU or gateway related
  • Download degraded, upload stable = downstream RF path
  • New modem, same symptoms = rules out CPE
  • Persists with all gateway configurations tested
  • Occasionally hits provisioned speed = CMTS config is correct, channels are marginal
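The "50% packet loss at all packet sizes" step rests on a simple comparison. An MTU or fragmentation fault usually shows a step, with little loss at small sizes and near-total loss above some size. Plant-side impairment drops packets regardless of size. This is a minimal sketch of that comparison; the tolerance and the sample numbers are illustrative assumptions, not the measurements from this site:

```python
def loss_is_size_independent(loss_by_size, tolerance=10.0):
    """True if packet loss is roughly flat across ping sizes.

    loss_by_size maps ping payload size in bytes to loss percent.
    `tolerance` (percentage points) is an arbitrary illustrative threshold.
    An MTU fault would instead show a step; on a 1500-byte path MTU,
    ICMP payloads above 1472 bytes (1500 - 20 IP - 8 ICMP) need
    fragmentation.
    """
    if not loss_by_size:
        raise ValueError("need at least one measurement")
    losses = loss_by_size.values()
    return max(losses) - min(losses) <= tolerance
```

Flat loss across sizes points away from MTU; a step at a particular size points toward it.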

Problem 2 - Gateway ECM (Secondary, resolved):

  • ECM crash-loop amplified plant symptoms (caused <1 Mbps drops)
  • Resolved by: disabling IDS/IPS, rebooting, re-enabling HW acceleration
  • 15-minute monitoring confirms stable operation

Summary Prepared for Cox Tech

Site: Peaceful Spirit Country Club
Circuit: 300/30 Mbps | IP: 98.190.129.150
Modem: New (replaced prior day) - same symptoms

Download speeds start at 200+ Mbps, then decay to 29-70 Mbps under load. Intermittent drops to <1 Mbps. 50% packet loss at all sizes. Upload stable at 28-29 Mbps. Modem intermittently achieves full provisioned speed, proving CMTS config is correct.

Customer gateway ruled out: 15 minutes of monitoring showed zero interface errors (including CRC), zero drops, and stable hardware offload.

Pattern consistent with marginal downstream DOCSIS channels bonding/de-bonding as signal conditions fluctuate.

Tech should check: downstream signal levels/SNR, uncorrectable codewords, T3/T4 timeouts, tap/drop/connectors for corrosion, amplifier health, node health.
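When reviewing the modem's downstream channel table with the tech, one common comparison is each channel's share of uncorrectable codewords. Modem status pages typically list these counters as unerrored, correctable and uncorrectable. This is a minimal sketch. The field names are hypothetical, and the threshold is left to the caller rather than asserted here. The counters are cumulative since the modem's last restart, so comparing two snapshots taken a few minutes apart is more telling than a single reading:

```python
from dataclasses import dataclass


@dataclass
class DownstreamChannel:
    # Hypothetical field names; map them from whatever the modem page shows.
    channel_id: int
    unerrored: int      # codewords received clean
    corrected: int      # codewords repaired by FEC
    uncorrectable: int  # codewords FEC could not repair (data lost)

    @property
    def uncorrectable_ratio(self):
        total = self.unerrored + self.corrected + self.uncorrectable
        return self.uncorrectable / total if total else 0.0


def flag_channels(channels, max_ratio):
    """IDs of channels whose uncorrectable share exceeds the caller-chosen max_ratio."""
    return [c.channel_id for c in channels if c.uncorrectable_ratio > max_ratio]
```

Channels that stand out from the rest of the bonding group are the natural candidates for the marginal-channel theory above.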


SSH Access Reference

  • Host: 192.168.0.10 (via VPN) or 98.190.129.150 (WAN)
  • User: root
  • Key: ~/.ssh/ucg_peaceful_spirit (ed25519)
  • Public key: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBw+BK25MXpm91XBtDsSp7K0nTcKwFDLFZDx7tAO/N8 claude@claudetools
  • Auth method: Key added to /root/.ssh/authorized_keys (NOT via UniFi GUI)
  • Note: GUI-added keys require password; direct authorized_keys works with key-only

Current UCG Config (post-changes)

  • Hardware Acceleration: ON
  • IDS/IPS: Disabled
  • MSS Clamping: Auto (clamp to PMTU on VPN tunnels)
  • Jumbo Frames: OFF
  • SNMP: OFF
  • ARP Cache: Min DHCP lease
  • Auto Firewall State Timeouts: ON
  • Global NAT: Auto
  • Connection Tracking: FTP, H.323, GRE, PPTP, TFTP

Pending Tasks

Peaceful Spirit

  • Cox tech visit -- confirm plant-side fix resolves speed issues
  • After Cox fix: re-test speeds to verify 300 Mbps sustained
  • Consider re-enabling IDS/IPS at Medium/Low after Cox plant is fixed
  • Monitor ECM stability over coming days
  • Investigate persistent high load (1.2-2.3) -- likely WireGuard related

From Previous Session (2026-02-24)

  • Yealink: Get IP Discovery Tool from distributor for serial extraction
  • Yealink: Test browser-based scanner (tools/yealink-serial-scanner.html)
  • Yealink: Onboard remaining phones into YMCS
  • Yealink: Build OIT VoIP templates when ready for migration
  • Clean up tools/test-yealink.ps1