diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-cs-server-drive-review-and-spike-question.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-cs-server-drive-review-and-spike-question.md
new file mode 100644
index 00000000..6a50be84
--- /dev/null
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-cs-server-drive-review-and-spike-question.md
@@ -0,0 +1,94 @@
+# Cascades — CS-SERVER failing-drive review + noon network-spike question
+
+- **Date:** 2026-06-17
+- **Machine:** Howard-Home
+- **Client:** Cascades of Tucson
+
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Review/Q&A session on the CS-SERVER degraded RAID-1 (failing C: drive). Pulled up the
+2026-06-15 CS-SERVER RAID/VPN session log and the client wiki hardware warning, and recapped
+the state: C: (Virtual Disk2, RAID-1, ~297 GB usable) is DEGRADED on a single surviving 320 GB
+5400 RPM laptop spindle after the mirror's other member failed. The healthy D: array and the
+single-DC / newly-installed-but-unverified cloud backup posture were summarized. No changes
+made to the server — this was an advisory/planning conversation.
+
+Answered three hardware questions for Howard ahead of a possible drive swap. (1) Hot-swap: yes,
+the R610 hot-plug backplane + SAS 6/iR support hot-swapping the already-removed failed member
+with the server running; gating items are a verified backup first and correct-drive
+identification, not power state. (2) SSD limits: the SAS 6/iR (LSI 1068E, SATA II / 3 Gbps, no
+TRIM, no Dell certified-drive lockout) sets the constraints — min size >= 320 GB raw (so 480/500 GB
+class; 240/256 GB too small to rebuild), 2.5" SATA negotiating to 3 Gbps, enterprise SSD with
+PLP (no TRIM => avoid consumer QLC/DRAM-less), <= 2 TB, 512e; buy two identical for the
+rebuild-then-swap. (3) The two installed OS drives: surviving 0:0:2 Hitachi HTS545032B9A300 and
+failed 0:0:3 WDC WD3200BEVT, both 320 GB SATA.
+
+Howard then asked what caused a network spike at Cascades today around noon. No spike is
+recorded in the logs (that is live monitoring data), but the only network change today was the
+VLAN 30 voice build (port-16 bounce + desktop re-DHCP onto 10.0.30.201, pfSense filter reload),
+which is the leading correlation candidate; the larger phone cutover was deferred to tonight.
+Per Howard's follow-up command, the next step is to pull the UniFi skill and investigate the
+Cascades site (va6iba3v on UOS 172.16.3.29) live for the noon window — that investigation
+begins after this save.
+
+## Key Decisions
+
+- Treated the failing-drive discussion as advisory only; no server-side commands run this
+  session (the standing rule is no drive work until the cloud backup completes and verifies).
+- Recommended spec: two 480 GB enterprise 2.5" SATA SSDs with power-loss protection (e.g.
+  Solidigm D3-S4520 480 GB or Samsung PM893 480 GB) for the rebuild-then-swap.
+- Offered to re-pull OMSA live before any physical swap to confirm the surviving Hitachi has not
+  also degraded since 2026-06-15.
+
+## Problems Encountered
+
+- None. Advisory session.
+
+## Configuration Changes
+
+- None on CS-SERVER or any infrastructure. Session log written only.
+
+## Credentials & Secrets
+
+- None created or discovered this session.
+
+## Infrastructure & Servers
+
+- **CS-SERVER** — Cascades DC/file/Hyper-V host, Dell PowerEdge R610 (~2009), Win Server 2019
+  Std, 48 GB RAM. GuruRMM agent `c39f1de7-d5b6-45ae-b132-e06977ab1713`. LAN 192.168.2.254.
+  - C: = Virtual Disk2, RAID-1, ~297 GB, DEGRADED. Surviving member `0:0:2` Hitachi
+    HTS545032B9A300 (320 GB SATA, 5400 RPM). Failed member `0:0:3` WDC WD3200BEVT (320 GB SATA,
+    Critical/Removed).
+  - D: = Virtual Disk0, RAID-1, two 1.2 TB SAS, OK. Spare `1:0:4` 1.2 TB SAS "Ready" (wrong size
+    to rebuild the 320 GB mirror).
+  - Controller: SAS 6/iR Integrated (LSI 1068E), SATA II 3 Gbps, no TRIM, no certified-drive
+    lockout.
+  - Cloud backup (MSP360/CloudBerry -> ACG-backup) installed/started 2026-06-15, not yet verified.
+- **Cascades UniFi** — site `va6iba3v` on UOS controller `172.16.3.29` (for the spike
+  investigation).
+- **Cascades pfSense** — `192.168.0.1`.
+
+## Commands & Outputs
+
+- No commands run against infrastructure this session.
+
+## Pending / Incomplete Tasks
+
+- **Investigate the noon network spike** via the UniFi skill (Cascades site va6iba3v on UOS
+  172.16.3.29) + optionally pfSense WAN throughput — begins after this save.
+- **CS-SERVER drive remediation** still gated on: backup first full completes + verifies +
+  confirmed image/bare-metal + retention set. Then rebuild-then-swap to 2x 480 GB enterprise
+  SATA SSDs. Re-pull OMSA live before any physical action.
+- DC migration off the EOL R610 remains the strategic fix.
+
+## Reference Information
+
+- Prior drive session: `clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cs-server-raid-vpn-reset.md`
+- Today's VLAN build: `clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-voice-vlan30-build.md`
+- Client wiki: `wiki/clients/cascades-tucson.md`
+- GuruRMM agent (CS-SERVER): `c39f1de7-d5b6-45ae-b132-e06977ab1713`. RMM API http://172.16.3.30:3001.
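
The SSD limits listed in the session summary (at least 320 GB raw, at most 2 TB, 2.5" SATA, power-loss protection, 512e) could be expressed as a small pre-purchase check. This is only a sketch of those stated limits, not a tool from the log: the `Drive` type, its field names, the `check_drive` helper, and the treatment of 2 TB as 2000 GB are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Limits as stated in the session summary; 2 TB taken as 2000 GB (assumption).
MIN_GB = 320
MAX_GB = 2000


@dataclass
class Drive:
    """Hypothetical description of a candidate replacement drive."""
    model: str
    capacity_gb: int
    form_factor: str    # e.g. '2.5"'
    interface: str      # e.g. "SATA"
    has_plp: bool       # power-loss protection
    sector_format: str  # e.g. "512e"


def check_drive(d: Drive) -> list:
    """Return the stated constraints that the candidate drive violates."""
    problems = []
    if d.capacity_gb < MIN_GB:
        problems.append("smaller than the 320 GB mirror member; cannot rebuild")
    if d.capacity_gb > MAX_GB:
        problems.append("larger than the 2 TB limit")
    if d.form_factor != '2.5"':
        problems.append('not a 2.5" drive')
    if d.interface != "SATA":
        problems.append("not SATA")
    if not d.has_plp:
        problems.append("no power-loss protection")
    if d.sector_format != "512e":
        problems.append("sector format is not 512e")
    return problems


if __name__ == "__main__":
    # Placeholder entries, not real product data.
    candidate = Drive("example-480", 480, '2.5"', "SATA", True, "512e")
    too_small = Drive("example-240", 240, '2.5"', "SATA", True, "512e")
    print(check_drive(candidate))
    print(check_drive(too_small))
```

An empty list means the candidate meets every limit recorded in the log. The check says nothing about whether the surviving Hitachi member is still healthy, which is why the log calls for re-pulling OMSA before any physical action.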