sync: auto-sync from HOWARD-HOME at 2026-06-17 15:47:50

Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-06-17 15:47:50
2026-06-17 15:47:59 -07:00
parent 0166f1db64
commit cbe7175fbb
1 changed files with 94 additions and 0 deletions
--- a/clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-cs-server-drive-review-and-spike-question.md
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-cs-server-drive-review-and-spike-question.md
@@ -0,0 +1,94 @@
+# Cascades — CS-SERVER failing-drive review + noon network-spike question
+
+- **Date:** 2026-06-17
+- **Machine:** Howard-Home
+- **Client:** Cascades of Tucson
+
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Review/Q&A session on the CS-SERVER degraded RAID-1 (failing C: drive). Pulled up the
+2026-06-15 CS-SERVER RAID/VPN session log and the client wiki hardware warning, and recapped
+the state: C: (Virtual Disk2, RAID-1, ~297 GB usable) is DEGRADED on a single surviving 320 GB
+5400 RPM laptop spindle after the mirror's other member failed. The healthy D: array and the
+single-DC / newly-installed-but-unverified cloud backup posture were summarized. No changes
+made to the server — this was an advisory/planning conversation.
+
+Answered three hardware questions for Howard ahead of a possible drive swap. (1) Hot-swap: yes,
+the R610 hot-plug backplane + SAS 6/iR support hot-swapping a replacement for the failed,
+already-removed member with the server running; the gating items are a verified backup and correct-drive
+identification, not power state. (2) SSD limits: the SAS 6/iR (LSI 1068E, SATA II / 3 Gbps, no
+TRIM, no Dell certified-drive lockout) sets the constraints — min size >= 320 GB raw (so 480/500 GB
+class; 240/256 GB too small to rebuild), 2.5" SATA negotiating to 3 Gbps, enterprise SSD with
+PLP (no TRIM => avoid consumer QLC/DRAM-less), <= 2 TB, 512e; buy two identical for the
+rebuild-then-swap. (3) The two installed OS drives: surviving 0:0:2 Hitachi HTS545032B9A300 and
+failed 0:0:3 WDC WD3200BEVT, both 320 GB SATA.
+
+Howard then asked what caused a network spike at Cascades today around noon. The session logs
+record no spike (traffic levels are live monitoring data), but the only network change today was the
+VLAN 30 voice build (port-16 bounce + desktop re-DHCP onto 10.0.30.201, pfSense filter reload),
+which is the leading correlation candidate; the larger phone cutover was deferred to tonight.
+Per Howard's follow-up command, the next step is to pull the UniFi skill and investigate the
+Cascades site (va6iba3v on UOS 172.16.3.29) live for the noon window — that investigation
+begins after this save.
+
+## Key Decisions
+
+- Treated the failing-drive discussion as advisory only; no server-side commands run this
+  session (the standing rule is no drive work until the cloud backup completes and verifies).
+- Recommended spec: two 480 GB enterprise 2.5" SATA SSDs with power-loss protection (e.g.
+  Solidigm D3-S4520 480 GB or Samsung PM893 480 GB) for the rebuild-then-swap.
+- Offered to re-pull OMSA live before any physical swap to confirm the surviving Hitachi has not
+  also degraded since 2026-06-15.
+
+## Problems Encountered
+
+- None. Advisory session.
+
+## Configuration Changes
+
+- None on CS-SERVER or any infrastructure. Session log written only.
+
+## Credentials & Secrets
+
+- None created or discovered this session.
+
+## Infrastructure & Servers
+
+- **CS-SERVER** — Cascades DC/file/Hyper-V host, Dell PowerEdge R610 (~2009), Win Server 2019
+  Std, 48 GB RAM. GuruRMM agent `c39f1de7-d5b6-45ae-b132-e06977ab1713`. LAN 192.168.2.254.
+  - C: = Virtual Disk2, RAID-1, ~297 GB, DEGRADED. Surviving member `0:0:2` Hitachi
+    HTS545032B9A300 (320 GB SATA, 5400 RPM). Failed member `0:0:3` WDC WD3200BEVT (320 GB SATA,
+    Critical/Removed).
+  - D: = Virtual Disk0, RAID-1, two 1.2 TB SAS, OK. Spare `1:0:4` 1.2 TB SAS "Ready" (wrong size
+    to rebuild the 320 GB mirror).
+  - Controller: SAS 6/iR Integrated (LSI 1068E), SATA II 3 Gbps, no TRIM, no certified-drive
+    lockout.
+  - Cloud backup (MSP360/CloudBerry -> ACG-backup) installed/started 2026-06-15, not yet verified.
+- **Cascades UniFi** — site `va6iba3v` on UOS controller `172.16.3.29` (for the spike
+  investigation).
+- **Cascades pfSense** — `192.168.0.1`.
+
+## Commands & Outputs
+
+- No commands run against infrastructure this session.
+
+## Pending / Incomplete Tasks
+
+- **Investigate the noon network spike** via the UniFi skill (Cascades site va6iba3v on UOS
+  172.16.3.29) + optionally pfSense WAN throughput — begins after this save.
+- **CS-SERVER drive remediation** still gated on: first full backup completing and verifying,
+  image/bare-metal confirmed, and retention set. Then rebuild-then-swap to 2x 480 GB enterprise
+  SATA SSDs. Re-pull OMSA live before any physical action.
+- DC migration off the EOL R610 remains the strategic fix.
+
+## Reference Information
+
+- Prior drive session: `clients/cascades-tucson/session-logs/2026-06/2026-06-15-howard-cs-server-raid-vpn-reset.md`
+- Today's VLAN build: `clients/cascades-tucson/session-logs/2026-06/2026-06-17-howard-voice-vlan30-build.md`
+- Client wiki: `wiki/clients/cascades-tucson.md`
+- GuruRMM agent (CS-SERVER): `c39f1de7-d5b6-45ae-b132-e06977ab1713`. RMM API http://172.16.3.30:3001.