docs: Valleywide XenServer OFFLINE - critical investigation
Updated emergency session log with XenServer offline status:
- XenServer (older Dell) offline - investigating
- Server3 VM unavailable
- Added to critical next steps

Server status summary:
- HP ProLiant (MXQ80400X4): NVRAM fixed, VMs running, iLO pending
- Dell VWP-QBS: Boot retry resolved, operational
- XenServer: OFFLINE (CRITICAL)
- 4th server: appears fine

Power outage impact assessment ongoing. Timer running.

Machine: Mikes-MacBook-Air.local
Timestamp: 2026-04-22 10:23:23

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -17,12 +17,14 @@
 - SSH enabled (OpenSSH Server), key auth working for `vwp\guru`
 - Likely runs as VM on HP ProLiant host
 
-**VWP-QBS (172.16.9.169)**
+**VWP-QBS (172.16.9.169) - Dell Server with DRAC**
 - Windows Server 2022 Standard
+- **Physical Dell server** (NOT a VM)
 - Internal network only (172.16.9.0/24 reachable via VWP site VPN)
 - Runs QuickBooks + **IIS with RD Gateway / RD Web Access** (`/RDWeb`, `/RDWeb/Pages`, `/RDWeb/Feed`, `/Rpc`, `/RpcWithCert`)
 - WinRM available on 5985 (used for remote admin via Invoke-Command)
-- Likely runs as VM on HP ProLiant host
+- DRAC available for remote management
+- [NOTE] 2026-04-22: Boot retry issue after power outage, resolved via DRAC manual boot to Windows Boot Manager
 
 ### Networks
 - Internal: `172.16.9.0/24`
@@ -13,7 +13,10 @@
 
 ## Issue Summary
 
-HP ProLiant Server (SN: MXQ80400X4) experienced non-volatile memory corruption following power outage. BIOS/UEFI settings lost, requiring factory reset and full reconfiguration.
+Multiple server issues following power outage at Valleywide:
+- HP ProLiant Server (SN: MXQ80400X4): Non-volatile memory corruption, BIOS/iLO reset required
+- Dell Server (VWP-QBS): Boot retry loop, resolved via DRAC manual boot
+- **XenServer (Older Dell): OFFLINE - investigating (CRITICAL)**
 
 ## Timeline
 
@@ -25,6 +28,8 @@ HP ProLiant Server (SN: MXQ80400X4) experienced non-volatile memory corruption f
 
 ### 0935-[IN PROGRESS] - Recovery Actions
 
+**HP ProLiant Server (SN: MXQ80400X4):**
+
 **BIOS/UEFI Reconfiguration:**
 - Factory reset required due to NVRAM corruption
 - Reconfigured BIOS settings
@@ -42,29 +47,76 @@ HP ProLiant Server (SN: MXQ80400X4) experienced non-volatile memory corruption f
 - Hypervisor operational after BIOS reconfiguration
 - No VM data loss reported
 
+**Dell Server (VWP-QBS) - Separate Boot Issue:**
+
+**Boot Retry Loop:**
+- VWP-QBS (Dell physical server, 172.16.9.169) stuck at "Boot Retry" screen
+- Accessed via DRAC (Dell Remote Access Controller)
+- Forced manual boot device selection -> Windows Boot Manager
+- [OK] Server booted successfully
+- [OK] Server appears to be functioning normally now
+- Likely related to power outage affecting boot order/configuration
+- NOTE: VWP-QBS is NOT a VM - it's a separate physical Dell server
+
+**XenServer (Older Dell) - OFFLINE:**
+
+**Status:**
+- [CRITICAL] XenServer offline
+- Impact: Server3 VM unavailable
+- Investigating cause (likely power outage related)
+- Checking hardware status, boot sequence, and hypervisor state
+- Dell server - older hardware
+
 ## Next Steps
 
+**CRITICAL:**
+- [ ] **Restore XenServer** (currently investigating offline status)
+- [ ] **Verify Server3 VM status** once XenServer restored
+
+**High Priority:**
 - [ ] Complete onsite work (timer running)
-- [ ] Reconfigure iLO settings (credentials, network)
+- [ ] Reconfigure HP iLO settings (credentials, network)
 - [ ] Document iLO IP address and credentials
 - [ ] Verify all server settings match pre-incident configuration
-- [ ] Test remote management access
-- [ ] Update server documentation with serial number
-- [ ] Create follow-up preventive measures (UPS check, power protection)
+
+**Follow-up:**
+- [ ] Test remote management access (iLO, DRAC)
+- [ ] Update server documentation with serial numbers and DRAC IPs
+- [ ] Create follow-up preventive measures (UPS assessment critical)
 
 ## Server Information
 
 **HP ProLiant Server:**
 - Serial Number: MXQ80400X4
 - Model: [TO BE DOCUMENTED]
-- Role: VM Host
+- Role: VM Host (runs VWP_ADSRVR and other VMs)
 - Location: Valleywide onsite
+- Status: Reconfigured, operational
 
-**iLO Management:**
+**HP iLO Management:**
 - Status: Reset to factory defaults
 - IP: [TO BE RECONFIGURED]
 - Credentials: [TO BE RESET]
 
+**Dell Server (VWP-QBS):**
+- Model: Dell (with DRAC)
+- Role: QuickBooks Server, RDS Host (Windows Server 2022)
+- IP: 172.16.9.169
+- Location: Valleywide onsite
+- Status: Boot issue resolved, operational
+- NOTE: Physical server, NOT a VM
+
+**Dell DRAC Management:**
+- Status: Functional (used to force manual boot)
+- IP: [TO BE DOCUMENTED]
+
+**XenServer (Older Dell):**
+- Model: Dell (older hardware)
+- Role: VM Host for Server3
+- Location: Valleywide onsite
+- Status: **OFFLINE - INVESTIGATING**
+- Impact: Server3 VM unavailable
+
 ## Notes
 
 - Power outage caused NVRAM corruption - rare but critical failure
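The session log lists "Test remote management access" as an open item and notes that VWP-QBS exposes WinRM on port 5985. A minimal sketch of a post-outage reachability check follows, assuming it is run from a machine on the VWP site VPN. The function names `tcp_reachable` and `check_all` and the `ENDPOINTS` label are hypothetical, not part of the commit. The iLO and DRAC addresses are still undocumented, so they are deliberately left out rather than guessed. A successful TCP connect only shows the port is open; it does not confirm that the service behind it is healthy.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Only the endpoint stated in the notes is listed; the label is illustrative.
# iLO and DRAC entries can be added once their IPs are documented.
ENDPOINTS = {
    "VWP-QBS WinRM": ("172.16.9.169", 5985),
}


def check_all(endpoints: dict, timeout: float = 3.0) -> dict:
    """Map each endpoint label to its reachability result."""
    return {
        name: tcp_reachable(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

Calling `check_all(ENDPOINTS)` returns a dict of label to `True`/`False`, which can be pasted into the session log next to the `[OK]` and `[CRITICAL]` markers.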