# 2026-04-22 - Valleywide Power Outage Emergency Response

## User

- **User:** Mike Swanson (mike)
- **Machine:** Mikes-MacBook-Air.local
- **Role:** admin

## Session Summary

**Emergency onsite response to power outage at Valleywide.** Multiple server failures requiring immediate attention. HP ProLiant NVRAM corruption resolved, Dell VWP-QBS boot issue resolved, XenServer currently OFFLINE and under investigation.

**Work Status:** ONGOING

- Timer running
- Switching to laptop to continue

## Timeline

### Pre-Session Context

- Intune-manager tier added to remediation-tool
- Vault sync completed (SSH auth fixed, intune SOPS file pulled)
- Message sent to Howard about new Intune capabilities

### 1009 - Emergency Ticket Received

**Valleywide onsite emergency:**

- Arrival onsite: 0935 MST
- Cause: Power outage affecting all servers
- Client: Valleywide (VWP)
- Priority: Critical

### 1009-1020 - Initial Documentation

**Created emergency session log:**

- `clients/valleywide/session-logs/2026-04-22-hp-server-nvram-corruption-emergency.md`
- Documented HP ProLiant NVRAM corruption issue

**HP ProLiant Server (SN: MXQ80400X4):**

- Non-volatile memory corruption from power outage
- BIOS/UEFI reset to factory defaults
- Reconfigured BIOS settings
- iLO reset to factory - needs reconfiguration
- All VMs running successfully
- Status: Operational but iLO pending

### 1020-1030 - Dell VWP-QBS Boot Issue

**User reported additional server issue:**

- VWP-QBS stuck at "Boot Retry" screen
- Clarified: Dell server (separate from HP)
- Physical server, NOT a VM
- IP: 172.16.9.169
- Runs QuickBooks + RDS

**Resolution:**

- Accessed via DRAC (Dell Remote Access Controller)
- Forced manual boot device selection
- Selected Windows Boot Manager
- Server booted successfully
- Status: Operational

**Updated documentation:**

- Clarified two separate physical servers
- Added Dell DRAC information
- Updated README.md with physical server architecture

### 1100 - XenServer Offline Discovery

**User checking remaining servers:**

- Total of 4 servers at site
- Server 1: HP ProLiant - OK
- Server 2: Dell VWP-QBS - OK
- Server 3: XenServer (older Dell) - **OFFLINE**
- Server 4: Appears fine

**XenServer Issue:**

- Status: OFFLINE
- Role: VM Host for Server3 VM
- Impact: Server3 VM unavailable
- Currently investigating

**Updated priorities:**

- XenServer restoration now CRITICAL
- All documentation updated
- Changes committed and pushed to Gitea

### 1115 - Session Handoff

**User switching to laptop:**

- All work documented and synced to Gitea
- Timer still running
- Ready to continue from laptop

## Current Server Status

### 1. HP ProLiant Server (SN: MXQ80400X4) - VM Host

**Status:** Operational, iLO reconfiguration pending

**Issues Resolved:**

- NVRAM corruption from power outage
- BIOS/UEFI factory reset completed
- BIOS settings reconfigured
- Boot order restored
- Virtualization settings re-enabled
- All VMs confirmed running

**Outstanding:**

- iLO reset to factory defaults
- iLO credentials need to be re-entered
- iLO network configuration needs restoration
- Remote management temporarily unavailable

**VMs on this host:**

- VWP_ADSRVR (192.168.0.25) - Domain Controller
- Other VMs [TO BE DOCUMENTED]

### 2. Dell Server (VWP-QBS) - Physical

**IP:** 172.16.9.169
**Status:** Operational

**Issues Resolved:**

- Boot retry loop after power outage
- DRAC remote access functional
- Manual boot to Windows Boot Manager successful
- Server now running normally

**Services:**

- Windows Server 2022 Standard
- QuickBooks Server
- RDS Host (Remote Desktop Services)
- IIS with RD Gateway / RD Web Access

**Outstanding:**

- DRAC IP needs documentation
- Verify boot order settings persist

### 3. XenServer (Older Dell) - VM Host

**Status:** **OFFLINE - INVESTIGATING**
**Priority:** CRITICAL

**Impact:**

- Server3 VM unavailable
- Unknown additional services/VMs affected

**Investigation Status:**

- Offline discovered during post-outage check
- Cause unknown (likely power outage related)
- Hardware status unknown
- Boot sequence unknown
- Hypervisor state unknown

**Next Actions:**

- Determine if server powers on
- Check for hardware failures
- Verify XenServer hypervisor boots
- Assess Server3 VM integrity
- Document what Server3 does

### 4. Fourth Server

**Status:** Appears fine
**Details:** [TO BE DOCUMENTED]

## Next Steps (Priority Order)

### CRITICAL (Must Complete Today)

1. **Restore XenServer**
   - Determine offline cause
   - Attempt power on / boot
   - Check for hardware failures
   - Restore hypervisor if needed
   - Verify Server3 VM status
   - Document what Server3 provides
2. **Verify Server3 VM**
   - Check VM integrity
   - Confirm services running
   - Document service dependencies
   - Notify users if downtime expected

### HIGH PRIORITY

3. **HP ProLiant iLO Reconfiguration**
   - Access iLO interface
   - Set credentials (store in vault)
   - Configure network settings
   - Document iLO IP address
   - Test remote management
4. **Verify All Server Configurations**
   - Confirm all BIOS settings correct
   - Verify boot orders persist
   - Check all VMs running
   - Test all critical services
5. **Document Fourth Server**
   - Identify server model
   - Document role and services
   - Check configuration post-outage
   - Add to README.md

### FOLLOW-UP

6. **Remote Management Documentation**
   - Document all DRAC IPs and credentials
   - Document iLO IP and credentials
   - Store in SOPS vault
   - Update credentials.md
7. **UPS Assessment (CRITICAL FOR PREVENTION)**
   - Check if UPS exists
   - Verify UPS capacity
   - Test UPS functionality
   - Assess if additional UPS needed
   - Recommend UPS upgrade if insufficient
8. **Create Incident Report**
   - Power outage timeline
   - All affected systems
   - Resolution steps
   - Preventive recommendations
   - Estimated downtime

## Infrastructure Notes

### Networks

- Internal: 172.16.9.0/24
- Also: 192.168.0.0/24
- VPN access via OpenVPN on UDM

### Access Methods

- SSH to VWP_ADSRVR: `ssh vwp\guru@192.168.0.25`
- DRAC for Dell servers (IPs to be documented)
- iLO for HP ProLiant (IP to be reconfigured)

### Known Services

- VWP_ADSRVR: Domain Controller (vwp.local)
- VWP-QBS: QuickBooks + RDS
- Server3 VM: [TO BE DOCUMENTED]

## Recommendations

### Immediate (This Visit)

1. Restore XenServer and Server3 VM
2. Complete iLO reconfiguration
3. Document all remote management IPs/credentials
4. Verify all critical services operational

### Short-term (This Week)

1. UPS assessment and recommendations
2. Full incident report
3. Test disaster recovery procedures
4. Update all documentation in vault

### Long-term (Next Month)

1. UPS upgrade if needed
2. Consider generator backup
3. Implement monitoring for power events
4. Document full DR runbook
5. Schedule preventive maintenance

## Files Updated

**Session Logs:**

- `clients/valleywide/session-logs/2026-04-22-hp-server-nvram-corruption-emergency.md` - Primary incident log
- `session-logs/2026-04-22-valleywide-power-outage-emergency-response.md` - This comprehensive log

**Documentation:**

- `clients/valleywide/README.md` - Updated server inventory and status

**Git Commits:**

- Initial documentation: HP server NVRAM issue
- Update: Dell VWP-QBS boot issue resolved
- Update: XenServer offline - critical investigation
- Final: Comprehensive session save for laptop handoff

## Work Status

**Timer:** RUNNING (started at arrival 0935 MST)
**Current Time:** ~1115 MST
**Duration:** ~1 hour 40 minutes

**Completion Status:**

- HP ProLiant: 90% complete (iLO pending)
- Dell VWP-QBS: 100% complete
- XenServer: 0% complete (currently investigating)
- Documentation: Current and synced

**Next Session:**

- Continue on laptop
- Focus on XenServer restoration
- Complete iLO configuration
- Final documentation and closeout

## Notes for Laptop Continuation

**What's done:**

- HP ProLiant BIOS reconfigured, VMs running
- Dell VWP-QBS boot issue resolved
- All work documented and pushed to Gitea

**What's critical:**

- XenServer is OFFLINE
- Server3 VM unavailable
- This is the highest priority issue
- Impact unknown until Server3 function documented

**What's pending:**

- HP iLO reconfiguration (not critical but needed)
- Fourth server documentation
- UPS assessment
- Final incident report

**How to continue:**

- Pull latest from Gitea on laptop
- Review `clients/valleywide/session-logs/2026-04-22-hp-server-nvram-corruption-emergency.md`
- Focus on XenServer investigation
- Update session log with findings
- Close out when all servers operational

---

**Session saved at:** 2026-04-22 11:15 MST
**Resuming on:** Laptop
**Priority:** XenServer restoration (CRITICAL)
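
### Appendix: Reachability Check Sketch

The "verify all critical services operational" item could be partly scripted once on the laptop. The following is a minimal sketch, not something that was run during this session. The two IPs come from this log; the ports are assumptions (22 because SSH to VWP_ADSRVR is documented, 3389 because VWP-QBS is an RDS host), and the names `HOSTS`, `tcp_probe`, and `check_hosts` are hypothetical. XenServer and the fourth server are left out because their addresses are not yet documented.

```python
import socket

# IPs are from this session log. Ports are ASSUMPTIONS:
# 22 (SSH is documented for VWP_ADSRVR), 3389 (VWP-QBS is an RDS host).
HOSTS = {
    "VWP_ADSRVR": ("192.168.0.25", 22),
    "VWP-QBS": ("172.16.9.169", 3389),
}


def tcp_probe(address, port, timeout=3.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_hosts(hosts, probe=tcp_probe):
    """Map each host name to 'UP' or 'DOWN' using the given probe."""
    return {
        name: "UP" if probe(address, port) else "DOWN"
        for name, (address, port) in hosts.items()
    }
```

Calling `check_hosts(HOSTS)` from the site network or over the VPN returns a dictionary of host names to `UP` or `DOWN`. A `DOWN` result only means the assumed port did not accept a connection, so it should be confirmed through DRAC or iLO before treating the server as offline.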