sync: auto-sync from Mikes-MacBook-Air.local at 2026-05-25 05:50:34
Author: Mike Swanson Machine: Mikes-MacBook-Air.local Timestamp: 2026-05-25 05:50:34
session-logs/2026-05-25-session.md
# Session Log — 2026-05-25

## User

- **User:** Mike Swanson (mike)
- **Machine:** Mikes-MacBook-Air.local
- **Role:** admin
- **Session:** 05:00 - 05:48 MST
## Session Summary

Used GuruRMM remote command execution to recover the GURU-KALI workstation from a black screen caused by an nvidia driver installation. The system had booted to a black screen after nvidia driver version 595.71.05-1 was installed, but the GuruRMM agent stayed online and responsive, which made remote diagnosis and repair possible.

Connected to the GuruRMM API at 172.16.3.30:3001 and confirmed that the GURU-KALI agent (ID a73ba38e-cd02-4331-b8bf-474cd899ec22) was online despite the display failure. Sent a remote shell command to enumerate the installed nvidia packages and found 50+ packages, including the driver, libraries, and firmware. The first removal attempt failed with "Read-only file system" errors across /var/lib/dpkg and /var/cache/apt, showing that the filesystem was mounted read-only, likely as a protective measure after an earlier boot failure.
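
Rather than inferring the read-only state from dpkg errors, it can be checked directly before any package operation. A minimal sketch, assuming the agent runs commands under a POSIX shell with /proc mounted; `mount_mode` and `ensure_rw` are hypothetical helper names, not GuruRMM features. It relies on the kernel listing `ro` or `rw` as the first option in the fourth field of each /proc/mounts entry.

```bash
# Hypothetical helpers (not part of GuruRMM): detect and clear a read-only mount.

# mount_mode MOUNTPOINT [MOUNTS_FILE]
# Prints "ro" or "rw" for MOUNTPOINT, taken from the first option in the
# fourth field of a /proc/mounts-style table. Prints nothing if MOUNTPOINT
# is not listed. MOUNTS_FILE is a parameter only so the parsing can be tested.
mount_mode() {
  local mountpoint=$1
  local mounts_file=${2:-/proc/mounts}
  # Later lines win: a mount stacked on the same path shadows earlier ones.
  awk -v mp="$mountpoint" '
    $2 == mp { split($4, opts, ","); mode = opts[1] }
    END { if (mode != "") print mode }
  ' "$mounts_file"
}

# ensure_rw [MOUNTPOINT]
# Remounts MOUNTPOINT read-write only when it is currently read-only.
# Needs root; it issues the same `mount -o remount,rw` used in this session.
ensure_rw() {
  local mountpoint=${1:-/}
  if [ "$(mount_mode "$mountpoint")" = "ro" ]; then
    mount -o remount,rw "$mountpoint"
  fi
}
```

On the target, `ensure_rw /` followed by `mount_mode /` should print `rw` before `apt-get` is run; if the remount fails, the package steps should not be attempted.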

Remounted the root filesystem read-write with `mount -o remount,rw /`, then ran a full nvidia package removal with apt-get, setting `DEBIAN_FRONTEND=noninteractive` to avoid interactive prompts. This removed all nvidia-* and libnvidia-* packages, but the firmware packages and some DKMS modules remained. Performed a second pass removing firmware-nvidia-graphics and firmware-nvidia-gsp, then created /etc/modprobe.d/blacklist-nvidia.conf to keep the nvidia kernel modules from loading on future boots, and updated the initramfs so the blacklist also applies during early boot.

Rebooted the system twice: first after the initial driver removal, then again after the blacklist was applied. After the second reboot, verified that the lightdm display manager had started successfully (active and running). The user confirmed that the display was restored and showing the login screen. The active graphics driver was not confirmed: the system is now running on either the Intel i915 integrated graphics driver or a framebuffer fallback instead of the nvidia driver. The blacklist remains in place to prevent a recurrence.
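
The post-reboot verification can also be scripted, so it does not depend on someone being at the console. A minimal sketch, assuming the same remote shell; the helper names are hypothetical. It checks only two things: that no module whose name starts with `nvidia` appears in /proc/modules (module name in the first field), and that `systemctl is-active --quiet lightdm` succeeds. It does not determine whether i915 or a framebuffer driver is in use.

```bash
# Hypothetical helpers (not part of GuruRMM) for the post-reboot check.

# loaded_nvidia_modules [MODULES_FILE]
# Prints the name of every loaded kernel module whose name starts with
# "nvidia" (nvidia, nvidia_drm, nvidia_modeset, nvidia_uvm, ...).
# MODULES_FILE defaults to /proc/modules and is a parameter only for testing.
loaded_nvidia_modules() {
  local modules_file=${1:-/proc/modules}
  awk '$1 ~ /^nvidia/ { print $1 }' "$modules_file"
}

# nvidia_absent [MODULES_FILE]
# Succeeds (exit 0) only when no nvidia module is loaded.
nvidia_absent() {
  [ -z "$(loaded_nvidia_modules "$@")" ]
}

# post_reboot_check
# Combines the module check with the lightdm status check used in this session.
post_reboot_check() {
  if ! nvidia_absent; then
    echo "nvidia modules still loaded: $(loaded_nvidia_modules | tr '\n' ' ')" >&2
    return 1
  fi
  systemctl is-active --quiet lightdm || { echo "lightdm is not active" >&2; return 1; }
  echo "no nvidia modules loaded; lightdm active"
}
```

A non-zero exit from `post_reboot_check`, sent as a single remote shell command, would flag a failed recovery without any screen access.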
## Key Decisions

- **Used GuruRMM remote commands rather than physical access** — The agent was online despite the black screen, so the recovery could be done entirely remotely, without console access or recovery media
- **Remounted the filesystem before package operations** — The read-only state blocked all dpkg/apt operations, so remounting read-write was mandatory before the driver could be removed
- **Performed a multi-pass removal** — Removed the main driver packages first, then the firmware, then created the blacklist and updated the initramfs as separate operations to make sure each step completed cleanly
- **Created a permanent blacklist** — Added /etc/modprobe.d/blacklist-nvidia.conf rather than only removing packages, so the modules are not loaded automatically even if the packages are later reinstalled as dependencies
- **Rebooted twice** — The first reboot applied the package removal; the second, after the blacklist was created and the initramfs rebuilt, put the blacklist into effect so the nvidia modules would not load from the initramfs
- **Used DEBIAN_FRONTEND=noninteractive** — Prevented apt-get from blocking on interactive prompts during unattended remote execution
## Problems Encountered

- **Filesystem mounted read-only** — The initial package removal failed with "unable to access dpkg database" and "Read-only file system" errors. Resolved by running `mount -o remount,rw /` before retrying the removal.
- **JSON parsing control characters** — Command output containing terminal control codes caused jq parsing failures. Worked around this by using grep/python for status checks or by stripping the control characters.
- **Firmware packages remained after initial removal** — The first apt-get pass removed the driver packages but left firmware-nvidia-graphics and firmware-nvidia-gsp installed. A second, explicit removal targeting the firmware-* packages was required.
- **Blacklist file initially missing** — After the first reboot, /etc/modprobe.d/blacklist-nvidia.conf was not present even though the creation command had reported success. Recreated it using heredoc syntax and verified its contents before the final reboot.
- **Exit code 100 despite apparent success** — Several apt-get operations returned exit code 100, which apt-get uses for any error, even though stdout showed that packages had been removed. Completion was therefore verified with marker strings such as "NVIDIA REMOVAL COMPLETE" rather than by relying solely on exit codes.
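
The control-character workaround can be a small filter placed between `curl` and `jq`. A minimal sketch, assuming the offending bytes are ANSI CSI escape sequences (such as the colour code `ESC[1;32m`) plus stray C0 control characters; the exact codes in the GuruRMM output were not recorded, and `strip_control_chars` is a hypothetical name. Tab, newline and carriage return are kept because JSON allows them as whitespace between tokens; a raw newline inside a string value would still need separate handling.

```bash
# Hypothetical filter (not part of GuruRMM or jq).
# strip_control_chars: reads stdin, writes stdout.
#  1. sed removes CSI escape sequences: ESC '[' parameters ';' ... final letter
#     (e.g. ESC[0m, ESC[1;32m, ESC[2K).
#  2. tr deletes the remaining C0 control characters and DEL, except
#     tab (\011), newline (\012) and carriage return (\015).
strip_control_chars() {
  local esc
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;?]*[A-Za-z]//g" | tr -d '\000-\010\013\014\016-\037\177'
}
```

For example, `curl -s -H "Authorization: Bearer $TOKEN" "$API/commands/$CMD_ID" | strip_control_chars | jq .`, where `$API` and `$CMD_ID` are placeholders for the API base and a command ID.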
## Configuration Changes

**GURU-KALI (100.75.148.91 / Tailscale) — remote via GuruRMM:**

- Removed 50+ nvidia packages (nvidia-driver, nvidia-open, xserver-xorg-video-nvidia, all libnvidia-* libs)
- Removed firmware-nvidia-graphics and firmware-nvidia-gsp
- Created `/etc/modprobe.d/blacklist-nvidia.conf`:
  ```
  blacklist nvidia
  blacklist nvidia_drm
  blacklist nvidia_modeset
  blacklist nvidia_uvm
  ```
- Updated initramfs (all kernels) to apply blacklist
- Remounted root filesystem as read-write (was read-only)
- Rebooted system twice
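
Because the first attempt left no blacklist file behind, writing and checking the file can be combined into one step. A minimal sketch, assuming bash; the function names are hypothetical, and the directory argument exists only so the function can be tried against a scratch directory. It uses a quoted heredoc, as in the recovery, instead of `echo -e`: if the agent runs commands through /bin/sh and that is dash (the Debian and Kali default), dash's built-in `echo` does not accept `-e` and prints it literally.

```bash
# Hypothetical helpers (not part of GuruRMM).

# write_nvidia_blacklist [MODPROBE_DIR]
# Writes blacklist-nvidia.conf with a quoted heredoc (no escape processing,
# no dependence on `echo -e`), then verifies it. MODPROBE_DIR defaults to
# /etc/modprobe.d.
write_nvidia_blacklist() {
  local dir=${1:-/etc/modprobe.d}
  cat > "$dir/blacklist-nvidia.conf" <<'EOF'
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist nvidia_uvm
EOF
  verify_nvidia_blacklist "$dir"
}

# verify_nvidia_blacklist [MODPROBE_DIR]
# Fails with a message if the file is missing or any of the four
# "blacklist <module>" lines is absent.
verify_nvidia_blacklist() {
  local conf="${1:-/etc/modprobe.d}/blacklist-nvidia.conf"
  local mod
  [ -f "$conf" ] || { echo "missing: $conf" >&2; return 1; }
  for mod in nvidia nvidia_drm nvidia_modeset nvidia_uvm; do
    grep -qx "blacklist $mod" "$conf" || { echo "no entry for $mod in $conf" >&2; return 1; }
  done
}
```

After writing the file, the initramfs still has to be regenerated (`update-initramfs -u`, or `update-initramfs -u -k all` to cover every installed kernel), and re-running `verify_nvidia_blacklist` after the reboot would catch the missing-file case seen here.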

**ClaudeTools:**

- `.claude/current-mode` set to `infra` (work mode for infrastructure operations)
## Credentials & Secrets

No new credentials created. Used existing vaulted credentials:

- GuruRMM API admin credentials: `infrastructure/gururmm-server.sops.yaml` -> `credentials.gururmm-api.admin-email` (claude-api@azcomputerguru.com) and `credentials.gururmm-api.admin-password`
- Token stored temporarily in `/tmp/rmm_token` during session, deleted after completion
## Infrastructure & Servers

**GURU-KALI:**

- Hostname: GURU-KALI
- Tailscale IP: 100.75.148.91
- GuruRMM Agent ID: a73ba38e-cd02-4331-b8bf-474cd899ec22
- OS: Kali Linux (dpkg-based)
- Display Manager: lightdm (now active and running)
- Graphics: Intel i915 integrated (after nvidia removal) or framebuffer fallback
- Status: Online, display restored

**GuruRMM Server (Saturn):**

- IP: 172.16.3.30
- API Base: http://172.16.3.30:3001/api
- Authentication: JWT Bearer token (obtained via POST /api/auth/login)
- Command execution: POST /api/agents/{id}/command
- Command polling: GET /api/commands/{id}
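
The execute-then-poll pattern (POST /api/agents/{id}/command, then GET /api/commands/{id}) needs a bounded wait loop. A minimal sketch, assuming a POSIX shell; `poll_until` is a hypothetical helper, and because the response schema of GET /api/commands/{id} is not recorded in this log, the completion test is passed in as a command string rather than hard-coded.

```bash
# Hypothetical helper (not part of GuruRMM).
# poll_until CHECK_CMD [MAX_TRIES] [DELAY_SECONDS]
# Evaluates CHECK_CMD in the current shell until it exits 0, giving up after
# MAX_TRIES attempts (default 60) with DELAY_SECONDS between them (default 5).
# Returns 0 on success, 1 on timeout. CHECK_CMD goes through eval, so only
# pass trusted strings.
poll_until() {
  local check_cmd=$1
  local max_tries=${2:-60}
  local delay=${3:-5}
  local i=1
  while [ "$i" -le "$max_tries" ]; do
    if eval "$check_cmd"; then
      return 0
    fi
    # Skip the sleep after the final attempt.
    if [ "$i" -lt "$max_tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

In this setup, CHECK_CMD would be a `curl` of GET /api/commands/$CMD_ID (with the Bearer token) piped into a `grep -q` for whatever completion field the API actually returns; that field name is not recorded here and would need to be confirmed first.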
## Commands & Outputs

```bash
# Authenticate with GuruRMM API and keep the JWT for later calls
TOKEN=$(curl -s -X POST "http://172.16.3.30:3001/api/auth/login" \
  -H "Content-Type: application/json" \
  -d '{"email":"claude-api@azcomputerguru.com","password":"***"}' | jq -r '.token')
# -> $TOKEN now holds the JWT

# Check agent status
curl -s "http://172.16.3.30:3001/api/agents/a73ba38e-cd02-4331-b8bf-474cd899ec22" \
  -H "Authorization: Bearer $TOKEN" | jq '{hostname, status}'
# -> {"hostname": "GURU-KALI", "status": "online"}

# List installed nvidia packages (command_id: 9302b83c-2f7b-4588-beb0-d735d3977b07)
# Command: dpkg -l | grep -i nvidia
# Output: 50 packages including nvidia-driver 595.71.05-1, nvidia-open, libnvidia-*, firmware-nvidia-*

# Remount filesystem as read-write (command_id: 2d1f683d-565a-4cfb-a17d-198770fac799)
# Command: mount -o remount,rw / && echo "Filesystem remounted as read-write" && mount | grep " / "
# Exit code: 0 (success)

# Remove nvidia drivers (command_id: 64cc2ca5-e031-4795-9aa4-27fde8b37c90)
# Command: DEBIAN_FRONTEND=noninteractive apt-get remove --purge -y nvidia-* libnvidia-* && apt-get autoremove -y
# Exit code: 100 (apt-get error status, although output showed 48 packages removed and 979 MB freed)

# Verify removal (command_id: 8d415bfe-23e2-49a2-8da5-f98f5fd71a8c)
# Command: dpkg -l | grep -i nvidia || echo "No nvidia packages found"
# Output: Only firmware packages remained (firmware-nvidia-graphics, firmware-nvidia-gsp)

# Complete removal with blacklist (command_id: 190efe95-a11a-4960-869d-8be778e129bf)
# Command: apt-get remove --purge -y firmware-nvidia-* && dpkg --purge nvidia-driver nvidia-kernel-support ...
#   && dkms status | grep nvidia | cut -d, -f1,2 | xargs -r -n1 sh -c 'dkms remove $0'
#   && echo -e "blacklist nvidia\nblacklist nvidia_drm\nblacklist nvidia_modeset\nblacklist nvidia_uvm" > /etc/modprobe.d/blacklist-nvidia.conf
#   && update-initramfs -u
# Output marker: "COMPLETE NVIDIA REMOVAL DONE"

# Reboot (command_id: 8628dce8-8755-4a49-9904-c684455de70f)
# Command: sync && echo "Final reboot in 5 seconds..." && sleep 5 && reboot

# Final verification after reboot (command_id: f6737830-4ca9-4ed3-b616-d3305a445f10)
# Status: lightdm.service active (running)
# Display: Confirmed working by user
```
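
Since apt-get exited with 100 even when packages had been removed, completion was judged from a marker string in the output rather than from the exit code alone. A minimal sketch of that decision, assuming bash; `classify_run` is a hypothetical helper, and the exit code and stdout are passed in as arguments because the GuruRMM field names holding them are not recorded here. It returns three outcomes instead of a yes/no, so a run like the driver removal above (marker present, exit 100) is flagged for a follow-up `dpkg -l` check instead of being silently accepted or rejected.

```bash
# Hypothetical helper (not part of GuruRMM).
# classify_run EXIT_CODE OUTPUT MARKER
# Prints one of:
#   ok      exit code 0 and MARKER present in OUTPUT
#   check   MARKER present but non-zero exit code (e.g. apt-get's 100)
#   failed  MARKER absent
classify_run() {
  local code=$1 output=$2 marker=$3
  # -F: match MARKER as a fixed string; --: allow markers starting with "-".
  if printf '%s\n' "$output" | grep -qF -- "$marker"; then
    if [ "$code" -eq 0 ]; then
      echo ok
    else
      echo check
    fi
  else
    echo failed
  fi
}
```

Usage would look like `classify_run "$EXIT_CODE" "$STDOUT" "COMPLETE NVIDIA REMOVAL DONE"`, with `EXIT_CODE` and `STDOUT` standing in for the values read from the command result.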
## Pending / Incomplete Tasks

None. Recovery complete.

**Future consideration:** If the nvidia GPU is needed again:

1. Remove the blacklist: `sudo rm /etc/modprobe.d/blacklist-nvidia.conf`
2. Reinstall the nvidia drivers with a proper Xorg configuration
3. Update the initramfs: `sudo update-initramfs -u`
4. Reboot
## Reference Information

- **GuruRMM API docs:** Command execution via POST /api/agents/{id}/command with payload `{"command_type": "shell", "command": "...", "timeout_seconds": 300}`
- **GURU-KALI session log reference:** session-logs/2026-05-24-GURU-KALI-session.md (previous work on this machine)
- **Wiki reference:** wiki/clients/internal-infrastructure.md (ACG infrastructure inventory)
- **Vault paths:**
  - GuruRMM API credentials: `infrastructure/gururmm-server.sops.yaml`
- **Command IDs from this session:**
  - Initial nvidia list: 9302b83c-2f7b-4588-beb0-d735d3977b07
  - Filesystem remount: 2d1f683d-565a-4cfb-a17d-198770fac799
  - Driver removal: 64cc2ca5-e031-4795-9aa4-27fde8b37c90
  - Complete removal: 190efe95-a11a-4960-869d-8be778e129bf
  - Final reboot: 8628dce8-8755-4a49-9904-c684455de70f
  - Final verification: f6737830-4ca9-4ed3-b616-d3305a445f10