sync: auto-sync from HOWARD-HOME at 2026-06-25 11:42:29

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-25 11:42:29
2026-06-25 11:42:58 -07:00
parent fc36f98450
commit 4a63b583b7
5 changed files with 245 additions and 1 deletions

@@ -0,0 +1,94 @@
## User
- **User:** Howard Enos (howard)
- **Machine:** HOWARD-HOME
- **Role:** tech
## Session Summary
Investigated a reported BSOD on the Dataforth shipping-station PC DFORTH-Ship: stop code
`0x00000116 VIDEO_TDR_FAILURE`. Resolved the agent via `/rmm-search` (exact match DFORTH-Ship,
id `db17e069-2948-4cbc-97ea-1da721edcaf5`, Dataforth Corp / site D1, online), distinguishing it
from a near-twin host `DForth-Shipp`.
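
The exact-match check described above can be sketched as follows. This is a minimal illustration, not the `/rmm-search` implementation: the `agents` list shape, the `resolve_host` helper, and the similarity threshold are all assumptions, since the log does not show the real response format.

```python
from difflib import SequenceMatcher


def resolve_host(query, agents, twin_threshold=0.9):
    """Return (exact_match, near_twins) for a hostname query.

    `agents` is a hypothetical list of {"hostname": ..., "id": ...} dicts.
    """
    q = query.casefold()
    # Require exactly one case-insensitive exact match before acting.
    exact = [a for a in agents if a["hostname"].casefold() == q]
    if len(exact) != 1:
        raise LookupError(f"expected one exact match for {query!r}, got {len(exact)}")
    # Flag look-alike names so the operator can confirm the target.
    near = [
        a for a in agents
        if a["hostname"].casefold() != q
        and SequenceMatcher(None, q, a["hostname"].casefold()).ratio() >= twin_threshold
    ]
    return exact[0], near


agents = [
    {"hostname": "DFORTH-Ship", "id": "db17e069-2948-4cbc-97ea-1da721edcaf5"},
    {"hostname": "DForth-Shipp", "id": "95991b45-d843-4586-8275-9996d0d9ae17"},
]
target, twins = resolve_host("DFORTH-Ship", agents)
print(target["id"], [t["hostname"] for t in twins])
```

Raising on anything other than a single exact match keeps a write action from landing on a look-alike host.
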
Ran two read-only PowerShell diagnostics over GuruRMM. The first pulled GPU/driver inventory,
recent bugcheck/Kernel-Power events, display/TDR driver events, WHEA events, and the minidump
list. The GPU is an integrated Intel HD Graphics 4600 on driver `20.19.15.5126` (dated 1/20/2020;
Intel's final driver for that part). The latest crash (6/24/2026 04:36) was confirmed as `0x116`
with arg3 `0xc0000001` (`STATUS_UNSUCCESSFUL`; the GPU reset did not complete within the 2 s TDR
window). Five minidumps exist, dated 11/3/2025, 5/3/2026, 5/20/2026, 6/16/2026, and 6/24/2026: an
accelerating cadence.
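
The crash cadence can be checked by computing the gaps between dumps. The sketch below assumes the leading field of each minidump name is the date in `MMDDYY` form, which matches the dates listed in this log. The names come from the Commands & Outputs section; the `dump_date` helper is illustrative only.

```python
from datetime import datetime

# Minidump base names as recorded in this log.
dump_names = [
    "110325-8265-01",
    "050326-7921-01",
    "052026-7937-01",
    "061626-7687-01",
    "062426-8953-01",
]


def dump_date(name):
    """Parse the leading MMDDYY field of a minidump file name."""
    return datetime.strptime(name.split("-")[0], "%m%d%y").date()


dates = sorted(dump_date(n) for n in dump_names)
# Days between consecutive crashes, oldest first.
gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
print(gaps)  # [181, 17, 27, 8]
```

The gaps shrink sharply overall, though not strictly: the 27-day interval follows a 17-day one.
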
The second diagnostic confirmed that the System event log had rolled over (only the latest Event ID
1001 bugcheck record survives in the log, though the dump files persist), that TdrDelay/TdrLevel
are at defaults, that Edge and WebView2 (both hardware-accelerated) are installed, and that the
hardware is an HP EliteDesk 800 G1 USDT with a Dec-2014 BIOS (a ~11.5-year-old ultra-slim chassis,
prone to heat and dust buildup).
Diagnosis: display-driver TDR on aging integrated graphics; because it is integrated there is no
card to reseat/swap. Recommended PC replacement as the real fix with interim mitigations. Per
Howard's go-ahead, applied mitigation #1: disabled Edge hardware acceleration via machine policy
(`HKLM\SOFTWARE\Policies\Microsoft\Edge\HardwareAccelerationModeEnabled = 0`), verified value = 0,
exit 0. Posted the required `[RMM]` write alert to #dev-alerts.
## Key Decisions
- Targeted the exact host DFORTH-Ship over the near-twin DForth-Shipp to avoid acting on the wrong
Dataforth machine.
- Classified the crash as a TDR on integrated graphics, so ruled out "reseat/replace the GPU"
advice — the GPU is on the CPU/motherboard.
- Chose disabling Edge hardware acceleration as the first mitigation: it is a common software TDR
trigger on HD 4600, and the change is low-risk, reversible, and has no practical downside on a
shipping-station PC.
- Held off on the TdrDelay registry band-aid; it only masks marginal timeouts and would not fix a
genuine hardware fault. Flagged thermal cleaning and PC replacement as the durable path, given the
accelerating dump cadence on a ~11.5-year-old slim desktop.
## Problems Encountered
- Full bugcheck-code history was unavailable from the event log (the System log had rolled over;
only the 6/24 Event ID 1001 record remained). Worked around this by enumerating the persisted
`.dmp` files to establish the crash cadence; the older crash signatures remain unconfirmed
(confirming them would require loading the dumps).
## Configuration Changes
- DFORTH-Ship registry (via RMM): created/set `HKLM\SOFTWARE\Policies\Microsoft\Edge` value
`HardwareAccelerationModeEnabled` (DWORD) = `0`. Reversible (delete value or set to 1). Effective
on next Edge restart.
- No files modified in the repo.
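
For reference, the registry change above can be expressed as `reg.exe` invocations. This is a sketch only: the session applied the change with a PowerShell script over GuruRMM, which this log does not reproduce, and the helper names here are hypothetical. The block only builds and prints the argument lists; it does not touch the registry.

```python
EDGE_POLICY_KEY = r"HKLM\SOFTWARE\Policies\Microsoft\Edge"
VALUE_NAME = "HardwareAccelerationModeEnabled"


def set_cmd(enabled):
    """reg.exe arguments that create or overwrite the policy DWORD."""
    return ["reg", "add", EDGE_POLICY_KEY, "/v", VALUE_NAME,
            "/t", "REG_DWORD", "/d", str(int(enabled)), "/f"]


def query_cmd():
    """reg.exe arguments that read the value back for verification."""
    return ["reg", "query", EDGE_POLICY_KEY, "/v", VALUE_NAME]


def revert_cmd():
    """reg.exe arguments that delete the value, returning Edge to its default."""
    return ["reg", "delete", EDGE_POLICY_KEY, "/v", VALUE_NAME, "/f"]


for cmd in (set_cmd(False), query_cmd(), revert_cmd()):
    print(" ".join(cmd))
```

On an elevated Windows session, each list could be passed to `subprocess.run(cmd, check=True)`.
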
## Credentials & Secrets
None discovered or created this session. RMM auth via existing vault path
`infrastructure/gururmm-server.sops.yaml`.
## Infrastructure & Servers
- Host: DFORTH-Ship — GuruRMM agent id `db17e069-2948-4cbc-97ea-1da721edcaf5`, Dataforth Corp,
site D1, Windows, online.
- Hardware: HP EliteDesk 800 G1 USDT, BIOS release 12/10/2014. GPU: Intel HD Graphics 4600,
driver 20.19.15.5126 (2020-01-20). Logged-on console user: `shipping`.
- Near-twin host (not touched): DForth-Shipp, id `95991b45-d843-4586-8275-9996d0d9ae17`.
- GuruRMM API: http://172.16.3.30:3001
## Commands & Outputs
- Latest bugcheck: `0x00000116 (0xffff850c0cc03010, 0xfffff80646d91b10, 0xffffffffc0000001,
0x0000000000000003)` at 6/24/2026 04:36, dump `C:\WINDOWS\Minidump\062426-8953-01.dmp`.
- Minidumps present: 110325-8265-01, 050326-7921-01, 052026-7937-01, 061626-7687-01, 062426-8953-01.
- Mitigation verify output: `Set HardwareAccelerationModeEnabled = 0 (0 = disabled)`, exit 0
(cmd `b98d56ba-065b-431b-b976-783d5902d80d`).
- Diagnostic cmd ids: `b666b53b-...` (GPU/events/dumps), `f562d01f-...` (history/TDR/model).
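
The bugcheck line above can be split into its stop code and four parameters. In this sketch, for stop code `0x116` the third parameter is read as the NTSTATUS of the last failed operation, and its low 32 bits are mapped to a name. The `parse_bugcheck` helper and the one-entry lookup table are illustrative, not part of any tool used in the session.

```python
import re

# Only the status seen in this log; extend as needed.
NTSTATUS_NAMES = {0xC0000001: "STATUS_UNSUCCESSFUL"}


def parse_bugcheck(text):
    """Split 'code (arg1, arg2, arg3, arg4)' into integers and decode arg3."""
    values = [int(h, 16) for h in re.findall(r"0x[0-9a-fA-F]+", text)]
    code, args = values[0], values[1:5]
    # arg3 is sign-extended to 64 bits; keep the low 32 bits as the NTSTATUS.
    status = args[2] & 0xFFFFFFFF
    return {
        "code": code,
        "args": args,
        "ntstatus": status,
        "ntstatus_name": NTSTATUS_NAMES.get(status, "unknown"),
    }


line = ("0x00000116 (0xffff850c0cc03010, 0xfffff80646d91b10, "
        "0xffffffffc0000001, 0x0000000000000003)")
info = parse_bugcheck(line)
print(hex(info["code"]), hex(info["ntstatus"]), info["ntstatus_name"])
```
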
## Pending / Incomplete Tasks
- Have on-site staff fully restart Edge (or reboot) so the HW-accel policy takes effect; verify at
`edge://policy` and `edge://settings/system`.
- Monitor for recurrence. If it bugchecks again, pull and analyze the four older dump signatures to
confirm whether it is drifting toward a hard hardware fault.
- Schedule thermal cleaning of the USDT chassis/fan (on-site).
- Recommend/plan replacement of the 11.5-year-old EliteDesk 800 G1 USDT shipping station.
## Reference Information
- Stop code: 0x00000116 VIDEO_TDR_FAILURE (Timeout Detection & Recovery; default TdrDelay 2s).
- TDR registry: `HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers` (TdrDelay/TdrLevel — at
defaults on this host).
- Edge policy: `HKLM\SOFTWARE\Policies\Microsoft\Edge\HardwareAccelerationModeEnabled`.
- #dev-alerts message id: 1519768574304980993.
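
"At defaults" for the TDR settings usually means the values are simply absent from the `GraphicsDrivers` key. The sketch below treats missing values as the documented Windows defaults (`TdrDelay` of 2 seconds, `TdrLevel` of 3, i.e. recover on timeout). The `effective_tdr` helper and its input dict are hypothetical stand-ins for whatever the diagnostic script actually read.

```python
# Documented Windows defaults when the registry values are absent.
TDR_DEFAULTS = {"TdrDelay": 2, "TdrLevel": 3}


def effective_tdr(present_values):
    """Merge registry values (if any) over the defaults.

    Returns (effective_settings, names_that_differ_from_default).
    """
    effective = dict(TDR_DEFAULTS)
    for name, value in present_values.items():
        if name in TDR_DEFAULTS:
            effective[name] = value
    overridden = sorted(n for n in TDR_DEFAULTS if effective[n] != TDR_DEFAULTS[n])
    return effective, overridden


# This host: neither value set, so defaults apply.
print(effective_tdr({}))
# A host where the TdrDelay band-aid had been applied (illustrative value).
print(effective_tdr({"TdrDelay": 8}))
```
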