From 8652a6db098a25e59b43d5f9ef64fa0b9e40577c Mon Sep 17 00:00:00 2001
From: Howard Enos
Date: Sun, 28 Jun 2026 19:09:29 -0700
Subject: [PATCH] sync: auto-sync from HOWARD-HOME at 2026-06-28 19:08:56

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-28 19:08:56
---
 ...-06-28-howard-imc1-windows-update-block.md | 115 ++++++++++++++++++
 errorlog.md                                   |   2 +
 2 files changed, 117 insertions(+)
 create mode 100644 clients/instrumental-music-center/session-logs/2026-06/2026-06-28-howard-imc1-windows-update-block.md

diff --git a/clients/instrumental-music-center/session-logs/2026-06/2026-06-28-howard-imc1-windows-update-block.md b/clients/instrumental-music-center/session-logs/2026-06/2026-06-28-howard-imc1-windows-update-block.md
new file mode 100644
index 00000000..e492358c
--- /dev/null
+++ b/clients/instrumental-music-center/session-logs/2026-06/2026-06-28-howard-imc1-windows-update-block.md
@@ -0,0 +1,115 @@
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Mike reported that the IMC server (IMC1) was repeatedly hit by a Windows update that takes the
+machine down for ~2 hours in a crash/restore cycle, and asked that it be blocked. Context loaded
+from the IMC wiki, `PROJECT_STATE.md`, and the 2026-04 ticket notes identified the root cause: a
+known component-store corruption on IMC1 (oversized `COMPONENTS` registry hive + a malformed ETW
+event manifest for provider GUID `{9c2a37f3-e5fd-5cae-bcd1-43dafeee1ff0}`). Every monthly Server
+2016 cumulative update stages successfully, reboots, runs a long apply phase, fails with
+`HRESULT 15010 ERROR_EVT_INVALID_EVENT_DATA` -> `CBS_E_INSTALLERS_FAILED`, and rolls back. The
+documented offender was KB5075999 (Feb 2026), but because CUs supersede, the currently-offered
+update was expected to be newer.
+
+Confirmed scope with Mike (chose "Both"): pause quality updates AND hide the specific CU. Resolved
+IMC1 live via GuruRMM (its agent UUID had re-enrolled to `88cbf7c0-abfa-4f12-846c-96274f718bff`;
+the wiki/PROJECT_STATE `fa99e913-...` was stale). Enumerated pending updates via the
+`Microsoft.Update.Session` COM API: the current OS CU is **KB5094122** (2026-06 Cumulative Update
+for Server 2016); KB5075999 is superseded and no longer offered; nothing was previously hidden.
+
+Executed the block: set `IsHidden=true` on KB5094122 via the WU COM API and set
+`HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU\NoAutoUpdate=1`, then restarted
+`wuauserv`. Both verified (`HIDDEN_NOW=1`, `NoAutoUpdate=1`). Posted the required `[RMM]` dev-alert
+and documented the hold in `PROJECT_STATE.md`.
+
+Mike then asked what it would take to fix the component store (intending to work it later tonight).
+Pulled the full 2026-04-13 ticket-notes record of prior DISM/SFC attempts and gathered fresh
+read-only diagnostics from IMC1. Key new fact: the `COMPONENTS` hive has grown from 168 MB (April)
+to **259.5 MB** now (normal 30-50 MB) — the corruption is progressive. Free RAM is ~21 GB, so the
+April DISM `E_OUTOFMEMORY` is the servicing stack failing to load the bloated hive, not physical
+memory. The broken ETW publisher registry key is **missing**, so the cheap "re-register the
+manifest" fix has no handle. Synthesized three repair paths (in-place DISM repair / in-place 2019
+upgrade / clean 2019 build) with a recommendation. Per Mike's final instruction, left the updates
+disabled (already done) and DM'd Mike the issue + options for a path/window decision.
+
+## Key Decisions
+
+- Blocked the CU two ways (hide specific KB + `NoAutoUpdate=1`) per Mike's "Both" choice, because
+  CUs supersede monthly — hiding only one KB by number would let next month's CU re-trigger the
+  same rollback. `NoAutoUpdate=1` is the durable backstop (WU deferral/pause caps at 30-35 days).
+- Chose `NoAutoUpdate=1` (auto-install off, manual still works) over a timed pause, since IMC1
+  cannot successfully apply any OS CU until the store is repaired/migrated — it gets zero patch
+  benefit regardless, so the only effect of leaving auto-update on is the recurring outage.
+- Resolved the agent live rather than trusting the documented UUID — the GuruRMM skill rule
+  (UUIDs change on re-enroll) paid off; the documented `fa99e913-...` was stale.
+- Recommended planning for Path B (in-place 2019 upgrade) as the realistic fix: it rebuilds the
+  servicing stack (usually clears this corruption) and advances the 2027-01-12 Server 2016 EOL
+  migration in one window. Path A kept as a cheap time-boxed probe; Path C (clean build) as
+  fallback. All paths gated behind a verified full image + system-state backup (IMC1 is the only
+  working DC + live AIM SQL/POS).
+
+## Problems Encountered
+
+- Documented agent UUID was stale (re-enrolled). Resolved by live lookup via `GET /api/agents`;
+  updated `PROJECT_STATE.md` with the new UUID.
+- The Bash poll loop hit the tool's 2-minute wall while the WU COM search ran an online scan.
+  Resolved by checking command status directly afterward (the command itself completed exit 0).
+
+## Configuration Changes
+
+- IMC1 (remote): hid Windows Update KB5094122 (`IUpdate.IsHidden=true` via COM).
+- IMC1 (remote): created/set registry value
+  `HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU\NoAutoUpdate = 1` (DWORD); restarted
+  `wuauserv`.
+- `clients/instrumental-music-center/PROJECT_STATE.md` — bumped Last updated to 2026-06-28; added
+  "UPDATE HOLD" known-issue entry; added 2026-06-28 Recent Changes row; recorded the new agent UUID.
+
+## Credentials & Secrets
+
+- None discovered, created, or rotated this session. GuruRMM API creds read via vault
+  (`infrastructure/gururmm-server.sops.yaml`) through the `/rmm` skill bootstrap.
+
+## Infrastructure & Servers
+
+- **IMC1** — 192.168.0.2, Windows Server 2016 Standard, build **14393.8422**, Dell R720, DC for
+  imc.local + AIMsi SQL host (`IMC1\SQLEXPRESS`, prod) + RDS.
+- **GuruRMM agent (IMC1):** `88cbf7c0-abfa-4f12-846c-96274f718bff` (was `fa99e913-1027-4e33-a928-7695e31068e7`).
+- **GuruRMM API:** http://172.16.3.30:3001
+- Component store: `COMPONENTS` hive 259.5 MB (was 168 MB in April); free RAM ~21 GB at check time;
+  no pending.xml; broken ETW publisher GUID `{9c2a37f3-e5fd-5cae-bcd1-43dafeee1ff0}` regkey absent.
+
+## Commands & Outputs
+
+- Enumerate offered updates (COM): `IsInstalled=0 and IsHidden=0` -> 10 items; the only OS CU was
+  `2026-06 Cumulative Update for Windows Server 2016 (KB5094122)`, `IsDownloaded=False`. Also a .NET
+  4.8 CU KB5087065 (separate, lighter package, not the rollback cause).
+- Hide + policy result: `HID: ...(KB5094122)`, `POLICY_SET NoAutoUpdate=1`, `WUAUSERV_RESTARTED`,
+  verify `HIDDEN_NOW=1`, `NoAutoUpdate_NOW=1`.
+- Diagnostics: `COMPONENTS_HIVE_MB=259.5`, `OS_BUILD=8422`, `FREE_RAM_MB=21,855`,
+  `PUBLISHER_REGKEY_MISSING`, `PENDING_XML_EXISTS=False`.
+- Dev-alert posted (`message_id=1520957745820602529`); Mike DM'd (`message_id=1520959250304598016`).
+
+## Pending / Incomplete Tasks
+
+- **Awaiting Mike's decision** on the component-store fix path (A/B/C) + a maintenance window, and
+  confirmation of a Server 2019 ISO + license.
+- Before any repair window: take + verify a full image and system-state backup of IMC1 (sole DC +
+  live POS); confirm last-night AIM `.bak` + Cloudberry off-site are current; check F: free space;
+  stage a build-matched (14393.8422) DISM repair source.
+- Reverse the update hold (unhide KB5094122 + remove `NoAutoUpdate`) once the store is repaired or
+  IMC1 is migrated to Server 2019.
+- Stale UUID also appears in the IMC wiki article (`fa99e913-...`) and GuruRMM Enrollment table —
+  refresh on next `/wiki-compile`.
+
+## Reference Information
+
+- Prior repair attempts: `clients/instrumental-music-center/docs/2026-04-13-ticket-notes.md` (DISM
+  RestoreHealth E_OUTOFMEMORY, RTM `/Source` CBS_E_SOURCE_MISSING, KB5075999 apply-on-boot rollback).
+- Three migration paths table: same ticket-notes doc, "Open items / recommendations".
+- IMC client wiki: `wiki/clients/instrumental-music-center.md`.
+- Failure signature: `HRESULT_FROM_WIN32(15010) ERROR_EVT_INVALID_EVENT_DATA` at
+  `onecore\admin\wmi\events\config\manproc.cpp:733` -> `CBS_E_INSTALLERS_FAILED` -> rollback.
diff --git a/errorlog.md b/errorlog.md
index 1cb17b41..cdb7a0df 100644
--- a/errorlog.md
+++ b/errorlog.md
@@ -17,6 +17,8 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
+2026-06-29 | Howard-Home | save/rmm-scratch | [friction] wrote RMM command-id scratch files (.netprobe_id, .stage_id, etc.) to repo root C:/claudetools; .netprobe_id got swept into a sync commit by git add -A and needed git rm. Use the session scratchpad dir for transient IDs, not the repo root. [ctx: ref=feedback_tmp_path_windows]
+
 2026-06-28 | Howard-Home | rmm/spec-015-safeboot | [friction] safe-mode survival test stranded DESKTOP-MS42HNC: (a) registering only GuruRMMAgent/Watchdog in SafeBootNetwork is insufficient for the agent to connect in Safe Mode (needs network-stack deps e.g. BFE/Dnscache/CryptSvc); (b) Task-Scheduler dead-man does NOT fire in Safe Mode so auto-revert failed -> required manual console recovery [ctx: host=DESKTOP-MS42HNC spec=SPEC-015 fix=use-a-service-not-schtasks-for-revert test-only-on-disposable-VM]
 2026-06-28 | Howard-Home | rmm/powershell-discovery | [friction] broad '*.log' Get-ChildItem on C:WindowsTemp pulled a 157KB Office C2R telemetry log into command output, wasting tokens; scope log searches to the specific filename (mccleanup.log) or a tight -Filter, not *.log