sync: auto-sync from HOWARD-HOME at 2026-06-28 19:08:56
Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-06-28 19:08:56
@@ -0,0 +1,115 @@
## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech

## Session Summary

Mike reported that the IMC server (IMC1) was repeatedly hit by a Windows update that takes the
machine down for ~2 hours in a crash/restore cycle, and asked that it be blocked. Context loaded
from the IMC wiki, `PROJECT_STATE.md`, and the 2026-04 ticket notes identified the root cause: a
known component-store corruption on IMC1 (oversized `COMPONENTS` registry hive + a malformed ETW
event manifest for provider GUID `{9c2a37f3-e5fd-5cae-bcd1-43dafeee1ff0}`). Every monthly Server
2016 cumulative update stages successfully, reboots, runs a long apply phase, fails with
`HRESULT 15010 ERROR_EVT_INVALID_EVENT_DATA` -> `CBS_E_INSTALLERS_FAILED`, and rolls back. The
documented offender was KB5075999 (Feb 2026), but because each CU supersedes the previous one,
the currently offered update was expected to be newer.

Confirmed scope with Mike (chose "Both"): pause quality updates AND hide the specific CU. Resolved
IMC1 live via GuruRMM (its agent had re-enrolled under UUID `88cbf7c0-abfa-4f12-846c-96274f718bff`;
the wiki/PROJECT_STATE `fa99e913-...` was stale). Enumerated pending updates via the
`Microsoft.Update.Session` COM API: the current OS CU is **KB5094122** (2026-06 Cumulative Update
for Server 2016); KB5075999 is superseded and no longer offered; nothing was previously hidden.

Executed the block: set `IsHidden=true` on KB5094122 via the WU COM API and set
`HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU\NoAutoUpdate=1`, then restarted
`wuauserv`. Both verified (`HIDDEN_NOW=1`, `NoAutoUpdate=1`). Posted the required `[RMM]` dev-alert
and documented the hold in `PROJECT_STATE.md`.

Mike then asked what it would take to fix the component store (intending to work it later tonight).
Pulled the full 2026-04-13 ticket-notes record of prior DISM/SFC attempts and gathered fresh
read-only diagnostics from IMC1. Key new fact: the `COMPONENTS` hive has grown from 168 MB (April)
to **259.5 MB** now (normal 30-50 MB), so the corruption is progressive. Free RAM is ~21 GB, so the
April DISM `E_OUTOFMEMORY` is the servicing stack failing to load the bloated hive, not a shortage
of physical memory. The broken ETW publisher registry key is **missing**, so the cheap "re-register
the manifest" fix has no handle. Synthesized three repair paths (in-place DISM repair / in-place 2019
upgrade / clean 2019 build) with a recommendation. Per Mike's final instruction, left the updates
disabled (already done) and DM'd Mike the issue + options for a path/window decision.
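
To put a number on "progressive", the two hive readings can be turned into a rough daily growth rate. This is only a sketch: it assumes the April reading was taken on 2026-04-13 (the ticket-notes date; the notes only say "April") and that growth is roughly linear, which nothing in this session establishes.

```python
from datetime import date

def growth_per_day(size_then_mb: float, size_now_mb: float,
                   then: date, now: date) -> float:
    """Average growth in MB/day between two hive-size readings."""
    days = (now - then).days
    if days <= 0:
        raise ValueError("'now' must be later than 'then'")
    return (size_now_mb - size_then_mb) / days

# Only the sizes come from the session notes; the April date is an assumption.
APRIL_READING = date(2026, 4, 13)
JUNE_READING = date(2026, 6, 28)

rate = growth_per_day(168.0, 259.5, APRIL_READING, JUNE_READING)
print(f"{rate:.2f} MB/day over {(JUNE_READING - APRIL_READING).days} days")
```

A third reading before the repair window would show whether the rate is steady or tied to each failed CU attempt.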

## Key Decisions

- Blocked the CU two ways (hide specific KB + `NoAutoUpdate=1`) per Mike's "Both" choice, because
  CUs supersede monthly: hiding only one KB by number would let next month's CU re-trigger the
  same rollback. `NoAutoUpdate=1` is the durable backstop (WU deferral/pause caps at 30-35 days).
- Chose `NoAutoUpdate=1` (auto-install off, manual still works) over a timed pause, since IMC1
  cannot successfully apply any OS CU until the store is repaired/migrated. It gets zero patch
  benefit regardless, so the only effect of leaving auto-update on is the recurring outage.
- Resolved the agent live rather than trusting the documented UUID. The GuruRMM skill rule
  (UUIDs change on re-enroll) paid off; the documented `fa99e913-...` was stale.
- Recommended planning for Path B (in-place 2019 upgrade) as the realistic fix: it rebuilds the
  servicing stack (usually clears this corruption) and advances the 2027-01-12 Server 2016 EOL
  migration in one window. Path A kept as a cheap time-boxed probe; Path C (clean build) as
  fallback. All paths gated behind a verified full image + system-state backup (IMC1 is the only
  working DC + live AIM SQL/POS).

## Problems Encountered

- Documented agent UUID was stale (re-enrolled). Resolved by live lookup via `GET /api/agents`;
  updated `PROJECT_STATE.md` with the new UUID.
- The Bash poll loop hit the tool's 2-minute wall while the WU COM search ran an online scan.
  Resolved by checking command status directly afterward (the command itself completed exit 0).
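
One way to avoid the poll loop being cut off by the tool limit is to give the loop its own, shorter budget and return a "not finished yet" result the caller can follow up on. The sketch below is illustrative only: `fetch_status` is a hypothetical callable standing in for the RMM command-status check, and the status string `"running"` is an assumption, not the real API vocabulary.

```python
import time
from typing import Callable, Optional

def poll_until_done(fetch_status: Callable[[], str],
                    budget_s: float = 100.0,
                    interval_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Poll until the status is terminal or the time budget is spent.

    Returns the terminal status, or None if the budget would be exceeded, so
    the caller can re-check the command later instead of being killed mid-loop.
    """
    deadline = clock() + budget_s
    while True:
        status = fetch_status()
        if status != "running":              # assumed non-terminal status name
            return status
        if clock() + interval_s > deadline:
            return None                      # stop before the tool's own limit
        sleep(interval_s)
```

With a budget below the 2-minute wall, a long online scan yields `None` rather than a killed loop, and the follow-up is the same direct status check used in this session.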

## Configuration Changes

- IMC1 (remote): hid Windows Update KB5094122 (`IUpdate.IsHidden=true` via COM).
- IMC1 (remote): created/set registry value
  `HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU\NoAutoUpdate = 1` (DWORD); restarted
  `wuauserv`.
- `clients/instrumental-music-center/PROJECT_STATE.md`: bumped Last updated to 2026-06-28; added
  "UPDATE HOLD" known-issue entry; added 2026-06-28 Recent Changes row; recorded the new agent UUID.

## Credentials & Secrets

- None discovered, created, or rotated this session. GuruRMM API creds read via vault
  (`infrastructure/gururmm-server.sops.yaml`) through the `/rmm` skill bootstrap.

## Infrastructure & Servers

- **IMC1**: 192.168.0.2, Windows Server 2016 Standard, build **14393.8422**, Dell R720, DC for
  imc.local + AIMsi SQL host (`IMC1\SQLEXPRESS`, prod) + RDS.
- **GuruRMM agent (IMC1):** `88cbf7c0-abfa-4f12-846c-96274f718bff` (was `fa99e913-1027-4e33-a928-7695e31068e7`).
- **GuruRMM API:** http://172.16.3.30:3001
- Component store: `COMPONENTS` hive 259.5 MB (was 168 MB in April); free RAM ~21 GB at check time;
  no pending.xml; broken ETW publisher GUID `{9c2a37f3-e5fd-5cae-bcd1-43dafeee1ff0}` regkey absent.
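
Because agent UUIDs change on re-enroll, the lookup is better keyed on hostname than on a recorded UUID. The sketch below shows that matching step over the decoded JSON from `GET /api/agents`. The response shape is not documented in these notes, so the field names `hostname` and `id` are assumptions, and the second sample entry is made up.

```python
from typing import Iterable, Mapping, Optional

def find_agent_id(agents: Iterable[Mapping[str, str]], hostname: str) -> Optional[str]:
    """Return the id of the single agent whose hostname matches, case-insensitively.

    The keys "hostname" and "id" are assumed, not confirmed against the API.
    Returns None when nothing matches; raises if the match is ambiguous.
    """
    wanted = hostname.casefold()
    matches = [a["id"] for a in agents if a.get("hostname", "").casefold() == wanted]
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} agents report hostname {hostname!r}")
    return matches[0] if matches else None

# Illustrative payload; only the IMC1 UUID comes from this session.
sample = [
    {"hostname": "IMC1", "id": "88cbf7c0-abfa-4f12-846c-96274f718bff"},
    {"hostname": "OTHER-HOST", "id": "00000000-0000-0000-0000-000000000000"},
]
print(find_agent_id(sample, "imc1"))
```

Raising on duplicates matters here: a re-enrolled machine may leave its old record behind, and silently picking one of two matches is how a stale UUID gets reused.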

## Commands & Outputs

- Enumerate offered updates (COM): `IsInstalled=0 and IsHidden=0` -> 10 items; the only OS CU was
  `2026-06 Cumulative Update for Windows Server 2016 (KB5094122)`, `IsDownloaded=False`. Also a .NET
  4.8 CU KB5087065 (separate, lighter package, not the rollback cause).
- Hide + policy result: `HID: ...(KB5094122)`, `POLICY_SET NoAutoUpdate=1`, `WUAUSERV_RESTARTED`,
  verify `HIDDEN_NOW=1`, `NoAutoUpdate_NOW=1`.
- Diagnostics: `COMPONENTS_HIVE_MB=259.5`, `OS_BUILD=8422`, `FREE_RAM_MB=21,855`,
  `PUBLISHER_REGKEY_MISSING`, `PENDING_XML_EXISTS=False`.
- Dev-alert posted (`message_id=1520957745820602529`); Mike DM'd (`message_id=1520959250304598016`).
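
Since the KB number changes every month, the OS CU is easier to pick out of the offered list by title pattern than by number. The sketch below filters titles for the OS cumulative update and skips the .NET one. The first title is the one recorded above; the .NET title wording is hypothetical (only its KB number appears in these notes), and the pattern is an assumption about how such titles are phrased.

```python
import re
from typing import Iterable, List, Tuple

# Matches titles like "2026-06 Cumulative Update for Windows Server 2016 (KB5094122)".
# Requiring "Windows Server" directly after "for" excludes .NET Framework CUs.
OS_CU = re.compile(
    r"^\d{4}-\d{2} Cumulative Update for Windows Server \d{4}\b.*\((KB\d+)\)$"
)

def os_cumulative_updates(titles: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (KB, title) pairs for OS cumulative updates among the offered titles."""
    found = []
    for title in titles:
        cleaned = title.strip()
        match = OS_CU.match(cleaned)
        if match:
            found.append((match.group(1), cleaned))
    return found

offered = [
    "2026-06 Cumulative Update for Windows Server 2016 (KB5094122)",  # from the session
    # Hypothetical wording for the .NET CU; only KB5087065 is in the notes.
    "2026-06 Cumulative Update for .NET Framework 4.8 for Windows Server 2016 (KB5087065)",
]
print(os_cumulative_updates(offered))
```

The same filter could be rerun against next month's list to find the superseding KB to hide, should the per-KB hide need refreshing while the hold is in place.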

## Pending / Incomplete Tasks

- **Awaiting Mike's decision** on the component-store fix path (A/B/C) + a maintenance window, and
  confirmation of a Server 2019 ISO + license.
- Before any repair window: take + verify a full image and system-state backup of IMC1 (sole DC +
  live POS); confirm last-night AIM `.bak` + Cloudberry off-site are current; check F: free space;
  stage a build-matched (14393.8422) DISM repair source.
- Reverse the update hold (unhide KB5094122 + remove `NoAutoUpdate`) once the store is repaired or
  IMC1 is migrated to Server 2019.
- Stale UUID also appears in the IMC wiki article (`fa99e913-...`) and GuruRMM Enrollment table;
  refresh on next `/wiki-compile`.
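
For the eventual reversal of the hold, it may help to have the registry half written down in both directions. The sketch below renders the `NoAutoUpdate` policy value as `.reg` text, once to set it and once to delete it. This is an equivalent representation, not what was run: the session set the value through a remote command, and the helper name `reg_file` is made up for illustration.

```python
from typing import Optional

AU_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU"

def reg_file(key: str, name: str, dword: Optional[int]) -> str:
    """Render .reg text that sets a DWORD value, or deletes it when dword is None."""
    data = "-" if dword is None else f"dword:{dword:08x}"
    return (
        "Windows Registry Editor Version 5.00\r\n\r\n"
        f"[{key}]\r\n"
        f'"{name}"={data}\r\n'
    )

hold = reg_file(AU_KEY, "NoAutoUpdate", 1)         # the hold applied this session
release = reg_file(AU_KEY, "NoAutoUpdate", None)   # the reversal listed above
print(hold)
print(release)
```

Neither form covers the other half of the hold: KB5094122 still has to be unhidden through the WU COM API, and `wuauserv` was restarted after the policy change in this session.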

## Reference Information

- Prior repair attempts: `clients/instrumental-music-center/docs/2026-04-13-ticket-notes.md` (DISM
  RestoreHealth E_OUTOFMEMORY, RTM `/Source` CBS_E_SOURCE_MISSING, KB5075999 apply-on-boot rollback).
- Three migration paths table: same ticket-notes doc, "Open items / recommendations".
- IMC client wiki: `wiki/clients/instrumental-music-center.md`.
- Failure signature: `HRESULT_FROM_WIN32(15010) ERROR_EVT_INVALID_EVENT_DATA` at
  `onecore\admin\wmi\events\config\manproc.cpp:733` -> `CBS_E_INSTALLERS_FAILED` -> rollback.
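
The failure signature is recorded as `HRESULT_FROM_WIN32(15010)`, whereas logs often show the packed hexadecimal form. The sketch below mirrors the arithmetic of the Windows `HRESULT_FROM_WIN32` macro to get that value; it only does the bit packing and makes no claim about which symbolic name the code maps to.

```python
FACILITY_WIN32 = 7

def hresult_from_win32(code: int) -> int:
    """Mirror HRESULT_FROM_WIN32: wrap a Win32 error code as a failure HRESULT."""
    if code <= 0:
        # Zero or an existing (negative) HRESULT passes through unchanged,
        # shown here as an unsigned 32-bit value.
        return code & 0xFFFFFFFF
    return (code & 0xFFFF) | (FACILITY_WIN32 << 16) | 0x80000000

print(hex(hresult_from_win32(15010)))  # 0x80073aa2
```

If the CBS log on IMC1 prints the code in hex, `0x80073aa2` is the string to search for when confirming that a future failure is the same signature.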
@@ -17,6 +17,8 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·

<!-- Append entries below this line -->

2026-06-29 | Howard-Home | save/rmm-scratch | [friction] wrote RMM command-id scratch files (.netprobe_id, .stage_id, etc.) to repo root C:/claudetools; .netprobe_id got swept into a sync commit by git add -A and needed git rm. Use the session scratchpad dir for transient IDs, not the repo root. [ctx: ref=feedback_tmp_path_windows]

2026-06-28 | Howard-Home | rmm/spec-015-safeboot | [friction] safe-mode survival test stranded DESKTOP-MS42HNC: (a) registering only GuruRMMAgent/Watchdog in SafeBootNetwork is insufficient for the agent to connect in Safe Mode (needs network-stack deps e.g. BFE/Dnscache/CryptSvc); (b) Task-Scheduler dead-man does NOT fire in Safe Mode so auto-revert failed -> required manual console recovery [ctx: host=DESKTOP-MS42HNC spec=SPEC-015 fix=use-a-service-not-schtasks-for-revert test-only-on-disposable-VM]

2026-06-28 | Howard-Home | rmm/powershell-discovery | [friction] broad '*.log' Get-ChildItem on C:\Windows\Temp pulled a 157KB Office C2R telemetry log into command output, wasting tokens; scope log searches to the specific filename (mccleanup.log) or a tight -Filter, not *.log