claudetools/clients/instrumental-music-center/session-logs/2026-05-06-howard-imc1-aim-instance-correction.md
Howard Enos 4da4e5bac5 sync: auto-sync from HOWARD-HOME at 2026-05-06 13:50:24
Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-05-06 13:50:24

IMC — AIM Station 1 recurrence + wrong-instance correction + AIMSQL orphan confirmed

Date: 2026-05-06

User

  • User: Howard Enos (howard)
  • Machine: Howard-Home
  • Role: tech

Summary

Station 1 at IMC hit the same Telerik.OpenAccess.RT.sql.SQLException: Connection has been closed AIM error today around 12:14 PM MST, a little under 10 hours after last night's scheduled MSSQL$AIMSQL restart fired cleanly at 02:30. A recurrence that fast forced a fuller enumeration of all three SQL instances on IMC1, which reversed yesterday's diagnosis. The "leftover" MSSQL$SQLEXPRESS is actually the live production AIM database (SQL Server 2019 Standard Edition installed under the default SQLEXPRESS instance name and never renamed). MSSQL$AIMSQL is the actual orphan, hosting only 2023-era conversion-test DBs with zero active client connections. Today's restart could not fix the error because it targeted the wrong instance.

Documented the correction in yesterday's session log (correction block at top + reversal in the Note for Mike), updated PROJECT_STATE.md, unregistered the now-pointless scheduled task, and saved a feedback memory so this trap doesn't bite us again. No production service touches.

The user-facing Telerik error is still likely to recur — nothing today actually prevents it. Next reversible lever is capping max server memory on each instance to stop the buffer-pool trim cycle that's reaping idle pool slots; awaiting Howard's go-ahead.

What was done

1. AIM Station 1 recurrence — initial diagnostic

Read-only log pull on IMC1 via GuruRMM (local SSH from Howard-Home blocked by the documented 192.168.0.0/24 collision with home Wi-Fi). Output preserved at clients/instrumental-music-center/scripts/out/imc1_stdout.txt.

| Signal | Yesterday (2026-05-05) | Today (10h after 02:30 restart) |
|---|---|---|
| Scheduled AIMSQL_Restart_20260506_0230 | (created, pending) | LastRunTime 02:30:30, LastTaskResult 0 (fired clean) |
| AIMSQL PID / start time | 34536 / 2026-04-25 22:01:37 | 12772 / 2026-05-06 02:30:02 (post-restart) |
| MSSQL$MICROSOFT##WID Event 17890 paging events | 8 in 4h | 65 in 10h (~3x rate, all on WID, none on AIMSQL/SQLEXPRESS) |
| AIMSQL Total Server Memory | 587 MB (Target 7,224 MB) | 371 MB (Target 5,778 MB); actually lower, pool actively trimmed |
| AIMSQL Page Life Expectancy | 842,990s (~9.7 d) | 16,811s (~4.7 hr); collapsed |
| AIMSQL page_fault_count | 5,689,041 over 11 days (~516k/day) | 605,504 over 10h 3m (~1.45M/day, ~3x baseline) |
| Active RDP user count | 4 | 6 (added station3 and one more) |
| Free physical RAM | n/a | 6.99 GB / 32 GB (~21%) |
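The "~3x" figures in the comparison above can be re-derived from the raw counts. A minimal sketch, using only numbers copied from the table; the variable and function names are made up for illustration and are not part of any script on IMC1.

```python
# Re-derive the rates quoted in the signal comparison.
# All inputs are copied from the table; names are illustrative only.

def per_hour(count: float, hours: float) -> float:
    """Average events per hour over an observation window."""
    return count / hours

# Event 17890 paging events on WID
rate_yesterday = per_hour(8, 4.0)      # 8 events in 4 h
rate_today = per_hour(65, 10.0)        # 65 events in 10 h
event_ratio = rate_today / rate_yesterday

# AIMSQL page_fault_count, normalised to faults per day
faults_per_day_before = 5_689_041 / 11           # 11-day uptime before the restart
hours_since_restart = 10 + 3 / 60                # 10 h 3 m since 02:30
faults_per_day_after = 605_504 / hours_since_restart * 24
fault_ratio = faults_per_day_after / faults_per_day_before

# Page Life Expectancy in friendlier units
ple_before_days = 842_990 / 86_400
ple_after_hours = 16_811 / 3_600

print(f"17890 rate: {rate_yesterday:.1f}/h -> {rate_today:.1f}/h ({event_ratio:.2f}x)")
print(f"page faults/day: {faults_per_day_before:,.0f} -> {faults_per_day_after:,.0f} ({fault_ratio:.2f}x)")
print(f"PLE: {ple_before_days:.2f} d -> {ple_after_hours:.2f} h")
```

Both ratios land close to 3x, which matches the table's rounding.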

The restart itself fired cleanly. AIMSQL ERRORLOG has been silent except for the 02:30 startup chatter. Yet the recurrence happened on Station 1. Something didn't add up — which led to the next step.

2. SQLEXPRESS enumeration — the bombshell

Re-ran the same read-only enumeration pattern targeting MSSQL$SQLEXPRESS (yesterday's "leftover instance" question for Mike). Output: imc1_sqlexpress_enum.txt.

SERVERPROPERTY('Edition') returned Standard Edition (64-bit). It's not Express — somebody installed Standard with the default SQLEXPRESS instance name and never renamed it. The instance NAME is misleading.

| Fact | Value |
|---|---|
| Instance | IMC1\SQLEXPRESS, TCP 61151 |
| Edition | SQL Server 2019 Standard Edition (64-bit) |
| Version | 15.0.2165.1 (KB5084817, same March 2026 GDR as AIMSQL) |
| Service account | IMC\AIM (domain account) |
| Process | PID 20756, started 2026-04-25 21:47:53, working set 6.86 GB (no max server memory cap) |
| Production DB | IMCAIM (created 2023-08-21), the live AIM database |
| Other DBs | AIM (2021-03-18), IMC (2023-08-21), IMCAIM_Training (2024-03-01); all backed up daily but no live sessions |
| ERRORLOG | E:\SQL\MSSQL14.SQLEXPRESS\MSSQL\Log\ERRORLOG |
| Backup chain | Cloudberry → C:\ProgramData\Online Backup\MSSQL\IMC1_SQLEXPRESS\ + local E:\SQL\MSSQL14.SQLEXPRESS\MSSQL\Backup\ (daily, both succeed) |

Active connections to SQLEXPRESS at time of check:

| Workstation | IP | DB |
|---|---|---|
| IMC-MINI | 192.168.0.72 | IMCAIM |
| IMC-SVCSTR | 192.168.0.55 | IMCAIM |
| IMC-LESSONS | 192.168.0.62 | IMCAIM |
| IMC-STATION2 | 192.168.0.66 | IMCAIM |
| IMC-L1-STATION9 | 192.168.0.41 | IMCAIM |
| DESKTOP-44L80C0 | 192.168.0.46 | IMCAIM |
| DESKTOP-MR3ALTK | 192.168.0.59 | IMCAIM |
| REPAIRADMIN | 192.168.0.48 | IMCAIM |
| C2B | 192.168.0.4 | IMCAIM |
| (server-local AIM Webservice / Runtime) | IMC1 | IMCAIM x22 sessions |

All sessions log in as AIMUser1 via .Net SqlClient Data Provider (the AIM front-end). Every register, repair workstation, lessons workstation, and the C2B credit-card module talks to this instance. Nothing in the active list matches IMC-STATION1 exactly. Station 1 is likely either one of the DESKTOP-* machines (not yet renamed to the IMC- naming convention) or it was disconnected after the error and hadn't reconnected at the time of the check. Open question for Leslie at the time of the check (since resolved; see Plan item 2).

3. AIMSQL counter-enumeration — orphan confirmed

Same enumeration targeted at MSSQL$AIMSQL to verify nothing real depends on it. Output: imc1_aimsql_enum.txt.

| Fact | Value |
|---|---|
| Instance | IMC1\AIMSQL, TCP 63116 (dynamic) |
| Edition | SQL Server 2019 Express Edition GDR 15.0.2165.1 |
| Service account | IMC\IMC1$ (machine account) |
| Process | PID 12772 (post-restart), working set 172 MB, very lightly loaded |
| Established TCP connections | 0 (only Listen state on IPv4 + IPv6) |
| Active user sessions | 0 real; only NT SERVICE\SQLTELEMETRY$AIMSQL heartbeat + our own NT AUTHORITY\SYSTEM query |
| Databases | AIM (2023-06-09), TestConv61223 (2023-06-12), IMC (2023-07-03); all 2023-era conversion test artifacts |
| ERRORLOG | Silent except 02:30:02 startup chatter from today's restart |
| Backups | None; no .bak files in any AIMSQL Backup directory |
| max server memory | uncapped (default 2,147,483,647 MB), but Express enforces ~1.4 GB buffer-pool ceiling regardless |

Verdict: confirmed orphan. Zero live clients, zero session activity, no active backup chain landing on it, only legacy DBs from a 2023 conversion that didn't go to production (the live IMCAIM was created 2023-08-21 on SQLEXPRESS and the AIMSQL AIM/IMC DBs from June/July 2023 appear to be the precursors).

Caveat for any future shutdown: the user .mdf files weren't surfaced by the filesystem walk under MSSQL15.AIMSQL\MSSQL\DATA or S:\*AIMSQL*. Locate and back up AIM.mdf, IMC.mdf, TestConv61223.mdf (and their .ldf siblings) before any uninstall. Mike to decide whether the 2023-era data is worth keeping.

4. Scheduled task removal (only authorized change today)

Unregistered AIMSQL_Restart_20260506_0230 on IMC1. Pre-removal LastRunTime 05/06 02:30:30, LastTaskResult 0. Confirmed gone. Audit-trail artifacts left intentionally on disk:

  • C:\Windows\Temp\aimsql-restart.ps1 (984 bytes, modified 2026-05-05 18:53:27)
  • C:\Windows\Temp\aimsql-restart.log (1,150 bytes, modified 2026-05-06 02:30:19)

GuruRMM command_id for audit: 1889a150-b775-4fb2-9f4b-cd794d4e7d9f, exit 0.

5. Documentation amendments

  • session-logs/2026-05-05-howard-aim-connection-broken-investigation.md — added a ## Correction (2026-05-06) block immediately after Summary with the full reversal narrative. Inside the existing ## Note for Mike, added a header callout pointing to the correction block, then struck through item #1 (the "shut down SQLEXPRESS" recommendation) and added inline reversal text. Historical content preserved verbatim — additions only, no rewrites.
  • PROJECT_STATE.md — bumped Last updated to 2026-05-06; rewrote the Current State paragraph to name IMC1\SQLEXPRESS as production with the misleading-name caveat; expanded the Infrastructure table from a single SQL row to three rows (production, orphan, system) with role labels; added a Known Issue entry for the AIM connection-broken recurrence pattern + the misleading-name trap; added today's DIAGNOSED row and a SUPERSEDED flag on yesterday's row in Recent Changes.

6. Feedback memory saved

Saved .claude/memory/feedback_sql_instance_role_by_connection.md and indexed it in MEMORY.md. Rule: verify SQL instance roles by sys.dm_exec_sessions + Get-NetTCPConnection -OwningProcess, not by instance name. The IMC1 near-miss is recorded as the originating incident.
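To make the rule concrete, here is a small sketch of the decision it encodes: judge an instance by who is connected, never by what it is called. It assumes session rows have already been pulled (for example login_name and host_name from sys.dm_exec_sessions). The Session type, the prefix list, and classify_instance are hypothetical helpers, not part of the saved memory file, and the host name on the two AIMSQL rows is assumed.

```python
from dataclasses import dataclass

# Logins that do not represent a real client workload. Seeded from the two
# non-client logins observed on AIMSQL in this log; extend as needed (assumption).
SYSTEM_LOGIN_PREFIXES = ("NT SERVICE\\", "NT AUTHORITY\\")


@dataclass(frozen=True)
class Session:
    login_name: str
    host_name: str


def client_sessions(sessions):
    """Keep only sessions that look like real application or user logins."""
    return [
        s for s in sessions
        if not s.login_name.upper().startswith(SYSTEM_LOGIN_PREFIXES)
    ]


def classify_instance(sessions):
    """Classify by live connections only; the instance name is never consulted."""
    return "has-clients" if client_sessions(sessions) else "no-clients"


# Sample rows taken from the enumerations in this log.
sqlexpress = [
    Session("AIMUser1", "IMC-MINI"),
    Session("AIMUser1", "IMC-STATION2"),
    Session("AIMUser1", "C2B"),
]
aimsql = [
    Session("NT SERVICE\\SQLTELEMETRY$AIMSQL", "IMC1"),  # telemetry heartbeat
    Session("NT AUTHORITY\\SYSTEM", "IMC1"),              # our own enumeration query
]

print("SQLEXPRESS:", classify_instance(sqlexpress))
print("AIMSQL:", classify_instance(aimsql))
```

A "no-clients" result is only a snapshot. It should be backed by the TCP connection check and repeated during business hours before anything is stopped.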

Why the error keeps happening

Same failure mode both days, on the right instance now. SQLEXPRESS sustains memory pressure on a 32 GB box that's also a DC + RDS host with 6 RDP user sessions + AIMsi Webservice + AIMsi Runtime + 3 SQL instances + QuickBooks Enterprise installed locally. Windows trims SQL working sets when global pressure crosses a threshold (visible as the 17890 events on WID — the canary). The trim cycle reaps idle TCP pool slots from SQLEXPRESS too. Telerik OpenAccess discovers the dead handle on the next reuse, throws connection broken and recovery is not possible, and the user sees the raw stack trace because Telerik has no transient-fault retry.

Why the restart scheduled yesterday helped briefly: restarting AIMSQL momentarily freed ~600 MB, which may have eased global pressure for a few hours. Pure side effect; the actual production instance was untouched. The 17890 rate roughly tripling overnight (8/4h → 65/10h, i.e. 2.0/h → 6.5/h) shows pressure rebuilt fast.
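The failure mode described above (an idle pooled handle dies, the next reuse finds it, and nothing retries) can be illustrated with a toy model. This is only a sketch of the mechanism and does not reflect how Telerik OpenAccess or AIM is actually implemented; ToyPool, ConnectionClosed, and both run_* helpers are invented for the illustration.

```python
class ConnectionClosed(Exception):
    """Stand-in for the 'Connection has been closed' error the user sees."""


class ToyPool:
    """Pool of handles; True means alive, False means reaped while idle."""

    def __init__(self, size: int):
        self._handles = [True] * size

    def reap_idle(self) -> None:
        """Simulate every idle pooled connection being dropped underneath the client."""
        self._handles = [False] * len(self._handles)

    def execute(self, sql: str) -> str:
        alive = self._handles.pop()       # reuse a pooled handle
        self._handles.append(True)        # the slot is refilled with a fresh handle either way
        if not alive:
            raise ConnectionClosed("Connection has been closed")
        return f"ok: {sql}"


def run_without_retry(pool: ToyPool, sql: str) -> str:
    """What the log describes: the first dead handle surfaces straight to the user."""
    return pool.execute(sql)


def run_with_retry(pool: ToyPool, sql: str, attempts: int = 2) -> str:
    """Transient-fault retry: a second attempt picks up the fresh handle."""
    for attempt in range(attempts):
        try:
            return pool.execute(sql)
        except ConnectionClosed:
            if attempt == attempts - 1:
                raise
    raise AssertionError("unreachable")


pool = ToyPool(size=2)
pool.reap_idle()
try:
    run_without_retry(pool, "SELECT 1")
    outcome_without_retry = "no error"
except ConnectionClosed:
    outcome_without_retry = "error shown to user"

pool = ToyPool(size=2)
pool.reap_idle()
outcome_with_retry = run_with_retry(pool, "SELECT 1")

print(outcome_without_retry)
print(outcome_with_retry)
```

Since AIM is a vendor application, the retry side is not something we can add. The sketch only shows why keeping pool slots from being reaped in the first place is the lever within reach.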

Plan

Tonight / next session (Howard's call)

  1. Cap max server memory on each instance — reversible 1-second config change, no service touch:
    • SQLEXPRESS: 12,288 MB (12 GB) — leaves headroom on a 32 GB box for OS + DC + 6 RDP users + AIMsi services + QuickBooks
    • WID: 512 MB
    • AIMSQL: 256 MB (or just leave it; we want it stopped eventually anyway)
    The cap stops the trim cycle even when global pressure rises, because SQL voluntarily stays under the ceiling instead of getting forcibly trimmed.
  2. Confirm Station 1's hostname / IP with Leslie. None of the active SQLEXPRESS sessions matched IMC-STATION1 exactly, so it was likely either DESKTOP-44L80C0 / DESKTOP-MR3ALTK (un-renamed boxes) or a station that was disconnected at check time.
     Resolved 2026-05-06 by Howard: Station 1 = IMC-STATION1, IP 192.168.0.50. It wasn't in the active connection list at enumeration time, consistent with it being disconnected after the user-facing error and not yet reconnected at our snapshot. For the next recurrence, target diagnostics by client_net_address = '192.168.0.50' in sys.dm_exec_connections (joined to sys.dm_exec_sessions on session_id) and by remote IP in SQLEXPRESS connection events.
  3. Locate AIMSQL .mdf files before any consolidation talk. They weren't where I expected. Worth a 5-minute filesystem search.
  4. Schedule SQLEXPRESS-targeted restart cadence only if the memory cap doesn't hold — but it's after-hours-only because every register would briefly disconnect.
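A quick sanity check on the caps proposed in item 1, plus the T-SQL each one would translate to. This is a sketch only and nothing here has been run against IMC1. The sp_configure option name 'max server memory (MB)' is the standard one; treating 32 GB as exactly 32,768 MB and the cap_statement helper are assumptions for illustration, and whether WID accepts the setting the same way has not been verified here.

```python
HOST_RAM_MB = 32 * 1024  # 32 GB box, treated as exactly 32,768 MB (assumption)

# Proposed caps from the plan above, in MB.
PROPOSED_CAPS_MB = {
    "SQLEXPRESS": 12_288,
    "MICROSOFT##WID": 512,
    "AIMSQL": 256,
}

total_capped_mb = sum(PROPOSED_CAPS_MB.values())
headroom_mb = HOST_RAM_MB - total_capped_mb  # left for OS + DC + RDP users + AIMsi + QuickBooks


def cap_statement(cap_mb: int) -> str:
    """Build the T-SQL that sets max server memory on whichever instance it is run against."""
    return (
        "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;\n"
        f"EXEC sp_configure 'max server memory (MB)', {cap_mb}; RECONFIGURE;"
    )


print(f"SQL total if all caps are reached: {total_capped_mb:,} MB")
print(f"Headroom for everything else:      {headroom_mb:,} MB ({headroom_mb / 1024:.2f} GB)")
for instance, cap in PROPOSED_CAPS_MB.items():
    print(f"-- {instance}")
    print(cap_statement(cap))
```

Rolling back is the same statement with the default value of 2147483647, which is what makes the change reversible without a service restart.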

Long term (Mike conversation)

  • Stop + uninstall MSSQL$AIMSQL once the 2023-era DBs are backed up and confirmed safe to retire.
  • Decide WID instance: WSUS isn't actively serving clients per yesterday's check. Probably also stoppable.
  • Server 2019 migration / dedicated DB host — same conversation as before, pushed by today's evidence.

Note for Mike

READ THIS BEFORE ACTING ON YESTERDAY'S NOTE FOR MIKE. Yesterday's note had a critical wrong-instance error in item #1 — see the correction block at the top of 2026-05-05-howard-aim-connection-broken-investigation.md for the strikethrough and reversal.

Bottom line: Production AIM lives on IMC1\SQLEXPRESS, not IMC1\AIMSQL. Yesterday's labeling was reversed because the SQLEXPRESS instance is actually SQL Server 2019 Standard Edition installed under the default Express instance name. The "leftover" we proposed shutting down to free headroom is the live POS database. Stopping it would have killed every register, repair workstation, lessons workstation, and the C2B credit module instantly.

What's actually true now:

  • IMC1\SQLEXPRESS (TCP 61151) — PRODUCTION. Standard Edition. DB IMCAIM. Service account IMC\AIM. ~9 store workstations connected. Do not stop, do not uninstall, do not let anyone misled by the name shut it down.
  • IMC1\AIMSQL (TCP 63116) — the actual orphan. True SQL 2019 Express. Zero clients, only 2023-era conversion-test DBs. This is the consolidation candidate. Today's scheduled restart task targeting AIMSQL was unregistered (it had no effect on the user-facing problem).
  • IMC1\MICROSOFT##WID — WSUS / AD RMS. Pages out under host pressure (canary for the AIM error). Possibly also stoppable if WSUS isn't serving clients.

Decisions on your plate:

  1. Approve max server memory caps as the next reversible fix (SQLEXPRESS 12 GB, WID 512 MB, AIMSQL 256 MB). Howard hasn't applied them yet — awaiting your green light.
  2. Approve AIMSQL consolidation once the 2023-era AIM / IMC / TestConv61223 DBs are backed up and confirmed safe to retire. Howard to locate the .mdf files first (they weren't where I expected).
  3. Approve WID consolidation if WSUS/AD RMS isn't really being used at IMC.
  4. Server 2016 EOL is approaching (extended support ends 2027-01-12). Today's incident is more evidence to push the migration timeline. Worth scoping at next ACG strategy call.
  5. Server 2019 migration / dedicated DB host conversation — unchanged from yesterday, more urgent now.

The SvcRestartTask that ran at 11:00 today (yesterday's mystery) turns out to be a daily 11:00 task that returns 0. It's not the AIM trigger, so that thread can be put to bed.

Until we apply the memory caps, the AIM error is likely to keep recurring at roughly the same cadence we've seen. Howard is on the lookout for the next occurrence and can do another targeted log pull if it's helpful for your decision.

References

  • Today's GuruRMM commands:
    • SQLEXPRESS enumeration: command run via agent fa99e913-1027-4e33-a928-7695e31068e7
    • AIMSQL enumeration + scheduled task removal: 1889a150-b775-4fb2-9f4b-cd794d4e7d9f
  • Raw outputs:
    • clients/instrumental-music-center/scripts/out/imc1_stdout.txt (initial diag)
    • clients/instrumental-music-center/scripts/out/imc1_sqlexpress_enum.txt (SQLEXPRESS enum)
    • clients/instrumental-music-center/scripts/out/imc1_aimsql_enum.txt (AIMSQL counter-enum)
  • Yesterday's session log + correction block: clients/instrumental-music-center/session-logs/2026-05-05-howard-aim-connection-broken-investigation.md
  • Updated project state: clients/instrumental-music-center/PROJECT_STATE.md
  • Feedback memory: .claude/memory/feedback_sql_instance_role_by_connection.md
  • IMC1 vault (existing): clients/imc/imc1.sops.yaml
  • GuruRMM agent ID for IMC1: fa99e913-1027-4e33-a928-7695e31068e7