IMC — AIM Station 1 recurrence + wrong-instance correction + AIMSQL orphan confirmed
Date: 2026-05-06
User
- User: Howard Enos (howard)
- Machine: Howard-Home
- Role: tech
Summary
Station 1 at IMC hit the same Telerik.OpenAccess.RT.sql.SQLException: Connection has been closed AIM error today around 12:14 PM MST, roughly 10 hours after last night's scheduled MSSQL$AIMSQL restart fired cleanly at 02:30. That fast a recurrence forced a fuller enumeration of all three SQL instances on IMC1, which reversed yesterday's diagnosis. The "leftover" MSSQL$SQLEXPRESS is actually the live production AIM database (SQL Server 2019 Standard Edition installed under the default SQLEXPRESS instance name and never renamed). MSSQL$AIMSQL is the actual orphan, hosting only 2023-era conversion-test DBs with zero active client connections. The 02:30 restart did not address the error because it targeted the wrong instance.
Documented the correction in yesterday's session log (correction block at top + reversal in the Note for Mike), updated PROJECT_STATE.md, unregistered the now-pointless scheduled task, and saved a feedback memory so this trap doesn't bite us again. No production service touches.
The user-facing Telerik error is still likely to recur — nothing today actually prevents it. Next reversible lever is capping max server memory on each instance to stop the buffer-pool trim cycle that's reaping idle pool slots; awaiting Howard's go-ahead.
What was done
1. AIM Station 1 recurrence — initial diagnostic
Read-only log pull on IMC1 via GuruRMM (local SSH from Howard-Home blocked by the documented 192.168.0.0/24 collision with home Wi-Fi). Output preserved at clients/instrumental-music-center/scripts/out/imc1_stdout.txt.
| Signal | Yesterday (2026-05-05) | Today (10h after 02:30 restart) |
|---|---|---|
| Scheduled AIMSQL_Restart_20260506_0230 | (created, pending) | LastRunTime 02:30:30, LastTaskResult 0 (fired clean) |
| AIMSQL PID / start time | 34536 / 2026-04-25 22:01:37 | 12772 / 2026-05-06 02:30:02 (post-restart) |
| MSSQL$MICROSOFT##WID Event 17890 paging events | 8 in 4h | 65 in 10h (~3x rate, all on WID, none on AIMSQL/SQLEXPRESS) |
| AIMSQL Total Server Memory | 587 MB (Target 7,224 MB) | 371 MB (Target 5,778 MB) — actually lower, pool actively trimmed |
| AIMSQL Page Life Expectancy | 842,990s (~9.7 d) | 16,811s (~4.7 hr) — collapsed |
| AIMSQL page_fault_count | 5,689,041 over 11 days (~516k/day) | 605,504 over 10h 3m (~1.45M/day, ~3x baseline) |
| Active RDP user count | 4 | 6 (added station3 and one more) |
| Free physical RAM | n/a | 6.99 GB / 32 GB (~21%) |
The restart itself fired cleanly. AIMSQL ERRORLOG has been silent except for the 02:30 startup chatter. Yet the recurrence happened on Station 1. Something didn't add up — which led to the next step.
2. SQLEXPRESS enumeration — the bombshell
Re-ran the same read-only enumeration pattern targeting MSSQL$SQLEXPRESS (yesterday's "leftover instance" question for Mike). Output: imc1_sqlexpress_enum.txt.
SERVERPROPERTY('Edition') returned Standard Edition (64-bit). It's not Express — somebody installed Standard with the default SQLEXPRESS instance name and never renamed it. The instance NAME is misleading.
| Fact | Value |
|---|---|
| Instance | IMC1\SQLEXPRESS, TCP 61151 |
| Edition | SQL Server 2019 Standard Edition (64-bit) |
| Version | 15.0.2165.1 (KB5084817 — same March 2026 GDR as AIMSQL) |
| Service account | IMC\AIM (domain account) |
| Process | PID 20756, started 2026-04-25 21:47:53, working set 6.86 GB (no max server memory cap) |
| Production DB | IMCAIM (created 2023-08-21) — the live AIM database |
| Other DBs | AIM (2021-03-18), IMC (2023-08-21), IMCAIM_Training (2024-03-01) — all backed up daily but no live sessions |
| ERRORLOG | E:\SQL\MSSQL14.SQLEXPRESS\MSSQL\Log\ERRORLOG |
| Backup chain | Cloudberry → C:\ProgramData\Online Backup\MSSQL\IMC1_SQLEXPRESS\ + local E:\SQL\MSSQL14.SQLEXPRESS\MSSQL\Backup\ (daily, both succeed) |
Active connections to SQLEXPRESS at time of check:
| Workstation | IP | DB |
|---|---|---|
| IMC-MINI | 192.168.0.72 | IMCAIM |
| IMC-SVCSTR | 192.168.0.55 | IMCAIM |
| IMC-LESSONS | 192.168.0.62 | IMCAIM |
| IMC-STATION2 | 192.168.0.66 | IMCAIM |
| IMC-L1-STATION9 | 192.168.0.41 | IMCAIM |
| DESKTOP-44L80C0 | 192.168.0.46 | IMCAIM |
| DESKTOP-MR3ALTK | 192.168.0.59 | IMCAIM |
| REPAIRADMIN | 192.168.0.48 | IMCAIM |
| C2B | 192.168.0.4 | IMCAIM |
| (server-local AIM Webservice / Runtime) | IMC1 | IMCAIM x22 sessions |
All sessions log in as AIMUser1 via .Net SqlClient Data Provider (the AIM front-end). Every register, repair workstation, lessons workstation, and the C2B credit-card module talks to this instance. Nothing in the active list matches IMC-STATION1 exactly: Station 1 is likely either one of the DESKTOP-* machines (not yet renamed to the IMC- naming convention), or it was disconnected after the error and hadn't reconnected at the time of the check. Open question for Leslie.
3. AIMSQL counter-enumeration — orphan confirmed
Same enumeration targeted at MSSQL$AIMSQL to verify nothing real depends on it. Output: imc1_aimsql_enum.txt.
| Fact | Value |
|---|---|
| Instance | IMC1\AIMSQL, TCP 63116 (dynamic) |
| Edition | SQL Server 2019 Express Edition GDR 15.0.2165.1 |
| Service account | IMC\IMC1$ (machine account) |
| Process | PID 12772 (post-restart), working set 172 MB, very lightly loaded |
| Established TCP connections | 0 (only Listen state on IPv4 + IPv6) |
| Active user sessions | 0 real — only NT SERVICE\SQLTELEMETRY$AIMSQL heartbeat + our own NT AUTHORITY\SYSTEM query |
| Databases | AIM (2023-06-09), TestConv61223 (2023-06-12), IMC (2023-07-03) — all 2023-era conversion test artifacts |
| ERRORLOG | Silent except 02:30:02 startup chatter from today's restart |
| Backups | None — no .bak files in any AIMSQL Backup directory |
| max server memory | uncapped (default 2,147,483,647 MB), but Express enforces ~1.4 GB buffer-pool ceiling regardless |
Verdict: confirmed orphan. Zero live clients, zero session activity, no active backup chain landing on it, only legacy DBs from a 2023 conversion that didn't go to production (the live IMCAIM was created 2023-08-21 on SQLEXPRESS and the AIMSQL AIM/IMC DBs from June/July 2023 appear to be the precursors).
Caveat for any future shutdown: the user .mdf files weren't surfaced by the filesystem walk under MSSQL15.AIMSQL\MSSQL\DATA or S:\*AIMSQL*. Locate and back up AIM.mdf, IMC.mdf, TestConv61223.mdf (and their .ldf siblings) before any uninstall. Mike to decide whether the 2023-era data is worth keeping.
4. Scheduled task removal (only authorized change today)
Unregistered AIMSQL_Restart_20260506_0230 on IMC1. Pre-removal LastRunTime 05/06 02:30:30, LastTaskResult 0. Confirmed gone. Audit-trail artifacts left intentionally on disk:
- C:\Windows\Temp\aimsql-restart.ps1 (984 bytes, modified 2026-05-05 18:53:27)
- C:\Windows\Temp\aimsql-restart.log (1,150 bytes, modified 2026-05-06 02:30:19)
GuruRMM command_id for audit: 1889a150-b775-4fb2-9f4b-cd794d4e7d9f, exit 0.
5. Documentation amendments
- session-logs/2026-05-05-howard-aim-connection-broken-investigation.md: added a ## Correction (2026-05-06) block immediately after Summary with the full reversal narrative. Inside the existing ## Note for Mike, added a header callout pointing to the correction block, then struck through item #1 (the "shut down SQLEXPRESS" recommendation) and added inline reversal text. Historical content preserved verbatim: additions only, no rewrites.
- PROJECT_STATE.md: bumped Last updated to 2026-05-06; rewrote the Current State paragraph to name IMC1\SQLEXPRESS as production with the misleading-name caveat; expanded the Infrastructure table from a single SQL row to three rows (production, orphan, system) with role labels; added a Known Issue entry for the AIM connection-broken recurrence pattern + the misleading-name trap; added today's DIAGNOSED row and a SUPERSEDED flag on yesterday's row in Recent Changes.
6. Feedback memory saved
Saved .claude/memory/feedback_sql_instance_role_by_connection.md and indexed it in MEMORY.md. Rule: verify SQL instance roles by sys.dm_exec_sessions + Get-NetTCPConnection -OwningProcess, not by instance name. The IMC1 near-miss is recorded as the originating incident.
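The rule can be sketched as a small decision function. This is only an illustration of the reasoning, not the contents of the feedback memory: the `InstanceEvidence` record, the `classify_role` helper, and the three role labels are hypothetical names, and the counts are lower bounds read off today's connection tables (9 remote workstations plus 22 server-local sessions on SQLEXPRESS; none on AIMSQL once telemetry and our own query are excluded).

```python
from dataclasses import dataclass


@dataclass
class InstanceEvidence:
    """Hypothetical summary of one read-only enumeration pass for a SQL instance."""
    name: str
    established_tcp: int      # remote Established connections owned by the sqlservr PID
    user_sessions: int        # sessions excluding telemetry and our own query
    has_recent_backups: bool  # any backup chain currently landing for this instance


def classify_role(ev: InstanceEvidence) -> str:
    """Label an instance from observed activity; the instance name is never consulted."""
    if ev.established_tcp > 0 or ev.user_sessions > 0:
        return "live"
    if ev.has_recent_backups:
        # No clients right now, but something still backs it up: ask before touching.
        return "idle-but-backed-up"
    return "orphan-candidate"


# Lower-bound figures taken from today's enumeration tables.
sqlexpress = InstanceEvidence("IMC1\\SQLEXPRESS", established_tcp=9,
                              user_sessions=31, has_recent_backups=True)
aimsql = InstanceEvidence("IMC1\\AIMSQL", established_tcp=0,
                          user_sessions=0, has_recent_backups=False)

for ev in (sqlexpress, aimsql):
    print(f"{ev.name}: {classify_role(ev)}")
```

The point of the design is that `name` is carried only for display. Nothing in the decision path reads it, which is exactly the check that was missing yesterday.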
Why the error keeps happening
Same failure mode both days, on the right instance now. SQLEXPRESS sustains memory pressure on a 32 GB box that's also a DC + RDS host with 6 RDP user sessions + AIMsi Webservice + AIMsi Runtime + 3 SQL instances + QuickBooks Enterprise installed locally. Windows trims SQL working sets when global pressure crosses a threshold (visible as the 17890 events on WID, the canary). The trim cycle reaps idle TCP pool slots from SQLEXPRESS too. Telerik OpenAccess discovers the dead handle on the next reuse, throws "connection broken and recovery is not possible", and the user sees the raw stack trace because Telerik has no transient-fault retry.
Why the overnight restart helped briefly: restarting AIMSQL momentarily freed ~600 MB, which may have eased global pressure for a few hours. Pure side effect; the actual production instance was untouched. The 17890 rate roughly tripling overnight (8 in 4h → 65 in 10h, i.e. 2.0/h → 6.5/h) shows pressure rebuilt fast.
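For anyone re-checking the pressure figures, the arithmetic behind the diagnostic table can be reproduced as below. This is a minimal sketch: the helper names `per_hour` and `per_day` are made up for illustration, and the inputs are copied straight from the table in section 1.

```python
def per_hour(count: float, hours: float) -> float:
    """Average events per hour over an observation window."""
    return count / hours


def per_day(count: float, hours: float) -> float:
    """Average events per 24 hours over an observation window."""
    return count / hours * 24


# Event 17890 paging events on WID
rate_yesterday = per_hour(8, 4)     # 2.0 events/hour
rate_today = per_hour(65, 10)       # 6.5 events/hour
event_ratio = rate_today / rate_yesterday

# AIMSQL page_fault_count: 11 days before the restart vs 10h 3m after it
faults_baseline = per_day(5_689_041, 11 * 24)
faults_today = per_day(605_504, 10 + 3 / 60)
fault_ratio = faults_today / faults_baseline

# Page Life Expectancy, seconds converted to friendlier units
ple_before_days = 842_990 / 86_400
ple_after_hours = 16_811 / 3_600

print(f"17890 events: {rate_yesterday:.1f}/h -> {rate_today:.1f}/h ({event_ratio:.2f}x)")
print(f"page faults:  {faults_baseline:,.0f}/day -> {faults_today:,.0f}/day ({fault_ratio:.2f}x)")
print(f"PLE:          {ple_before_days:.1f} d -> {ple_after_hours:.1f} h")
```

Both ratios come out near 3x, which matches the "~3x" annotations in the table.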
Plan
Tonight / next session (Howard's call)
- Cap max server memory on each instance (reversible 1-second config change, no service touch):
  - SQLEXPRESS: 12,288 MB (12 GB), which leaves headroom on a 32 GB box for OS + DC + 6 RDP users + AIMsi services + QuickBooks
  - WID: 512 MB
  - AIMSQL: 256 MB (or just leave it; we want it stopped eventually anyway)

  The cap stops the trim cycle even when global pressure rises, because SQL voluntarily stays under the ceiling instead of getting forcibly trimmed.
- ~~Confirm Station 1's hostname / IP with Leslie. None of the active SQLEXPRESS sessions matched IMC-STATION1 exactly. Likely either DESKTOP-44L80C0 / DESKTOP-MR3ALTK (un-renamed boxes) or a station that was disconnected at check time.~~ Resolved 2026-05-06 by Howard: Station 1 = IMC-STATION1, IP 192.168.0.50. It wasn't in the active connection list at enumeration time, consistent with it being disconnected after the user-facing error and not yet reconnected at our snapshot. For next recurrence, target diagnostics by client_net_address = '192.168.0.50' in sys.dm_exec_sessions and by remote IP in SQLEXPRESS connection events.
- Locate AIMSQL .mdf files before any consolidation talk. They weren't where I expected. Worth a 5-minute filesystem search.
- Schedule SQLEXPRESS-targeted restart cadence only if the memory cap doesn't hold, but it's after-hours-only because every register would briefly disconnect.
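As a sanity check on the proposed caps, the budget works out as follows. This is a rough sketch under one stated assumption: it treats each cap as the instance's whole footprint, which is optimistic because max server memory does not bound every allocation a SQL Server process makes. The variable names are illustrative only.

```python
TOTAL_RAM_MB = 32 * 1024  # 32 GB host

# Caps proposed in the plan above, in MB.
proposed_caps_mb = {
    "SQLEXPRESS": 12_288,
    "WID": 512,
    "AIMSQL": 256,
}

capped_total_mb = sum(proposed_caps_mb.values())
headroom_mb = TOTAL_RAM_MB - capped_total_mb

print(f"SQL ceilings combined: {capped_total_mb:,} MB")
print(f"Left for OS + DC + RDP sessions + AIMsi + QuickBooks: "
      f"{headroom_mb:,} MB ({headroom_mb / 1024:.2f} GB)")
```

Under that assumption the three ceilings total 13,056 MB, leaving 19,712 MB (19.25 GB) for everything else on the box. The caps themselves would normally be set per instance through sp_configure 'max server memory (MB)' followed by RECONFIGURE, which is what makes the change reversible without a service restart.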
Long term (Mike conversation)
- Stop + uninstall MSSQL$AIMSQL once the 2023-era DBs are backed up and confirmed safe to retire.
- Decide WID instance: WSUS isn't actively serving clients per yesterday's check. Probably also stoppable.
- Server 2019 migration / dedicated DB host: same conversation as before, pushed by today's evidence.
Note for Mike
READ THIS BEFORE ACTING ON YESTERDAY'S NOTE FOR MIKE. Yesterday's note had a critical wrong-instance error in item #1. See the correction block at the top of 2026-05-05-howard-aim-connection-broken-investigation.md for the strikethrough and reversal.
Bottom line: Production AIM lives on IMC1\SQLEXPRESS, not IMC1\AIMSQL. Yesterday's labeling was reversed because the SQLEXPRESS instance is actually SQL Server 2019 Standard Edition installed under the default Express instance name. The "leftover" we proposed shutting down to free headroom is the live POS database. Stopping it would have killed every register, repair workstation, lessons workstation, and the C2B credit module instantly.
What's actually true now:
- IMC1\SQLEXPRESS (TCP 61151): PRODUCTION. Standard Edition. DB IMCAIM. Service account IMC\AIM. ~9 store workstations connected. Do not stop, do not uninstall, do not let anyone misled by the name shut it down.
- IMC1\AIMSQL (TCP 63116): the actual orphan. True SQL 2019 Express. Zero clients, only 2023-era conversion-test DBs. This is the consolidation candidate. Today's scheduled restart task targeting AIMSQL was unregistered (it had no effect on the user-facing problem).
- IMC1\MICROSOFT##WID: WSUS / AD RMS. Pages out under host pressure (canary for the AIM error). Possibly also stoppable if WSUS isn't serving clients.
Decisions on your plate:
- Approve max server memory caps as the next reversible fix (SQLEXPRESS 12 GB, WID 512 MB, AIMSQL 256 MB). Howard hasn't applied them yet and is awaiting your green light.
- Approve AIMSQL consolidation once the 2023-era AIM / IMC / TestConv61223 DBs are backed up and confirmed safe to retire. Howard to locate the .mdf files first (they weren't where I expected).
- Approve WID consolidation if WSUS/AD RMS isn't really being used at IMC.
- Server 2016 EOL is approaching (extended support ends 2027-01-12). Today's incident is more evidence to push the migration timeline. Worth scoping at next ACG strategy call.
- Server 2019 migration / dedicated DB host conversation: unchanged from yesterday, more urgent now.
The SvcRestartTask that ran at 11:00 today (yesterday's mystery) is a routine daily 11:00 task that returns 0. It is not the AIM trigger, so that thread is closed.
Until we apply the memory caps, the AIM error is likely to keep recurring at roughly the same cadence we've seen. Howard is on the lookout for the next occurrence and can do another targeted log pull if it's helpful for your decision.
References
- Today's GuruRMM commands:
  - SQLEXPRESS enumeration: command run via agent fa99e913-1027-4e33-a928-7695e31068e7
  - AIMSQL enumeration + scheduled task removal: 1889a150-b775-4fb2-9f4b-cd794d4e7d9f
- Raw outputs:
  - clients/instrumental-music-center/scripts/out/imc1_stdout.txt (initial diag)
  - clients/instrumental-music-center/scripts/out/imc1_sqlexpress_enum.txt (SQLEXPRESS enum)
  - clients/instrumental-music-center/scripts/out/imc1_aimsql_enum.txt (AIMSQL counter-enum)
- Yesterday's session log + correction block: clients/instrumental-music-center/session-logs/2026-05-05-howard-aim-connection-broken-investigation.md
- Updated project state: clients/instrumental-music-center/PROJECT_STATE.md
- Feedback memory: .claude/memory/feedback_sql_instance_role_by_connection.md
- IMC1 vault (existing): clients/imc/imc1.sops.yaml
- GuruRMM agent ID for IMC1: fa99e913-1027-4e33-a928-7695e31068e7