The DSCA33/DSCA45 main spec files lost in the cryptolocker wipe are recoverable: the original software published correct certs to the Hoffman product API before the wipe, and our null-skipping renderer never overwrote them. Mine per-model Final-Test templates (names + specs + verbatim accuracy headers) straight from those originals instead of requesting spec files from Dataforth/John.

- dsca33-45-templates.json: 56 models (DSCA33 34/35, DSCA45 22/23); only DSCA33-1948 + DSCA45-1746 (24 units) lack an original.
- mine-hoffman-dsca.py: the re-runnable miner.
- DSCA33-45-HOFFMAN-RECOVERY handoff for the AD2 session (incl. the gate: validate each render vs its Hoffman original before enabling live rendering).
- memories: Hoffman recovery (supersedes the spec-gap "need John" note) and the AD2 SSH MTU-blackhole root cause/fix; errorlog entries (syncro jq, ssh correction).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
41 lines
2.9 KiB
Markdown
---
name: ad2-ssh-mtu-blackhole
description: AD2 SSH "lockouts"/mid-session timeouts over the Dataforth OpenVPN were an MTU/PMTU blackhole, not a ban/account-lockout/flaky tunnel; fix = pin the tunnel adapter MTU to 1400
metadata:
  type: project
---

AD2 (Dataforth, `192.168.0.6`) SSH from the fleet over OpenVPN (client subnet `192.168.6.x`)
intermittently looked "locked out": sessions **authenticated fine**, then died mid-session with
`Read error from remote host 192.168.6.2 ... Unknown error [postauth]` and
`ssh_dispatch_run_fatal: Connection from authenticating user sysadmin ... Connection timed out [preauth]`.
Small/interactive commands often worked; bulk reads + `scp` stalled.

**Root cause (diagnosed 2026-06-18 via RMM; SSH itself was the failing channel, so don't diagnose it over SSH):**
- NOT account lockout: Windows lockout threshold is 5 attempts/30 min but **zero 4740 events**; `sysadmin` never locked.
- NOT an IP ban: **no IPBan/wail2ban/RdpGuard**, **0 inbound firewall block rules**.
- NOT auth: **every** `Accepted publickey for sysadmin` succeeded.
- NOT load: AD2 was CPU ~11%, 11.7 GB RAM free.
- It was a **PMTU blackhole.** OpenVPN tunnel path MTU is **~1424** (DF ping: wire 1424 passes,
  1428 drops). But GURU-5070's OpenVPN adapter (`Local Area Connection`, ifIndex 12, IP
  `192.168.6.2`) was set to **MTU 1500** → TCP negotiated MSS 1460 → full-size bulk/scp segments
  exceeded the tunnel and were **silently dropped (DF set)**, while sub-MTU interactive packets
  passed. That is why it presented as random "lockouts" that got worse with bulk transfer.

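The arithmetic behind the PMTU-blackhole bullet can be checked directly. The sketch below assumes plain IPv4 and TCP headers with no options (20 bytes each) and an 8-byte ICMP echo header for the ping figures. The function names are illustrative only and are not part of any tool used in the diagnosis.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options
ICMP_HEADER = 8   # bytes, ICMP echo request header

PATH_MTU = 1424   # largest on-wire size the DF ping got through the tunnel


def mss_for_mtu(mtu: int) -> int:
    """TCP MSS a host derives from a given interface MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def full_segment_fits(interface_mtu: int, path_mtu: int = PATH_MTU) -> bool:
    """True if a full-size TCP segment from this interface fits the path."""
    wire_size = mss_for_mtu(interface_mtu) + IPV4_HEADER + TCP_HEADER
    return wire_size <= path_mtu


def ping_payload_for_wire_size(wire_size: int) -> int:
    """ICMP payload length that yields the given on-wire IPv4 packet size."""
    return wire_size - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1400):
        print(mtu, mss_for_mtu(mtu), full_segment_fits(mtu))
```

With the numbers from this note, MTU 1500 gives MSS 1460 and a 1500-byte segment that exceeds the ~1424 ceiling, while MTU 1400 gives MSS 1360 and fits with room to spare.
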
**Fix applied (2026-06-18):** `Set-NetIPInterface -InterfaceIndex 12 -AddressFamily IPv4 -NlMtuBytes 1400`
run via **GURU-5070's own RMM agent** (`819df0c8...`, runs as `nt authority\system` = elevated; this is the
elevated lever on the local box when you can't self-elevate from the Claude shell). Validated: a
**1.41 MB single-session SSH transfer to AD2 completed in 9s, no read error** (previously blackholed).
The `ad2` block in `~/.ssh/config` was annotated and its keepalives tightened (`ServerAliveInterval 15`,
`ServerAliveCountMax 4`, `ConnectTimeout 20`).

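The 1400 pin sits below the ~1424 ceiling found by DF ping. If that ceiling ever needs re-measuring, a binary search over don't-fragment pings is one way to do it. This is a minimal sketch, not the procedure that was actually run: `windows_df_ping` and `largest_passing_size` are hypothetical helpers, the ping flags are the standard Windows `ping` ones (`-f` sets DF, `-l` sets payload size), and treating a `TTL=` line as success is an assumption about the ping output.

```python
import subprocess
from typing import Callable


def windows_df_ping(host: str, wire_size: int) -> bool:
    """Send one don't-fragment ping of the given on-wire size (Windows ping).

    Assumption: a reply line containing "TTL=" means the echo came back;
    a timeout or a "needs to be fragmented" message means it did not.
    """
    payload = wire_size - 28  # 20-byte IPv4 header + 8-byte ICMP header
    result = subprocess.run(
        ["ping", "-n", "1", "-f", "-l", str(payload), host],
        capture_output=True,
        text=True,
    )
    return "TTL=" in result.stdout


def largest_passing_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest size in [low, high] for which probe() is True.

    Assumes probe(low) is True and that passing is monotonic in size.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

A call such as `largest_passing_size(lambda size: windows_df_ping("192.168.0.6", size), 1200, 1500)` would probe AD2 through the tunnel. A single lost ping can skew the result, so repeating the run before trusting the number is sensible.
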
**Durability / permanent fix:** `Set-NetIPInterface` is registry-persistent, but **OpenVPN Connect may
reset the adapter MTU to 1500 on reconnect**; re-apply if SSH bulk transfers start stalling again
(check `Get-NetIPInterface -InterfaceIndex 12`). The real permanent fix is **server-side on the
Dataforth OpenVPN server: `mssfix 1360` (or `push "tun-mtu 1400"`)** so every fleet client clamps
automatically. `192.168.6.4` showed the identical symptom, so this is fleet-wide, not 5070-only.

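Until the server-side clamp exists, the adapter MTU can be re-checked after each reconnect. The sketch below is one possible drift check, under stated assumptions: it must run on the Windows client itself, it shells out to PowerShell, and it reads the `NlMtu` property of the `Get-NetIPInterface` result. `needs_reapply` and `read_adapter_mtu` are hypothetical names.

```python
import subprocess

TARGET_MTU = 1400
INTERFACE_INDEX = 12  # ifIndex of the OpenVPN adapter on GURU-5070, per this note


def needs_reapply(current_mtu: int, target_mtu: int = TARGET_MTU) -> bool:
    """True if the adapter MTU has drifted above the pinned value."""
    return current_mtu > target_mtu


def read_adapter_mtu(interface_index: int = INTERFACE_INDEX) -> int:
    """Read the adapter's IPv4 MTU via PowerShell (Windows only)."""
    command = (
        f"(Get-NetIPInterface -InterfaceIndex {interface_index} "
        "-AddressFamily IPv4).NlMtu"
    )
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

If `needs_reapply(read_adapter_mtu())` returns `True`, the `Set-NetIPInterface` command from the fix needs to be run again from an elevated context. The interface index may differ on other fleet clients.
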
Corrects the earlier wrong attribution ("flaky VPN tunnel" / "my rapid scp+ssh bursts triggering a
ban"): the tunnel is up and stable for small packets; only full-size segments that exceeded the path
MTU were dropped. See
[[prefer-ssh-over-rmm]] (RMM-as-fallback guidance still holds; the *reason* was MTU, not a flaky VPN).