diff --git a/.claude/memory/MEMORY.md b/.claude/memory/MEMORY.md index 8ef339d2..6e1e73ca 100644 --- a/.claude/memory/MEMORY.md +++ b/.claude/memory/MEMORY.md @@ -34,6 +34,7 @@ - [reference_sqlx_migrations_immutable](reference_sqlx_migrations_immutable.md) -- NEVER edit an already-applied sqlx migration file — even a comment. sqlx::migrate! checksums each file at compile time and validates against _sqlx_migrations at startup; a changed checksum crash-loops the server with "migration N was previously applied but has been modified". Code review MUST flag any edit to an applied migration. - [AD2 SSH MTU blackhole](ad2-ssh-mtu-blackhole.md) — AD2 SSH "lockouts"/mid-session read-errors over the Dataforth OpenVPN were a PMTU blackhole (tunnel PMTU ~1424 vs adapter MTU 1500), NOT a ban/account-lockout/flaky tunnel. Fix: pin the OpenVPN adapter MTU to 1400 (done on GURU-5070 via its SYSTEM RMM agent); permanent = `mssfix 1360` on the OpenVPN server. Diagnose over RMM, not SSH. - [DSCA33/45 resolved via Hoffman](project_dsca33_45_resolved_via_hoffman.md) — The "lost" DSCA33/45 spec files are recoverable from the Hoffman API (original certs survived the wipe); do NOT ask John. 56/58 models mined into projects/dataforth-dos/dsca33-45-templates.json; only DSCA33-1948 + DSCA45-1746 (24 units) lack an original. AD2 handoff: DSCA33-45-HOFFMAN-RECOVERY-2026-06-18.md. +- [AD2 comms via sync only](ad2-comms-via-sync-only.md) — The AD2 Dataforth-box Claude session is coord-API-isolated (Gitea only); coord msg/lock/todo never reach it. Coordinate with AD2 ONLY via git /sync (committed docs + ## Note blocks). ## Users - [Howard Enos](user_howard.md) — Mike's brother, technician, full access. Machines: ACG-TECH03L, Howard-Home (authoritative in users.json). 
diff --git a/.claude/memory/ad2-comms-via-sync-only.md b/.claude/memory/ad2-comms-via-sync-only.md new file mode 100644 index 00000000..23b7e1a1 --- /dev/null +++ b/.claude/memory/ad2-comms-via-sync-only.md @@ -0,0 +1,21 @@ +--- +name: ad2-comms-via-sync-only +description: The AD2 (Dataforth) Claude session is coord-API-isolated — reach it ONLY via git /sync (committed notes/docs), never coord messages/locks +metadata: + type: feedback +--- + +The AD2 Dataforth-box Claude session is **network-isolated from the ACG coord API** (172.16.3.30 — +the Dataforth network can't reach it); it only has Gitea/git access. So coord-API **messages, locks, +and todos NEVER reach AD2**. ALL inter-session coordination with AD2 must go through git **`/sync`**: +committed handoff docs and `## Note for ` blocks in synced session logs, which AD2 reads when +it pulls. A coord lock on an AD2-only file (e.g. `datasheet-exact.js`) is also meaningless — only the +AD2 session edits that box. + +**Why:** burned a round of `coord msg send AD2` + lock that were silent no-ops (Mike: "You can't +coord with AD2 — all comms needs to be via sync"). + +**How to apply:** to hand work to or coordinate with the AD2 session, write it into a committed doc +(e.g. `projects/dataforth-dos/*HANDOFF*.md`) and/or a `## Note for ` block in a session log, +then `/sync`. Do NOT use the coord skill for AD2. (Coord API is still fine for non-isolated ACG +machines.) 
[[prefer-ssh-over-rmm]] diff --git a/clients/darrell-delphen/session-logs/2026-06/2026-06-18-mike-email-link-sni-block.md b/clients/darrell-delphen/session-logs/2026-06/2026-06-18-mike-email-link-sni-block.md new file mode 100644 index 00000000..9fdcd570 --- /dev/null +++ b/clients/darrell-delphen/session-logs/2026-06/2026-06-18-mike-email-link-sni-block.md @@ -0,0 +1,119 @@ +# 2026-06-18 — Darrell Delphen — Outlook email links failing (ISP SNI block) + +## User +- **User:** Mike Swanson (mike) +- **Machine:** GURU-5070 +- **Role:** admin + +## Session Summary + +Darrell Delphen reported that links in Outlook email would not open on one workstation +(DDDOffice072023), failing with "can't reach this page" / `ERR_CONNECTION_ABORTED` against a +`url.emailprotection.link` URL, while links opened from Gmail worked fine. The failing host is +Intermedia's Email Protection "Safe Link" rewriter — every link in Intermedia-protected mail is +rewritten to `https://url.emailprotection.link/...`, so all Outlook-delivered links routed through it. + +Diagnosis was done entirely over GuruRMM against agent `000ed57d-fd05-4001-871c-244f43155c16`. DNS +resolved correctly and TCP 443 connected, but the TLS handshake died with SChannel `0x80090326` +(SEC_E_ILLEGAL_MESSAGE — "message received was unexpected or badly formatted"). The endpoint's TLS +stack was clean: FIPS off, no SCHANNEL protocol/cipher overrides, no cipher-suite GPO, only Windows +Defender (no third-party AV/proxy/VPN/LSP/WFP callout). The same node `199.193.205.140` handshook +successfully with the real SNI from GURU-5070 on a different network, proving the origin was healthy +and the interference was on the endpoint's path. A blast-radius sweep showed only +`url.emailprotection.link` failed while `google.com`, `microsoft.com`, `outlook.office365.com`, +`cloudflare.com`, `badssl.com`, and even `login.serverdata.net` (same Intermedia infra) all succeeded; +MTU was fine. 
An SNI-varied handshake to the same IP isolated it conclusively: `example.com` and +`login.serverdata.net` SNIs succeeded 12/12 while `url.emailprotection.link` failed 12/12, across +interleaved source ports — deterministic, SNI-keyed, not flow-hash/LAG. Root cause: a network device +on the path performing SNI-based content inspection that corrupted the handshake for that one hostname. + +The gateway turned out to be an ISP-provided Extreme **EXOS** device the client had no login for. The +fix path was therefore ISP escalation. An escalation packet was drafted, and Cloudflare WARP was +installed on the workstation as an interim workaround — tunneling past the SNI block. With WARP +connected, egress moved to Cloudflare (104.28.152.216) and the real-SNI handshake succeeded (TLS 1.2, +HTTP 200). The ISP then disabled a "NetIQ" web/URL-filtering feature on the gateway, which cleared the +block at the source. After WARP was disconnected the native ISP path was verified working (5/5 +handshakes + HTTP 200, egress back to 167.89.210.225), so WARP was uninstalled and the machine +returned to normal. Work was documented and billed on Syncro ticket #32437 — private technical note, +customer-facing/emailed summary, and 1.0h remote labor ($150.00, invoice #1650728058). + +## Key Decisions + +- Diagnosed exclusively via repeated GuruRMM `SslStream`/`Test-NetConnection` probes rather than + asking the client to run tools — faster and reproducible. +- Used an SNI-varied handshake to the *same fixed IP* as the decisive test. It separated + destination/server problems from path interference and proved the block was keyed on the SNI string. +- Ran a 12x repeatability test per SNI to rule out a faulty LAG/ECMP member (intermittent, 5-tuple + keyed) vs deliberate content matching (deterministic). Result was 0/12 vs 12/12 — deterministic. +- Chose Cloudflare WARP as the interim workaround because the block is SNI-based; any tunnel that + encrypts egress past the EXOS hides the SNI. 
WARP is the lightest deploy. +- Installed/connected WARP in stages (install, then connect, then verify) so each step could be + confirmed before flipping the tunnel, given the agent bounces on network-stack changes. +- Emailed the customer a plain-language summary (do_not_email:false) and kept the technical detail in + a hidden note. + +## Problems Encountered + +- **Shell state not persisting between Bash calls** — `$TOKEN`/`$RMM` from `rmm-auth.sh` were gone on + the next call (first dispatch produced no output). Fixed by `eval "$(rmm-auth.sh)"` inside every Bash + invocation. +- **PowerShell single-quotes collided with bash single-quoted `SCRIPT='...'`** — embedded + `'C:\Program Files\...'` terminated the bash string (`FilesCloudflareCloudflare: command not found`). + Fixed by defining the script via a quoted heredoc `SCRIPT=$(cat <<'PS' ... PS)`. +- **WARP install/connect bounced the RMM agent** — commands returned `interrupted` ("Agent restarted + during execution") because installing/connecting WARP resets the network stack/WFP. The agent + auto-reconnected (through WARP after connect); verified state with follow-up commands. +- **First WARP uninstall found nothing** — the product registers as **"Cloudflare One Client"**, not + "Cloudflare WARP", so the DisplayName filter missed it. Found the GUID + `{9E49837E-2971-413F-9587-119FA819E572}` and removed via `msiexec /x`. + +## Configuration Changes + +- **Endpoint DDDOffice072023:** Cloudflare WARP (Cloudflare One Client 2026.4.1390.0) installed, + registered, connected, then later disconnected and **fully uninstalled**. Net change to the machine: none. +- **ISP gateway (not us):** ISP disabled the "NetIQ" web/URL-filtering feature on the Extreme EXOS device. +- No changes to the ClaudeTools repo code. + +## Credentials & Secrets + +None discovered, created, or rotated this session. 
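The quoting collision listed under Problems Encountered can be reproduced and avoided in a few lines. The sketch below shows the quoted-heredoc form. The PowerShell body and the `C:\Program Files\Example\tool.exe` path are hypothetical stand-ins, not the script used in this session.

```shell
# Quoting the delimiter ('PS') turns off all shell expansion inside the
# heredoc, so PowerShell single quotes, backslashes, and $variables are
# stored literally instead of terminating or altering the shell string.
# The PowerShell body and path below are hypothetical placeholders.
SCRIPT=$(cat <<'PS'
$exe = 'C:\Program Files\Example\tool.exe'
if (Test-Path $exe) { & $exe --version }
PS
)

# Show exactly what would be handed to the RMM command dispatcher.
printf '%s\n' "$SCRIPT"
```

Command substitution strips trailing newlines, so the variable holds only the two script lines. With an unquoted delimiter (`<<PS`) the shell would expand `$exe` to an empty string before PowerShell ever saw it, which is why the quotes around `'PS'` matter as much as the heredoc itself.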
+ +## Infrastructure & Servers + +- **Endpoint:** DDDOffice072023 — Windows, GuruRMM agent `000ed57d-fd05-4001-871c-244f43155c16` + (v0.6.66), RMM client "AZ Computer Guru" / site "Discovery test site". LAN gateway 192.168.1.1, + ISP egress 167.89.210.225, WARP egress (while active) 104.28.152.216. +- **ISP gateway:** Extreme Networks **EXOS** device, ISP-provided/managed (client has no login). Was + running a "NetIQ" URL-filtering feature doing SNI inspection. +- **Blocked destination:** `url.emailprotection.link` → CNAME `urlrs.gslb.serverdata.net` → A + 199.193.205.140 (Intermedia Email Protection link-rewriter). GSLB pool also advertises + 199.193.200.65 / 64.78.20.65 / 162.244.196.65, all TCP-dead from any network (not an ISP block). +- **GuruRMM API:** http://172.16.3.30:3001 (JWT via `rmm-auth.sh`). + +## Commands & Outputs + +- Decisive SNI test (same IP, varied SNI), via RMM PowerShell: + - SNI `url.emailprotection.link` → `0x80090326` SEC_E_ILLEGAL_MESSAGE (0/12 ok) + - SNI `example.com` / `login.serverdata.net` → TLS 1.2 AES256 (12/12 ok) +- Off-network control (GURU-5070) to 199.193.205.140 with real SNI → OK TLS 1.2. +- Post-ISP-fix native verify: real-SNI handshake 5/5 OK + `Invoke-WebRequest` HEAD → HTTP 200, + egress 167.89.210.225. +- WARP removal: `msiexec /x {9E49837E-2971-413F-9587-119FA819E572} /qn /norestart` → exit 0; + warp-svc/warp-cli/install-dir/uninstall-entry all gone. + +## Pending / Incomplete Tasks + +- None functional. Block is resolved at the ISP. If the issue recurs, suspect the EXOS "NetIQ" + feature being re-enabled — re-run the SNI-varied handshake test to confirm. +- Optional: if WARP is ever rolled to more machines as a workaround, harden it (force auto-connect, + lock client) and note egress moves to Cloudflare (breaks office-static-IP allowlisting). 
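If the block recurs, the SNI-varied handshake test can be re-run from any host that has OpenSSL. The sketch below is a swapped-in equivalent of the `SslStream` probes used in this session, not the original script. The function names `sni_probe` and `sni_sweep` and the `PROBE` override are hypothetical, and it assumes that `openssl s_client` exits non-zero when the handshake fails and that GNU `timeout` is available. Unlike the original run, it does not interleave SNIs across source ports.

```shell
# Attempt one TLS handshake to a fixed IP while presenting a chosen SNI.
# -connect pins the destination; -servername sets the SNI string.
# Assumption: s_client's exit status is non-zero on handshake failure.
sni_probe() {
  local ip=$1 sni=$2
  timeout 10 openssl s_client -connect "${ip}:443" -servername "$sni" \
    </dev/null >/dev/null 2>&1
}

# Repeat the probe N times (default 12) for one SNI and print "<sni> <ok>/<n>".
# PROBE may name a replacement probe command, e.g. a stub for a dry run.
sni_sweep() {
  local ip=$1 sni=$2 n=${3:-12} ok=0 i=0
  while [ "$i" -lt "$n" ]; do
    if "${PROBE:-sni_probe}" "$ip" "$sni"; then
      ok=$((ok + 1))
    fi
    i=$((i + 1))
  done
  printf '%s %d/%d\n' "$sni" "$ok" "$n"
}
```

A re-check would call `sni_sweep 199.193.205.140 url.emailprotection.link` and then `sni_sweep 199.193.205.140 example.com` from the affected network. Keeping the IP fixed is the point of the design: if only the SNI changes and the results split the way they did in this session (0/12 against 12/12), the failure is keyed on the hostname in the ClientHello rather than on the destination or on a flaky path member.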
+ +## Reference Information + +- **Syncro ticket:** #32437 (id 112766479) — https://computerguru.syncromsp.com/tickets/112766479 + - Private note id 419714810; public/emailed summary id 419714813 + - Line item id 42925426 (Labor - Remote Business 1.0h @ $150) + - Invoice #1650728058, total $150.00; status Invoiced +- **Customer:** Darrell Delphen (Syncro customer_id 35996725), no prepaid block. +- **SChannel error:** 0x80090326 = SEC_E_ILLEGAL_MESSAGE (handshake message malformed/unexpected) — + signature of in-path TLS/SNI tampering when paired with same-IP success off-network. diff --git a/errorlog.md b/errorlog.md index e903d5db..e59a9047 100644 --- a/errorlog.md +++ b/errorlog.md @@ -21,6 +21,8 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure · 2026-06-18 | Howard-Home | rmm | [friction] agent returns exit -1 'Failed to execute command' on a ~7KB multi-line powershell body sent as one command; split into <2KB section scripts and each ran fine [ctx: host=DESKTOP-TRCIEJA agent=0.6.66] +2026-06-18 | GURU-5070 | coord/ad2-comms | [correction] tried to coordinate with the AD2 session via coord API msg+lock; AD2 is network-isolated (Gitea only, no coord API) so those were no-ops. ALL inter-session comms with AD2 must go via git /sync (committed notes/docs). + 2026-06-18 | GURU-5070 | syncro | comment POST piped straight to jq failed with 'jq: parse error: Invalid numeric literal at line 1 col 10' and left it AMBIGUOUS whether the note posted (GET-verify showed it had NOT); per no-retry rule had to GET first, then re-post. Robust pattern that worked: jq -n payload to a file, POST with --data-binary @file, capture response to a file, then GET-verify by subject. Skill's curl|jq comment pattern should adopt this. 
[ctx: ticket=32441 skill=syncro pattern=curl-pipe-jq] 2026-06-18 | GURU-5070 | post-bot-alert | Discord POST failed (non-200/unreachable) [ctx: channel=#bot-alerts http=400 resp={"message": "The request body contains invalid JSON.", "code": 50109}]