sync: auto-sync from GURU-5070 at 2026-06-18 12:49:38

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-18 12:49:38
2026-06-18 12:49:57 -07:00
parent 9d2d233f1e
commit 6f4cadb16f
4 changed files with 143 additions and 0 deletions


@@ -34,6 +34,7 @@
- [reference_sqlx_migrations_immutable](reference_sqlx_migrations_immutable.md) -- NEVER edit an already-applied sqlx migration file — even a comment. sqlx::migrate! checksums each file at compile time and validates against _sqlx_migrations at startup; a changed checksum crash-loops the server with "migration N was previously applied but has been modified". Code review MUST flag any edit to an applied migration.
- [AD2 SSH MTU blackhole](ad2-ssh-mtu-blackhole.md) — AD2 SSH "lockouts"/mid-session read-errors over the Dataforth OpenVPN were a PMTU blackhole (tunnel PMTU ~1424 vs adapter MTU 1500), NOT a ban/account-lockout/flaky tunnel. Fix: pin the OpenVPN adapter MTU to 1400 (done on GURU-5070 via its SYSTEM RMM agent); permanent = `mssfix 1360` on the OpenVPN server. Diagnose over RMM, not SSH.
- [DSCA33/45 resolved via Hoffman](project_dsca33_45_resolved_via_hoffman.md) — The "lost" DSCA33/45 spec files are recoverable from the Hoffman API (original certs survived the wipe); do NOT ask John. 56/58 models mined into projects/dataforth-dos/dsca33-45-templates.json; only DSCA33-1948 + DSCA45-1746 (24 units) lack an original. AD2 handoff: DSCA33-45-HOFFMAN-RECOVERY-2026-06-18.md.
- [AD2 comms via sync only](ad2-comms-via-sync-only.md) — The AD2 Dataforth-box Claude session is coord-API-isolated (Gitea only); coord msg/lock/todo never reach it. Coordinate with AD2 ONLY via git /sync (committed docs + ## Note blocks).
## Users
- [Howard Enos](user_howard.md) — Mike's brother, technician, full access. Machines: ACG-TECH03L, Howard-Home (authoritative in users.json).


@@ -0,0 +1,21 @@
---
name: ad2-comms-via-sync-only
description: The AD2 (Dataforth) Claude session is coord-API-isolated — reach it ONLY via git /sync (committed notes/docs), never coord messages/locks
metadata:
type: feedback
---
The AD2 Dataforth-box Claude session is **network-isolated from the ACG coord API** (172.16.3.30 —
the Dataforth network can't reach it); it only has Gitea/git access. So coord-API **messages, locks,
and todos NEVER reach AD2**. ALL inter-session coordination with AD2 must go through git **`/sync`**:
committed handoff docs and `## Note for <user>` blocks in synced session logs, which AD2 reads when
it pulls. A coord lock on an AD2-only file (e.g. `datasheet-exact.js`) is also meaningless — only the
AD2 session edits that box.
**Why:** burned a round of `coord msg send AD2` + lock that were silent no-ops (Mike: "You can't
coord with AD2 — all comms needs to be via sync").
**How to apply:** to hand work to or coordinate with the AD2 session, write it into a committed doc
(e.g. `projects/dataforth-dos/*HANDOFF*.md`) and/or a `## Note for <user>` block in a session log,
then `/sync`. Do NOT use the coord skill for AD2. (Coord API is still fine for non-isolated ACG
machines.) [[prefer-ssh-over-rmm]]
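
The note-block half of the "How to apply" step can be sketched in shell. This is a minimal illustration only: `append_note` and its arguments are hypothetical names, not part of any existing tool, and the log path is whatever session log is about to be synced.

```shell
#!/bin/sh
# Hypothetical helper: append a "## Note for <user>" block to a session log
# that will be committed by the next /sync. Names here are illustrative.
append_note() {  # usage: append_note LOGFILE USER TEXT
  # Leading blank line keeps the heading separate from whatever precedes it.
  printf '\n## Note for %s\n%s\n' "$2" "$3" >> "$1"
}
```

After appending the note, run `/sync` as usual so the AD2 session sees it on its next pull.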


@@ -0,0 +1,119 @@
# 2026-06-18 — Darrell Delphen — Outlook email links failing (ISP SNI block)
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Session Summary
Darrell Delphen reported that links in Outlook email would not open on one workstation
(DDDOffice072023), failing with "can't reach this page" / `ERR_CONNECTION_ABORTED` against a
`url.emailprotection.link` URL, while links opened from Gmail worked fine. The failing host is
Intermedia's Email Protection "Safe Link" rewriter — every link in Intermedia-protected mail is
rewritten to `https://url.emailprotection.link/...`, so all Outlook-delivered links were routed through it.
Diagnosis was done entirely over GuruRMM against agent `000ed57d-fd05-4001-871c-244f43155c16`. DNS
resolved correctly and TCP 443 connected, but the TLS handshake died with SChannel `0x80090326`
(SEC_E_ILLEGAL_MESSAGE — "message received was unexpected or badly formatted"). The endpoint's TLS
stack was clean: FIPS off, no SCHANNEL protocol/cipher overrides, no cipher-suite GPO, only Windows
Defender (no third-party AV/proxy/VPN/LSP/WFP callout). The same node `199.193.205.140` completed the
handshake with the real SNI from GURU-5070 on a different network, proving the origin was healthy
and the interference was on the endpoint's path. A blast-radius sweep showed only
`url.emailprotection.link` failed while `google.com`, `microsoft.com`, `outlook.office365.com`,
`cloudflare.com`, `badssl.com`, and even `login.serverdata.net` (same Intermedia infra) all succeeded;
MTU was fine. An SNI-varied handshake to the same IP isolated it conclusively: `example.com` and
`login.serverdata.net` SNIs succeeded 12/12 while `url.emailprotection.link` failed 12/12, across
interleaved source ports — deterministic, SNI-keyed, not flow-hash/LAG. Root cause: a network device
on the path performing SNI-based content inspection that corrupted the handshake for that one hostname.
The gateway turned out to be an ISP-provided Extreme **EXOS** device the client had no login for. The
fix path was therefore ISP escalation. An escalation packet was drafted, and Cloudflare WARP was
installed on the workstation as an interim workaround — tunneling past the SNI block. With WARP
connected, egress moved to Cloudflare (104.28.152.216) and the real-SNI handshake succeeded (TLS 1.2,
HTTP 200). The ISP then disabled a "NetIQ" web/URL-filtering feature on the gateway, which cleared the
block at the source. After WARP was disconnected the native ISP path was verified working (5/5
handshakes + HTTP 200, egress back to 167.89.210.225), so WARP was uninstalled and the machine
returned to normal. Work was documented and billed on Syncro ticket #32437 — private technical note,
customer-facing/emailed summary, and 1.0h remote labor ($150.00, invoice #1650728058).
## Key Decisions
- Diagnosed exclusively via repeated GuruRMM `SslStream`/`Test-NetConnection` probes rather than
asking the client to run tools — faster and reproducible.
- Used an SNI-varied handshake to the *same fixed IP* as the decisive test. It separated
destination/server problems from path interference and proved the block was keyed on the SNI string.
- Ran a 12x repeatability test per SNI to distinguish a faulty LAG/ECMP member (intermittent, 5-tuple
keyed) from deliberate content matching (deterministic). Result was 0/12 vs 12/12: deterministic.
- Chose Cloudflare WARP as the interim workaround because the block is SNI-based; any tunnel that
encrypts egress past the EXOS hides the SNI, and WARP was the lightest option to deploy.
- Installed/connected WARP in stages (install, then connect, then verify) so each step could be
confirmed before flipping the tunnel, given the agent bounces on network-stack changes.
- Emailed the customer a plain-language summary (do_not_email:false) and kept the technical detail in
a hidden note.
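
The SNI-varied repeatability test described above can be sketched as a small shell loop. This is a minimal sketch, not the probe used in the session: the session ran `SslStream` probes through RMM PowerShell, whereas this swaps in `openssl s_client`. The function names `probe_sni` and `sni_sweep` and the `PROBE` override are illustrative. The SNIs are interleaved on each round so every name gets a fresh connection (and source port) in turn.

```shell
#!/bin/sh
# Sketch only: openssl s_client stands in for the SslStream probe from the session.

# Attempt one TLS handshake to a fixed IP while presenting the given SNI.
# Succeeds (exit 0) only if a server certificate came back.
# `timeout` is the GNU coreutils command.
probe_sni() {  # usage: probe_sni IP SNI
  timeout 10 openssl s_client -connect "$1:443" -servername "$2" \
    </dev/null 2>/dev/null | grep -q 'BEGIN CERTIFICATE'
}

# Run COUNT rounds, probing every SNI once per round, then print "SNI ok/COUNT".
# PROBE may name a replacement probe function (useful for a dry run).
# The body runs in a subshell so its variables do not leak into the caller.
sni_sweep() (  # usage: sni_sweep IP COUNT SNI...
  ip=$1; count=$2; shift 2
  log=$(mktemp) || exit 1
  round=0
  while [ "$round" -lt "$count" ]; do
    for sni in "$@"; do
      # Record one line per successful handshake.
      if "${PROBE:-probe_sni}" "$ip" "$sni"; then
        printf '%s\n' "$sni" >> "$log"
      fi
    done
    round=$((round + 1))
  done
  for sni in "$@"; do
    # grep -c prints 0 when nothing matched, which is the count we want.
    printf '%s %s/%s\n' "$sni" "$(grep -cxF -- "$sni" "$log")" "$count"
  done
  rm -f "$log"
)
```

With the values from this session, `sni_sweep 199.193.205.140 12 url.emailprotection.link example.com login.serverdata.net` run from the affected network would be expected to report the first name at 0/12 and the other two at 12/12 while the block is active. A faulty LAG/ECMP member would instead tend to show partial counts spread across all names.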
## Problems Encountered
- **Shell state not persisting between Bash calls** — `$TOKEN`/`$RMM` from `rmm-auth.sh` were gone on
the next call (first dispatch produced no output). Fixed by `eval "$(rmm-auth.sh)"` inside every Bash
invocation.
- **PowerShell single-quotes collided with bash single-quoted `SCRIPT='...'`** — embedded
`'C:\Program Files\...'` terminated the bash string (`FilesCloudflareCloudflare: command not found`).
Fixed by defining the script via a quoted heredoc `SCRIPT=$(cat <<'PS' ... PS)`.
- **WARP install/connect bounced the RMM agent** — commands returned `interrupted` ("Agent restarted
during execution") because installing/connecting WARP resets the network stack/WFP. The agent
auto-reconnected (through WARP after connect); verified state with follow-up commands.
- **First WARP uninstall found nothing** — the product registers as **"Cloudflare One Client"**, not
"Cloudflare WARP", so the DisplayName filter missed it. Found the GUID
`{9E49837E-2971-413F-9587-119FA819E572}` and removed via `msiexec /x`.
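
The first two fixes above (re-loading auth inside every invocation, and defining the PowerShell body with a quoted heredoc) can be sketched as follows. Assumptions: `rmm-auth.sh` is not shown in this log, so `fake_rmm_auth` is a hypothetical stand-in that prints shell assignments, which is what `eval "$(...)"` requires. The token, the URL, and the `C:\Program Files\Example` path are placeholders.

```shell
#!/bin/sh
# Hypothetical stand-in for rmm-auth.sh: prints assignments for eval to consume.
fake_rmm_auth() {
  printf 'TOKEN=%s\nRMM=%s\n' 'example-token' 'http://rmm.example'
}

# Each tool invocation starts a fresh shell, so load the variables at the top
# of every invocation rather than relying on an earlier call.
eval "$(fake_rmm_auth)"

# A quoted heredoc delimiter ('PS') disables all shell expansion, so the
# PowerShell single quotes, backslashes and $variables arrive untouched.
SCRIPT=$(cat <<'PS'
$dir = 'C:\Program Files\Example'
Test-Path $dir
PS
)

# Show exactly what would be sent as the script body.
printf '%s\n' "$SCRIPT"
```

By contrast, `SCRIPT='... 'C:\Program Files\...' ...'` ends the bash string at the first embedded single quote, which is how the `command not found` error in the second item arises.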
## Configuration Changes
- **Endpoint DDDOffice072023:** Cloudflare WARP (Cloudflare One Client 2026.4.1390.0) installed,
registered, connected, then later disconnected and **fully uninstalled**. Net change to the machine: none.
- **ISP gateway (not us):** ISP disabled the "NetIQ" web/URL-filtering feature on the Extreme EXOS device.
- No changes to the ClaudeTools repo code.
## Credentials & Secrets
None discovered, created, or rotated this session.
## Infrastructure & Servers
- **Endpoint:** DDDOffice072023 — Windows, GuruRMM agent `000ed57d-fd05-4001-871c-244f43155c16`
(v0.6.66), RMM client "AZ Computer Guru" / site "Discovery test site". LAN gateway 192.168.1.1,
ISP egress 167.89.210.225, WARP egress (while active) 104.28.152.216.
- **ISP gateway:** Extreme Networks **EXOS** device, ISP-provided/managed (client has no login). Was
running a "NetIQ" URL-filtering feature doing SNI inspection.
- **Blocked destination:** `url.emailprotection.link` → CNAME `urlrs.gslb.serverdata.net` → A
199.193.205.140 (Intermedia Email Protection link-rewriter). GSLB pool also advertises
199.193.200.65 / 64.78.20.65 / 162.244.196.65, all TCP-dead from any network (not an ISP block).
- **GuruRMM API:** http://172.16.3.30:3001 (JWT via `rmm-auth.sh`).
## Commands & Outputs
- Decisive SNI test (same IP, varied SNI), via RMM PowerShell:
  - SNI `url.emailprotection.link` → `0x80090326` SEC_E_ILLEGAL_MESSAGE (0/12 ok)
  - SNI `example.com` / `login.serverdata.net` → TLS 1.2 AES256 (12/12 ok)
- Off-network control (GURU-5070) to 199.193.205.140 with real SNI → OK TLS 1.2.
- Post-ISP-fix native verify: real-SNI handshake 5/5 OK + `Invoke-WebRequest` HEAD → HTTP 200,
egress 167.89.210.225.
- WARP removal: `msiexec /x {9E49837E-2971-413F-9587-119FA819E572} /qn /norestart` → exit 0;
warp-svc/warp-cli/install-dir/uninstall-entry all gone.
## Pending / Incomplete Tasks
- None functional. Block is resolved at the ISP. If the issue recurs, suspect the EXOS "NetIQ"
feature being re-enabled — re-run the SNI-varied handshake test to confirm.
- Optional: if WARP is ever rolled to more machines as a workaround, harden it (force auto-connect,
lock client) and note egress moves to Cloudflare (breaks office-static-IP allowlisting).
## Reference Information
- **Syncro ticket:** #32437 (id 112766479) — https://computerguru.syncromsp.com/tickets/112766479
  - Private note id 419714810; public/emailed summary id 419714813
  - Line item id 42925426 (Labor - Remote Business 1.0h @ $150)
  - Invoice #1650728058, total $150.00; status Invoiced
- **Customer:** Darrell Delphen (Syncro customer_id 35996725), no prepaid block.
- **SChannel error:** 0x80090326 = SEC_E_ILLEGAL_MESSAGE (handshake message malformed/unexpected) —
signature of in-path TLS/SNI tampering when paired with same-IP success off-network.


@@ -21,6 +21,8 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
2026-06-18 | Howard-Home | rmm | [friction] agent returns exit -1 'Failed to execute command' on a ~7KB multi-line powershell body sent as one command; split into <2KB section scripts and each ran fine [ctx: host=DESKTOP-TRCIEJA agent=0.6.66]
2026-06-18 | GURU-5070 | coord/ad2-comms | [correction] tried to coordinate with the AD2 session via coord API msg+lock; AD2 is network-isolated (Gitea only, no coord API) so those were no-ops. ALL inter-session comms with AD2 must go via git /sync (committed notes/docs).
2026-06-18 | GURU-5070 | syncro | comment POST piped straight to jq failed with 'jq: parse error: Invalid numeric literal at line 1 col 10' and left it AMBIGUOUS whether the note posted (GET-verify showed it had NOT); per no-retry rule had to GET first, then re-post. Robust pattern that worked: jq -n payload to a file, POST with --data-binary @file, capture response to a file, then GET-verify by subject. Skill's curl|jq comment pattern should adopt this. [ctx: ticket=32441 skill=syncro pattern=curl-pipe-jq]
2026-06-18 | GURU-5070 | post-bot-alert | Discord POST failed (non-200/unreachable) [ctx: channel=#bot-alerts http=400 resp={"message": "The request body contains invalid JSON.", "code": 50109}]
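
The file-based POST pattern named in the syncro entry above (jq -n payload to a file, `--data-binary @file`, response captured to a file) can be sketched as below. This is a sketch under assumptions: the function names, the JSON field names, and any URL passed in are hypothetical placeholders, not confirmed Syncro API details. The GET-verify step is left out because the response shape is not given in the log.

```shell
#!/bin/sh
# Sketch only: field names and function names are illustrative placeholders.

# Build the payload with jq so quoting and escaping are jq's job, not the shell's.
build_payload() {  # usage: build_payload SUBJECT BODY OUTFILE
  jq -n --arg subject "$1" --arg body "$2" \
    '{subject: $subject, body: $body}' > "$3"
}

# POST from a file and save the raw response to a file; prints the HTTP status.
# Nothing is piped into jq here, so a non-JSON reply cannot hide the outcome.
post_payload() {  # usage: post_payload URL PAYLOAD_FILE RESPONSE_FILE
  curl -sS -o "$3" -w '%{http_code}' \
    -H 'Content-Type: application/json' \
    --data-binary "@$2" "$1"
}

# True only if the saved response is non-empty, parseable JSON.
response_is_json() {  # usage: response_is_json RESPONSE_FILE
  [ -s "$1" ] && jq empty "$1" >/dev/null 2>&1
}
```

Keeping the status code and the raw body separate means a parse failure no longer leaves it ambiguous whether the request was accepted: the status is known before any JSON parsing is attempted, and the saved body can be inspected by hand.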