claudetools/clients/internal-infrastructure/session-logs/2026-04-13-session.md
Mike Swanson 9ab36352ae Session log: Tunnel expansion + WHM fix (ix. grey-cloud)
Audited all 25 proxied zone records and expanded tunnel ingress to cover
9 hostnames total (azcomputerguru + analytics + community + radio +
git + plexrequest + rmm + rmm-api + sync). All verified HTTP 200.

Reverted 3 hostnames to original A records after discovering they
require backend work, not tunnel changes:
- plex/rustdesk: NPM on Jupiter has no vhost for these (returned
  'tls: unrecognized name' when tunneled)
- secure: Jupiter can't route to its backend subnet 172.16.1.0/24

Reverted ix.azcomputerguru.com to DNS-only A record after user
reported :2087 WHM access broken. Cloudflare Tunnel is hostname-bound,
not port-bound, so non-standard admin ports can't pass through. Direct
NAT to 72.194.62.5 restored WHM/cPanel access.

Adds four new helper scripts under clients/internal-infrastructure/
scripts/cloudflared-tunnel-setup/ (audit_proxied, discover_backends,
expand_tunnel, revert_broken). All use SOPS vault / env var for creds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:59:49 -07:00

# Session Log — Internal Infrastructure — 2026-04-13
## Cloudflare Tunnel deployment for azcomputerguru.com + Cox BGP diagnosis
Earlier 2026-04-13 work (SCMVAS git push, merge conflict resolution) is in
`projects/dataforth-dos/session-logs/2026-04-12-session.md`. This log picks up
when user reported azcomputerguru.com was still showing 521 after the initial
Cloudflare recovery.
---
## Session Summary
User reported azcomputerguru.com returning **521 "Web server is down"** through Cloudflare, despite:
- CF SSL mode being "Full" (not Strict)
- Origin IX server (172.16.3.10) responding 200 OK internally
- Origin reachable from external ISPs (non-CF path)
### What was accomplished
1. **Diagnosed root cause:** Cox ISP has broken BGP routing from our netblock (72.194.62.0/29) to specific Cloudflare IP prefixes. TCP:443 from pfSense WAN succeeds to 104.16/17/26 ranges but **times out** to 162.158.0.0/16, 172.64.0.0/13, 173.245.48.0/20, 141.101.64.0/18. ICMP traceroute to affected prefixes shows ~173ms (cross-country peering) vs ~3.6ms for working prefixes — asymmetric/distant routing. Inbound CF→origin state count was 0 while direct-internet state count was 285, confirming only CF path was broken.
2. **Deployed Cloudflare Tunnel on Jupiter (Unraid)** as a permanent workaround. The tunnel reverses the connection direction (outbound from the container, over the working CF prefixes), which removes the dependency on Cox's broken inbound routing.
3. **Cut over 4 proxied hostnames** to the tunnel via CF DNS API:
   - azcomputerguru.com, analytics., community., radio.
   - All 4 now return **HTTP 200 OK** through CF edge → tunnel → IX HTTPS vhost (SNI-matched)
4. **Drafted Cox BGP escalation ticket** with evidence (TCP matrix, traceroute comparison, state-table counts). Saved to `vendor-tickets/`.
5. **Folder reorganization:**
   - Moved Cox ticket from `projects/dataforth-dos/datasheet-pipeline/implementation/` (wrong project; not a Dataforth file) → `clients/internal-infrastructure/vendor-tickets/2026-04-13-cox-bgp-cloudflare-routing.md`
   - Merged misnamed `clients/ix-server/` into `clients/internal-infrastructure/` (IX is internal infra, not a client). Session logs moved; folder removed; 4 stale path references updated across 2 files.
### Key decisions & rationale
- **Option C: tunnel on Jupiter Docker** rather than pfSense (cloudflared isn't a pfSense package, firmware upgrades would wipe it) or IX (scoped to IX only; other internal origins would need separate tunnels). Jupiter already runs Unraid with many containers; cloudflared fits the existing pattern. One tunnel can route to any internal LAN IP.
- **HTTPS backend (not HTTP)** with `originServerName: <hostname>` + `noTLSVerify: true`. Initial HTTP backend caused WordPress "force HTTPS" redirect loop on community/radio (they had HSTS/canonical-URL rules IX's other sites lacked).
- **`--user 65532` (container default) with `chown 65532:65532` on host volume** — earlier `--user root` attempt wrote cert to `/root/.cloudflared` (outside bind mount) instead of `/home/nonroot/.cloudflared`.
- **Detached container for `tunnel login`** — earlier foreground attempts got killed when SSH exec_command hit its 9-minute timeout; detached container (`cf-login`) persists independent of SSH.
- **Didn't grey-cloud DNS** (the quick-but-ugly fix); the tunnel is a permanent architectural solution that survives future Cox BGP flaps.
### Problems encountered and resolutions
| Problem | Resolution |
|---|---|
| Cloudflare token (Full DNS) lacks Zone Settings + Analytics permissions; couldn't read SSL/TLS mode or per-PoP origin-status | Used pfSense-side diagnostics (TCP probes + traceroute + state table) instead; conclusive without needing Analytics |
| `mkdir: no space left on device` on `/mnt/user/appdata/cloudflared` despite cache showing 181GB free | shfs (Unraid FUSE overlay) was being overly strict near 81% cache usage; bypassed by writing directly to `/mnt/cache/appdata/cloudflared` (raw cache pool, same physical SSD, skips shfs) |
| `cert.pem: permission denied` writing to bind-mount volume | Container runs as UID 65532 (`nonroot`), host dir was owned by `nobody:users` (99:100). Chowned host dir to 65532:65532 before retry |
| `--user root` workaround wrote cert to `/root/.cloudflared`, outside the mount | Dropped `--user` override after fixing host UID ownership |
| Foreground `docker run --rm` for login got killed by SSH exec timeout after 9 min | Used `docker run -d --name cf-login` (detached); container persists through SSH session endings |
| Tailscale was stopped mid-session (user moved to different network); lost all 172.16.x routes | User reconnected to local net; resumed |
| WordPress 301 redirect loop on community/radio after tunnel cutover | Switched tunnel origin from `http://172.16.3.10:80` → `https://172.16.3.10:443` with `originServerName` per ingress + `noTLSVerify: true` |
| Cox ticket draft initially saved under Dataforth project folder (wrong place) | User flagged; moved to `clients/internal-infrastructure/vendor-tickets/` |
| `clients/ix-server/` existed as a separate folder when IX is internal infra | Merged `clients/ix-server/` (2 session logs) into `clients/internal-infrastructure/session-logs/`, removed empty folder, fixed 4 path references in 2 files |
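
The `cert.pem: permission denied` row above comes down to a UID mismatch between the container user and the host directory. A minimal sketch of a pre-flight check, assuming the container runs as 65532:65532 as recorded in this log; the function name `ownership_mismatch` is hypothetical and not part of any helper script mentioned here:

```python
import os

# UID/GID of the `nonroot` user in the cloudflared image, per this log.
CONTAINER_UID = 65532
CONTAINER_GID = 65532


def ownership_mismatch(path, uid=CONTAINER_UID, gid=CONTAINER_GID):
    """Return None if `path` is owned by uid:gid, else the actual 'uid:gid' string."""
    st = os.stat(path)
    if (st.st_uid, st.st_gid) == (uid, gid):
        return None
    return f"{st.st_uid}:{st.st_gid}"
```

Run against `/mnt/cache/appdata/cloudflared` before starting the container, a non-`None` result (for example `99:100`) would indicate that a `chown` is still needed.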
---
## Credentials
### Cloudflare API tokens (from 1Password)
- **Full DNS token:** `DRRGkHS33pxAUjQfRDzDeVPtt6wwUU6FwtXqOzNj`
- Permissions: Zone:Read, DNS:Read/Edit (confirmed; actual scope narrower than 1Password note implies — lacks Zone Settings, Analytics, Tunnel)
- Token ID: `48607a8ba656e02050e97ae4b1b8fcdf`
- **Legacy token:** `U1UTbBOWA4a69eWEBiqIbYh0etCGzrpTU4XaKp7w`
- Token ID: `162711358e386f178d81bb09ca800148`
- Same limited scope (analytics.read also denied)
- **Account:** `Mike@azcomputerguru.com's Account`, Pro Website plan
- **Zone:** `azcomputerguru.com`, zone ID `1beb9917c22b54be32e5215df2c227ce`
- **Vault entry:** `services/cloudflare.sops.yaml` (contains metadata only — token values are in 1Password, not SOPS vault yet)
### Jupiter (Unraid primary)
- SSH: `root / Th1nk3r^99##` on 172.16.3.20:22
- Vault: `infrastructure/jupiter-unraid-primary.sops.yaml`
- iDRAC: 172.16.1.73, `root / Window123!@#-idrac`
### IX Server (origin)
- SSH: `root / Gptf*77ttb!@#!@#` on 172.16.3.10:22 (internal) / 72.194.62.5 (public)
- OS: CloudLinux 9.7 (RHEL 9 family), WHM/cPanel, Apache
- WHM: port 2087, cPanel: 2083
- Vault: `infrastructure/ix-server.sops.yaml`
### pfSense Firewall
- SSH: `admin / r3tr0gradE99!!` on 172.16.0.1:2248
- OS: pfSense 2.8.1 (FreeBSD 15.0-CURRENT)
- WAN: 98.181.90.163/31, public IP block 72.194.62.2-.10 (all bound to igc0)
- Vault: `infrastructure/pfsense-firewall.sops.yaml`
- Note: no IDS/IPS installed (no suricata/snort/pfBlockerNG), firewalld disabled, 5706 states at time of diag
---
## Infrastructure & Servers
### Tunnel deployment
| Component | Value |
|---|---|
| Tunnel name | `acg-origin` |
| Tunnel UUID | `78d3e58f-1979-4f0e-a28b-98d6b3c3d867` |
| Tunnel target hostname | `78d3e58f-1979-4f0e-a28b-98d6b3c3d867.cfargotunnel.com` |
| Host | Jupiter (172.16.3.20) |
| Docker container name | `cloudflared` (restart=unless-stopped) |
| Docker image | `cloudflare/cloudflared:latest` |
| Host volume | `/mnt/cache/appdata/cloudflared/` (direct cache SSD, chowned 65532:65532) |
| Config file | `/mnt/cache/appdata/cloudflared/config.yml` |
| Cert file | `/mnt/cache/appdata/cloudflared/cert.pem` |
| Credentials file | `/mnt/cache/appdata/cloudflared/78d3e58f-1979-4f0e-a28b-98d6b3c3d867.json` |
| Active CF PoPs | phx01 ×2, lax11 (4 tunnel connections) |
### DNS records updated (all proxied, zone azcomputerguru.com)
| Hostname | Before | After |
|---|---|---|
| azcomputerguru.com | A 72.194.62.5 (not proxied, apparently a misconfiguration) | CNAME `78d3e58f-...cfargotunnel.com` proxied |
| analytics.azcomputerguru.com | A 72.194.62.5 proxied | CNAME `78d3e58f-...cfargotunnel.com` proxied |
| community.azcomputerguru.com | A 72.194.62.5 proxied | CNAME `78d3e58f-...cfargotunnel.com` proxied |
| radio.azcomputerguru.com | A 72.194.62.5 proxied | CNAME `78d3e58f-...cfargotunnel.com` proxied |

Note: `azcomputerguru.com` was `proxied=False` before the cutover (record ID `c865ce7849e3567383433d74e5845f99`). That is odd: the 521 responses, which only Cloudflare generates, show that traffic was passing through CF even though the A record's proxied flag was False. The cause was not determined (possibly the www CNAME). The record was replaced with a proper proxied CNAME.
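
The A -> CNAME cutover for each record can be sketched as a single API request. This is an illustration, not the code in `jupiter_tunnel_complete.py`: it assumes the Cloudflare v4 "overwrite DNS record" endpoint (`PUT /zones/{zone_id}/dns_records/{record_id}`) accepts a type change, which this log does not confirm (a delete-then-create sequence may be needed instead). The function name and parameters are hypothetical, and the request is only built, never sent.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_cutover_request(zone_id, record_id, hostname, tunnel_id, token):
    """Build (but do not send) a request that points `hostname` at the tunnel."""
    body = {
        "type": "CNAME",
        "name": hostname,
        "content": f"{tunnel_id}.cfargotunnel.com",
        "proxied": True,
        "ttl": 1,  # 1 means "automatic" in the Cloudflare API
    }
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/dns_records/{record_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the change; the token should come from an environment variable or the vault rather than being written inline.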
### Paths this session
- Local: `D:\claudetools\clients\internal-infrastructure\` (new target after reorg)
- Local (old, removed): `D:\claudetools\clients\ix-server\`
- Local scripts: `D:\claudetools\projects\dataforth-dos\datasheet-pipeline\implementation\jupiter_tunnel_*.py` (should eventually move; they're tunnel-setup helpers, not Dataforth)
- Jupiter: `/mnt/cache/appdata/cloudflared/` (tunnel config/cert)
- IX: No changes persisted (`cloudflared` briefly installed via dnf then removed; `/root/.cloudflared/` deleted)
---
## Commands & Outputs
### Diagnostic cascade (definitive answer)
From pfSense (172.16.0.1):
```
$ for ip in 104.16.0.1 104.17.0.1 104.26.0.1 162.158.0.1 162.158.100.1 172.64.0.1 172.67.0.1 173.245.48.1 141.101.64.1; do
printf "%-16s " $ip; nc -z -v -w 2 $ip 443 2>&1 | head -1
done
104.16.0.1 OK Connection succeeded
104.17.0.1 OK Connection succeeded
104.26.0.1 OK Connection succeeded
162.158.0.1 FAIL Operation timed out
162.158.100.1 FAIL Operation timed out
172.64.0.1 FAIL Operation timed out
172.67.0.1 FAIL Operation timed out
173.245.48.1 FAIL Operation timed out
141.101.64.1 FAIL Operation timed out
$ pfctl -s states | grep "172.16.3.10:443" | wc -l
285 # non-CF users reaching origin fine
$ pfctl -s states | egrep "^[^|]*(104\.(2[6-9])|162\.(158|159)|172\.(64|67))" | head
# 0 results for 162.158.x inbound; 162.159.x outbound-only (initiated from LAN)
```
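
The same TCP matrix can be reproduced off-box with a short script. This is a sketch under stated assumptions: the prefix table only mirrors what this log observed on 2026-04-13 (it is not a complete list of Cloudflare ranges), and the names `observed_status`, `tcp_probe`, and `probe_matrix` are made up for illustration.

```python
import ipaddress
import socket

# Status per prefix as observed in this log; not an authoritative Cloudflare list.
OBSERVED_PREFIXES = {
    "104.16.0.0/13": "ok",
    "104.24.0.0/14": "ok",
    "162.158.0.0/16": "fail",
    "172.64.0.0/13": "fail",
    "173.245.48.0/20": "fail",
    "141.101.64.0/18": "fail",
}


def observed_status(ip):
    """Map an IP to the status this log recorded for its prefix, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for prefix, status in OBSERVED_PREFIXES.items():
        if addr in ipaddress.ip_network(prefix):
            return status
    return "unknown"


def tcp_probe(ip, port=443, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_matrix(ips):
    """Probe each IP live and pair the result with the status recorded in this log."""
    return [(ip, observed_status(ip), tcp_probe(ip)) for ip in ips]
```

Calling `probe_matrix(["104.16.0.1", "162.158.0.1"])` from a host behind the same WAN would show whether the live result still matches the recorded status, which is useful evidence to attach to the Cox ticket.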
### Tunnel completion (final state)
```
=== [2] create tunnel acg-origin ===
Created tunnel acg-origin with id 78d3e58f-1979-4f0e-a28b-98d6b3c3d867
=== [4] DNS cutover (A -> CNAME) ===
[azcomputerguru.com] current: type=A content=72.194.62.5 proxied=False id=c865ce7849e3567383433d74e5845f99
[OK] -> CNAME 78d3e58f-1979-4f0e-a28b-98d6b3c3d867.cfargotunnel.com proxied
[analytics.azcomputerguru.com] ... [OK]
[community.azcomputerguru.com] ... [OK]
[radio.azcomputerguru.com] ... [OK]
=== [6] wait for tunnel connections ===
[try 14] connections registered: 4
=== after HTTPS backend switch ===
azcomputerguru.com: HTTP 200 Server=cloudflare
analytics.azcomputerguru.com: HTTP 200 Server=cloudflare
community.azcomputerguru.com: HTTP 200 Server=cloudflare
radio.azcomputerguru.com: HTTP 200 Server=cloudflare
```
### Cloudflare auth URLs issued (4 rounds before success)
Only the final one mattered — fresh container after chown fix:
```
https://dash.cloudflare.com/argotunnel?aud=&callback=https%3A%2F%2Flogin.cloudflareaccess.org%2F7RFAWDCIvWpHtiq0TsoMGEjV9zALX0xwmy1HZssO7mk%3D
```
---
## Configuration Changes
### On Jupiter (172.16.3.20)
**New:** `/mnt/cache/appdata/cloudflared/config.yml`
```yaml
tunnel: 78d3e58f-1979-4f0e-a28b-98d6b3c3d867
credentials-file: /home/nonroot/.cloudflared/78d3e58f-1979-4f0e-a28b-98d6b3c3d867.json
ingress:
  - hostname: azcomputerguru.com
    service: https://172.16.3.10:443
    originRequest:
      originServerName: azcomputerguru.com
      noTLSVerify: true
  - hostname: analytics.azcomputerguru.com
    service: https://172.16.3.10:443
    originRequest:
      originServerName: analytics.azcomputerguru.com
      noTLSVerify: true
  - hostname: community.azcomputerguru.com
    service: https://172.16.3.10:443
    originRequest:
      originServerName: community.azcomputerguru.com
      noTLSVerify: true
  - hostname: radio.azcomputerguru.com
    service: https://172.16.3.10:443
    originRequest:
      originServerName: radio.azcomputerguru.com
      noTLSVerify: true
  - service: http_status:404
```
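
Because every ingress entry repeats the same three settings, the file lends itself to generation. The following is a minimal sketch, not the logic of `jupiter_tunnel_fix_https.py`; `build_ingress` and `render_config` are hypothetical names, and the renderer emits plain text in the layout shown above rather than using a YAML library.

```python
from typing import Dict, List


def build_ingress(backends: Dict[str, List[str]]) -> List[dict]:
    """Expand {service_url: [hostnames]} into ingress rules plus the 404 catch-all."""
    rules = []
    for service, hostnames in backends.items():
        for hostname in hostnames:
            rules.append({
                "hostname": hostname,
                "service": service,
                # SNI must match the vhost; the origin cert is not verified.
                "originRequest": {"originServerName": hostname, "noTLSVerify": True},
            })
    rules.append({"service": "http_status:404"})
    return rules


def render_config(tunnel_id: str, credentials_file: str, rules: List[dict]) -> str:
    """Render the rules as config.yml text in the same layout as the file above."""
    lines = [f"tunnel: {tunnel_id}", f"credentials-file: {credentials_file}", "ingress:"]
    for rule in rules:
        if "hostname" not in rule:
            lines.append(f"  - service: {rule['service']}")
            continue
        origin = rule["originRequest"]
        lines.append(f"  - hostname: {rule['hostname']}")
        lines.append(f"    service: {rule['service']}")
        lines.append("    originRequest:")
        lines.append(f"      originServerName: {origin['originServerName']}")
        lines.append(f"      noTLSVerify: {str(origin['noTLSVerify']).lower()}")
    return "\n".join(lines) + "\n"
```

Keeping the catch-all rule last matters: it is the entry without a hostname, so it has to come after all hostname-specific rules.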
**New container:** `cloudflared` (auto-restart via `--restart=unless-stopped`). Run command:
```
docker run -d --name cloudflared --restart=unless-stopped \
-v /mnt/cache/appdata/cloudflared:/home/nonroot/.cloudflared \
cloudflare/cloudflared:latest \
tunnel --config /home/nonroot/.cloudflared/config.yml run
```
### Repo reorganization
| Action | From | To |
|---|---|---|
| Moved | `projects/dataforth-dos/datasheet-pipeline/implementation/cox-bgp-ticket-draft.md` | `clients/internal-infrastructure/vendor-tickets/2026-04-13-cox-bgp-cloudflare-routing.md` |
| Moved | `clients/ix-server/session-logs/2026-03-16-ix-account-cleanup.md` | `clients/internal-infrastructure/session-logs/` |
| Moved | `clients/ix-server/session-logs/2026-04-11-smart-slider-security-scan.md` | `clients/internal-infrastructure/session-logs/` |
| Removed | `clients/ix-server/` (empty after moves) | — |
| Edited | `session-logs/2026-04-11-session.md` | 3x `clients/ix-server/` → `clients/internal-infrastructure/` |
| Edited | `clients/internal-infrastructure/session-logs/2026-04-11-smart-slider-security-scan.md` | 1x path update |
Scripts in `projects/dataforth-dos/datasheet-pipeline/implementation/` relevant to tunnel setup but not yet moved (next session decision):
- `jupiter_tunnel_login5.py`, `jupiter_tunnel_login4.py`, `jupiter_tunnel_login3.py`, `jupiter_tunnel_login2.py`, `jupiter_tunnel_login.py` (multiple login attempts, keep only the detached one)
- `jupiter_tunnel_complete.py` — the one that did the full cutover
- `jupiter_tunnel_fix_https.py` — the HTTPS backend switchover
- `ix_install_cloudflared.py`, `ix_tunnel_login.py` (IX-side, abandoned)
- `cf_analytics.py` — GraphQL probe (showed analytics.read permission missing)
- `pfsense_diag.py`, `pfsense_diag2.py`, `pfsense_trace.py` — the diagnostic cascade
- `cox-bgp-ticket-draft.md` — already moved
---
## Pending / Incomplete / Open Items
### Action items for user
1. **Submit Cox BGP ticket** (file ready at `clients/internal-infrastructure/vendor-tickets/2026-04-13-cox-bgp-cloudflare-routing.md`). Fixing their routing is the permanent root-cause fix; until then the tunnel is the mitigation. No SLA for this.
2. **Populate Cloudflare token in SOPS vault.** Currently `services/cloudflare.sops.yaml` has metadata only — no `credentials:` block. Token values live in 1Password. For pipeline automation it would be nicer to have them in SOPS like everything else:
```
bash D:/vault/scripts/vault.sh edit services/cloudflare.sops.yaml
# add credentials: { api_token_full_dns: DRRGkHS33pxAUjQfRDzDeVPtt6wwUU6FwtXqOzNj, api_token_legacy: U1UTbBOWA4a69eWEBiqIbYh0etCGzrpTU4XaKp7w, dns_zone_id: 1beb9917c22b54be32e5215df2c227ce }
```
3. **Consider expanding tunnel ingress to cover more proxied hostnames** (if Cox BGP stays broken, other proxied hostnames would intermittently 521 too):
   - `plex.azcomputerguru.com` → 72.194.62.4 (Jupiter NPM), which could route through the tunnel to `https://172.16.3.20:18443` (NPM is already on Jupiter, could bypass public IP entirely)
   - `plexrequest.azcomputerguru.com`, `rustdesk.`, `sync.`, `secure.`, `backups.`, `enterpriseenrollment.`, `enterpriseregistration.`, `info.`, `mail.`, `store.`, `ui.`: most are external-proxied CNAMEs and don't need the tunnel; a few that point to Jupiter (.4) could benefit
   - Not urgent unless 521 recurs on one of them
4. **Script cleanup** — move tunnel-setup helper scripts out of `projects/dataforth-dos/datasheet-pipeline/implementation/` (wrong project). Candidate targets: `clients/internal-infrastructure/scripts/cloudflared/` or similar. Not touched today.
5. **Commit this work** — the tunnel DNS changes are already live. Local file changes (moves, log, ticket draft) not yet committed.
### Vault hygiene (from earlier today, still pending)
- `clients/dataforth/ad2.sops.yaml`: stale shell-escape backslash in `credentials.password` (stores `Paper123\!@#`; real is `Paper123!@#`).
### Dataforth follow-ups (unrelated to today but still open)
- Verify `C:\Shares\test\scripts\Sync-FromNAS-rsync.ps1` includes the `VASLOG - Engineering Tested` subfolder for ongoing Engineering-tested .txt ingestion.
---
## Reference Information
### Cloudflare Tunnel management
To view logs:
```
ssh root@172.16.3.20 'docker logs cloudflared --tail 30'
```
To list tunnels:
```
docker run --rm -v /mnt/cache/appdata/cloudflared:/home/nonroot/.cloudflared cloudflare/cloudflared:latest tunnel list
```
To restart after config change:
```
docker restart cloudflared
# or stop + start for a fresh container state
```
To rotate the tunnel (delete + recreate):
```
docker run --rm -v /mnt/cache/appdata/cloudflared:/home/nonroot/.cloudflared cloudflare/cloudflared:latest tunnel delete -f acg-origin
# then re-run create + config steps
```
### Cloudflare API one-liners
List DNS records for a hostname:
```
curl -H "Authorization: Bearer $CF_TOKEN" "https://api.cloudflare.com/client/v4/zones/$ZONE/dns_records?name=azcomputerguru.com"
```
Quick site probe:
```
curl -sI -A "Mozilla/5.0 Chrome/120.0" https://azcomputerguru.com/
# Expect: a 200 status line (HTTP/1.1 or HTTP/2, depending on what curl negotiates) and a `server: cloudflare` header
```
### Useful paths and ports
| Resource | Value |
|---|---|
| Jupiter appdata | `/mnt/cache/appdata/cloudflared/` |
| IX internal | `http://172.16.3.10:80`, `https://172.16.3.10:443` |
| pfSense SSH | `ssh admin@172.16.0.1 -p 2248` |
| Cloudflare API base | `https://api.cloudflare.com/client/v4/zones/1beb9917c22b54be32e5215df2c227ce` |
### Cloudflare-IP prefix status (as of 2026-04-13 ~08:30)
| Prefix | Route via Cox | TCP:443 from pfSense |
|---|---|---|
| 104.16.0.0/13 | local/short path | **OK** |
| 104.24.0.0/14 | local/short path | **OK** |
| 162.158.0.0/16 | distant/broken | **FAIL (timeout)** |
| 172.64.0.0/13 | distant/broken | **FAIL (timeout)** |
| 173.245.48.0/20 | distant/broken | **FAIL (timeout)** |
| 141.101.64.0/18 | distant/broken | **FAIL (timeout)** |
---
## Related Logs
- Earlier today: `projects/dataforth-dos/session-logs/2026-04-12-session.md` (SCMVAS deploy finish + git merge conflict resolution)
- Earlier related: `session-logs/2026-04-06-session.md` (ScreenConnect redirect + UniFi OS VM) — shows public IP block context
- Earlier related: `clients/internal-infrastructure/session-logs/2026-04-11-smart-slider-security-scan.md` (IX WP audit, originally at `clients/ix-server/`)
- Remote (pulled today): commit `499fd5d` "Session log: Gitea recovery (Jupiter cache full)" — explains earlier intermittent Gitea 502s and Jupiter cache pressure seen today
---
**Last Updated:** 2026-04-13
**Next Actions:** submit Cox ticket; consider populating Cloudflare vault entry; monitor tunnel for 24h; cleanup misplaced helper scripts.
---
## Update: 15:56 — Tunnel expansion audit + ix.azcomputerguru.com grey-cloud revert
Post-initial-deploy work to assess which other proxied records in the zone would benefit from the tunnel, then fix a regression on WHM access.
### Work done
1. **Audit of all 25 proxied zone records** (`audit_proxied.py`). Classified each by origin:
   - Tunneled (4): azcomputerguru.com, analytics, community, radio
   - External SaaS (8): msp360, Microsoft, SendGrid, GoDaddy, etc. (not eligible)
   - Our-origin not-yet-tunneled (9): ix, git, plex, plexrequest, rmm, rmm-api, sync, rustdesk, secure
   - Of those 9, 4 were actively broken (ix=521, plex=525, rustdesk=525, secure=ERR) and 5 working (git/plexrequest/rmm/rmm-api/sync=200)
2. **Mapped NAT rules and NPM backends** (`discover_backends.py`):
   - pfSense `pfctl -s nat` shows: `.4`, `.9`, `.10` all rdr to `172.16.3.20:18443` (Jupiter NPM)
   - `.5 -> 172.16.3.10:443` (IX Apache)
   - `.2 -> 172.16.1.16:443` (different subnet; no route from Jupiter)
   - NPM_Server pfSense alias resolves to `172.16.3.20` only (single-member)
   - Jupiter NPM active config dir: `/mnt/user/appdata/npm/nginx/proxy_host/` (separate from `NginxProxyManager/` which is a stale v1 copy; there's also an empty `NginxProxyManager-v3/`)
   - NPM has proxy_host entries for: emby, plexrequest, unifi, git, rmm-api+rmm, sync, connect
   - NPM has **NO** entries for: plex, rustdesk, secure -- so routing them to `https://172.16.3.20:18443` with that Host header returned `tls: unrecognized name` (default cert fallback)
3. **Expanded tunnel to 13 hostnames** (`expand_tunnel.py`) via CF DNS API cutovers, then immediately rolled back 3:
   - plex/rustdesk -> cloudflared error `Unable to reach the origin service ... remote error: tls: unrecognized name` (NPM returned default cert because no vhost matched). 502 to users.
   - secure -> cloudflared error `no route to host` (Jupiter can't reach 172.16.1.16/24). 502 to users.
   - All 3 were already broken BEFORE the tunnel (525/525/ERR). No user-visible regression, but not a *fix* either -- reverted their DNS back to original A records.
4. **Final state after `revert_broken.py`: 10 hostnames tunneled, all HTTP 200**:
   - azcomputerguru.com, analytics, community, radio, ix, git, plexrequest, rmm, rmm-api, sync
5. **User reported "IX generated blank screen"** -> root cause: `https://ix.azcomputerguru.com:2087/` is the WHM admin URL. Cloudflare Tunnel ingress rules match on hostname (and optionally path), not on the port the client connected to, so every request for a hostname goes to the single backend configured for it (`https://172.16.3.10:443`). A request to `:2087` therefore landed on Apache:443, not WHM:2087, and Apache returned the default vhost redirect instead of WHM.
   **Fix: grey-clouded `ix.azcomputerguru.com`** (proxied=False) pointing directly to A `72.194.62.5`. pfSense NAT rules for 2087/2083 are intact and route the traffic to IX. Verified:
   - `ix.azcomputerguru.com:443` -> 200 (default vhost redirect, fine)
   - `ix.azcomputerguru.com:2087` -> 200 (WHM)
   - `ix.azcomputerguru.com:2083` -> 200 (cPanel)

   Trade-off: `ix.` no longer benefits from CF's DDoS/caching, but it's admin-only access. If the Cox BGP issue resurfaces specifically for traffic to 72.194.62.5 from certain geographies, `ix.azcomputerguru.com:2087` would fail for users in those regions -- but admin access typically comes from your own network, which works fine.
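
The WHM failure is easier to see with a toy model of hostname-only routing. This is an illustrative sketch, not cloudflared's actual matching code; `split_host_port` and `route` are hypothetical helpers, the table holds only the one entry relevant here, and IPv6 literals are not handled.

```python
from typing import Optional, Tuple

# One ingress entry as it existed while `ix.` was tunneled.
INGRESS = {"ix.azcomputerguru.com": "https://172.16.3.10:443"}


def split_host_port(authority: str) -> Tuple[str, Optional[int]]:
    """Split 'host:port' into (host, port); port is None when absent."""
    host, sep, port = authority.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return authority, None


def route(authority: str) -> str:
    """Pick a backend by hostname only; the client's port plays no part."""
    host, _port = split_host_port(authority)
    return INGRESS.get(host.lower(), "http_status:404")
```

In this model `route("ix.azcomputerguru.com:2087")` and `route("ix.azcomputerguru.com")` return the same backend, which is why WHM was unreachable through the tunnel and the DNS-only record was the simpler fix.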
### Key decisions & rationale
- **Tunnel ingress reduced to 9 hostnames** (dropped `ix.` after the WHM issue surfaced; plex, rustdesk, and secure stay removed from the earlier rollback). All 9 serve via the tunnel, all verified 200.
- **Grey-cloud (DNS-only) rather than tunnel** for `ix.` because port 2087/2083 admin needs can't be satisfied by the tunnel.
- **Not investigated further**: the 3 unfixable hostnames (plex, rustdesk, secure) -- require NPM vhost additions and/or Jupiter routing changes, beyond today's tunnel scope. Captured as follow-ups.
### Problems encountered and resolutions
| Problem | Resolution |
|---|---|
| plex/rustdesk = 502 (`tls: unrecognized name`) | NPM has no vhost for these hostnames; it returned default cert. Reverted DNS to original A records (no worse than pre-tunnel state). |
| secure = 502 (`no route to host`) | Jupiter (172.16.3.20) can't route to 172.16.1.16 (different subnet). Reverted DNS. |
| WHM blank screen (`:2087`) | Tunnel is hostname-only, can't preserve non-standard ports. Grey-clouded `ix.` so direct NAT handles the admin ports. |
| Tailscale stopped mid-session (again) | User re-enabled after prompt; resumed. |
| Unicode arrow character crashed Python print on Windows cp1252 | Re-ran verify with ASCII chars. Harmless -- DNS/tunnel changes had already succeeded. |
---
## Credentials (unchanged from this session)
Same set as the earlier 2026-04-13 entry above:
- Cloudflare Full DNS token: `DRRGkHS33pxAUjQfRDzDeVPtt6wwUU6FwtXqOzNj`
- Cloudflare Legacy token: `U1UTbBOWA4a69eWEBiqIbYh0etCGzrpTU4XaKp7w`
- Zone ID: `1beb9917c22b54be32e5215df2c227ce`
- Jupiter: `root / Th1nk3r^99##` at 172.16.3.20:22
- IX: `root / Gptf*77ttb!@#!@#` at 172.16.3.10:22 (public 72.194.62.5)
- pfSense: `admin / r3tr0gradE99!!` at 172.16.0.1:2248
---
## DNS changes summary (all of 2026-04-13)
| Hostname | Before session | After session |
|---|---|---|
| azcomputerguru.com | A 72.194.62.5 (mis-configured as proxied=False) | CNAME tunnel proxied |
| analytics.azcomputerguru.com | A 72.194.62.5 proxied | CNAME tunnel proxied |
| community.azcomputerguru.com | A 72.194.62.5 proxied | CNAME tunnel proxied |
| radio.azcomputerguru.com | A 72.194.62.5 proxied | CNAME tunnel proxied |
| ix.azcomputerguru.com | A 72.194.62.5 proxied | **A 72.194.62.5 DNS-only (grey cloud)** (supports :2087/:2083) |
| git.azcomputerguru.com | A 72.194.62.4 proxied | CNAME tunnel proxied |
| plex.azcomputerguru.com | A 72.194.62.4 proxied | A 72.194.62.4 proxied (unchanged net effect) |
| plexrequest.azcomputerguru.com | A 72.194.62.4 proxied | CNAME tunnel proxied |
| rmm.azcomputerguru.com | A 72.194.62.4 proxied | CNAME tunnel proxied |
| rmm-api.azcomputerguru.com | A 72.194.62.4 proxied | CNAME tunnel proxied |
| sync.azcomputerguru.com | A 72.194.62.9 proxied | CNAME tunnel proxied |
| rustdesk.azcomputerguru.com | A 72.194.62.10 proxied | A 72.194.62.10 proxied (unchanged net effect) |
| secure.azcomputerguru.com | A 72.194.62.2 proxied | A 72.194.62.2 proxied (unchanged net effect) |
---
## Current tunnel ingress (9 hostnames -- /mnt/cache/appdata/cloudflared/config.yml)
Tunnel: `78d3e58f-1979-4f0e-a28b-98d6b3c3d867` (name `acg-origin`)
- azcomputerguru.com -> https://172.16.3.10:443 (SNI + noTLSVerify)
- analytics.azcomputerguru.com -> https://172.16.3.10:443
- community.azcomputerguru.com -> https://172.16.3.10:443
- radio.azcomputerguru.com -> https://172.16.3.10:443
- git.azcomputerguru.com -> https://172.16.3.20:18443
- plexrequest.azcomputerguru.com -> https://172.16.3.20:18443
- rmm.azcomputerguru.com -> https://172.16.3.20:18443
- rmm-api.azcomputerguru.com -> https://172.16.3.20:18443
- sync.azcomputerguru.com -> https://172.16.3.20:18443
- catch-all -> http_status:404
Backups of config.yml kept as `config.yml.bak-YYYYMMDD-HHMMSS` in same dir.
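
A timestamped backup in that naming scheme takes only a few lines. This is a sketch under the assumption that a plain file copy is enough; the helper scripts may do it differently, and `backup_config` is a hypothetical name.

```python
import shutil
import time
from pathlib import Path


def backup_config(path: Path) -> Path:
    """Copy `path` to `<name>.bak-YYYYMMDD-HHMMSS` in the same directory."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = path.with_name(f"{path.name}.bak-{stamp}")
    shutil.copy2(path, dest)  # copy2 also preserves mtime and permission bits
    return dest
```

Calling `backup_config(Path("/mnt/cache/appdata/cloudflared/config.yml"))` before each ingress edit keeps a rollback point next to the live file.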
---
## Final verification outputs
```
azcomputerguru.com HTTP 200 cloudflare (tunnel -> IX)
analytics.azcomputerguru.com HTTP 200 cloudflare (tunnel -> IX)
community.azcomputerguru.com HTTP 200 cloudflare (tunnel -> IX)
radio.azcomputerguru.com HTTP 200 cloudflare (tunnel -> IX)
git.azcomputerguru.com HTTP 200 cloudflare (tunnel -> Jupiter NPM)
plexrequest.azcomputerguru.com HTTP 200 cloudflare (tunnel -> Jupiter NPM)
rmm.azcomputerguru.com HTTP 200 cloudflare (tunnel -> Jupiter NPM)
rmm-api.azcomputerguru.com HTTP 200 cloudflare (tunnel -> Jupiter NPM)
sync.azcomputerguru.com HTTP 200 cloudflare (tunnel -> Jupiter NPM)
ix.azcomputerguru.com:443 HTTP 200 (direct, default vhost)
ix.azcomputerguru.com:2087 HTTP 200 (direct, WHM)
ix.azcomputerguru.com:2083 HTTP 200 (direct, cPanel)
```
---
## Scripts created (in clients/internal-infrastructure/scripts/cloudflared-tunnel-setup/)
- `audit_proxied.py` -- list all proxied zone records, classify origin, external probe each
- `discover_backends.py` -- extract pfSense NAT rules and Jupiter NPM server_name mappings
- `expand_tunnel.py` -- extend tunnel ingress to 13 hostnames + DNS cutover
- `revert_broken.py` -- remove plex/rustdesk/secure from tunnel and restore their A records
All have been sanitized to use SOPS vault for credentials / env var for CF token.
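
For orientation, the classification step of the audit could look roughly like the sketch below. It is a guess at the approach, not the contents of `audit_proxied.py`: the IP set is taken from the DNS changes summary in this log, record dicts are assumed to carry the `type` and `content` fields returned by the Cloudflare DNS API, and `classify` is a hypothetical name.

```python
# Public IPs that NAT to our own origins, per the DNS changes summary in this log.
OUR_PUBLIC_IPS = {
    "72.194.62.2",
    "72.194.62.4",
    "72.194.62.5",
    "72.194.62.9",
    "72.194.62.10",
}


def classify(record: dict) -> str:
    """Bucket a DNS record as 'tunneled', 'our-origin', or 'external'."""
    rtype = record["type"]
    content = record["content"]
    if rtype == "CNAME" and content.endswith(".cfargotunnel.com"):
        return "tunneled"
    if rtype == "A" and content in OUR_PUBLIC_IPS:
        return "our-origin"
    return "external"
```

Only records in the `our-origin` bucket are candidates for new tunnel ingress entries; `external` ones point at third-party services and are unaffected by the Cox routing problem.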
---
## Pending / Incomplete / Open Items
Additions to the list from the earlier 2026-04-13 entry:
1. **`plex.azcomputerguru.com` is still broken** (525) -- requires NPM proxy_host entry on Jupiter. Likely target: `binhex-plexpass` container at `172.16.3.20:32400` (or whatever internal IP Plex uses with `network_mode: host`). Once NPM has the vhost, can add to tunnel with a single config.yml change.
2. **`rustdesk.azcomputerguru.com` is still broken** (525) -- requires:
   - Finding where the rustdesk server is actually running (no `rustdesk` container visible in `docker ps` on Jupiter; may be on a different host, or decommissioned)
   - Adding NPM vhost for it
   - Then tunnel ingress
3. **`secure.azcomputerguru.com` is still broken** (ERR) -- requires either:
   - A static route on Jupiter to 172.16.1.0/24 so cloudflared can reach 172.16.1.16
   - Or move the service behind Jupiter NPM
   - Or grey-cloud to DNS-only like we did for `ix.` (bypass CF entirely)
4. **Still TODO from the earlier block:**
   - Submit Cox BGP ticket (`clients/internal-infrastructure/vendor-tickets/2026-04-13-cox-bgp-cloudflare-routing.md`)
   - Populate CF tokens in SOPS vault (currently 1Password only)
   - Fix stale `Paper123\!@#` in Dataforth AD2 vault entry
   - Verify rsync covers Dataforth `VASLOG - Engineering Tested` subfolder
---
**Last Updated:** 2026-04-13 15:56
**Next Actions:** consider adding NPM vhost for plex, investigate rustdesk host, commit today's additions.