Audited all 25 proxied zone records and expanded tunnel ingress to cover 9 hostnames total (azcomputerguru + analytics + community + radio + git + plexrequest + rmm + rmm-api + sync). All verified HTTP 200.

Reverted 3 hostnames to their original A records after discovering they require backend work, not tunnel changes:

- plex/rustdesk: NPM on Jupiter has no vhost for these (tunneled requests returned `tls: unrecognized name`)
- secure: Jupiter can't route to its backend subnet 172.16.1.0/24

Reverted ix.azcomputerguru.com to a DNS-only A record after the user reported that WHM access on :2087 was broken. Cloudflare Tunnel is hostname-bound, not port-bound, so non-standard admin ports can't pass through. Direct NAT to 72.194.62.5 restored WHM/cPanel access.

Adds four new helper scripts under `clients/internal-infrastructure/scripts/cloudflared-tunnel-setup/` (audit_proxied, discover_backends, expand_tunnel, revert_broken). All use the SOPS vault / an env var for credentials.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Session Log - Internal Infrastructure - 2026-04-13

## Cloudflare Tunnel deployment for azcomputerguru.com + Cox BGP diagnosis

Earlier 2026-04-13 work (SCMVAS git push, merge conflict resolution) is in
`projects/dataforth-dos/session-logs/2026-04-12-session.md`. This log picks up
when the user reported azcomputerguru.com was still showing 521 after the initial
Cloudflare recovery.

---

## Session Summary

User reported azcomputerguru.com returning **521 "Web server is down"** through Cloudflare, despite:

- CF SSL mode being "Full" (not Strict)
- Origin IX server (172.16.3.10) responding 200 OK internally
- Origin reachable from external ISPs (non-CF path)

### What was accomplished

1. **Diagnosed root cause:** Cox (our ISP) has broken BGP routing from our netblock (72.194.62.0/29) to specific Cloudflare IP prefixes. TCP:443 from the pfSense WAN succeeds to the 104.16/17/26 ranges but **times out** to 162.158.0.0/16, 172.64.0.0/13, 173.245.48.0/20, and 141.101.64.0/18. ICMP traceroute to the affected prefixes shows ~173ms (cross-country peering) vs ~3.6ms for working prefixes, i.e. asymmetric/distant routing. The inbound CF→origin state count was 0 while the direct-internet state count was 285, confirming that only the CF path was broken.

2. **Deployed Cloudflare Tunnel on Jupiter (Unraid)** as a permanent workaround. The tunnel reverses the connection direction (outbound from the container, over the CF prefixes that still work), eliminating the dependency on Cox's broken inbound routing.

3. **Cut over 4 proxied hostnames** to the tunnel via the CF DNS API:
   - azcomputerguru.com, analytics., community., radio.
   - All 4 now return **HTTP 200 OK** through CF edge → tunnel → IX HTTPS vhost (SNI-matched)

4. **Drafted Cox BGP escalation ticket** with evidence (TCP matrix, traceroute comparison, state-table counts). Saved to `vendor-tickets/`.

5. **Folder reorganization:**
   - Moved the Cox ticket from `projects/dataforth-dos/datasheet-pipeline/implementation/` (wrong location; not a Dataforth file) → `clients/internal-infrastructure/vendor-tickets/2026-04-13-cox-bgp-cloudflare-routing.md`
   - Merged the misnamed `clients/ix-server/` into `clients/internal-infrastructure/` (IX is internal infra, not a client). Session logs moved; folder removed; 4 stale path references updated across 2 files.

### Key decisions & rationale

- **Option C: tunnel on Jupiter Docker** rather than on pfSense (cloudflared isn't a pfSense package, and firmware upgrades would wipe it) or on IX (a tunnel there covers IX only; other internal origins would need separate tunnels). Jupiter already runs Unraid with many containers, so cloudflared fits the existing pattern, and one tunnel can route to any internal LAN IP.

- **HTTPS backend (not HTTP)** with `originServerName: <hostname>` + `noTLSVerify: true`. The initial HTTP backend caused a WordPress "force HTTPS" redirect loop on community/radio (those sites have HSTS/canonical-URL rules that IX's other sites lack).

- **`--user 65532` (container default) with `chown 65532:65532` on the host volume.** An earlier `--user root` attempt wrote the cert to `/root/.cloudflared` (outside the bind mount) instead of `/home/nonroot/.cloudflared`.

- **Detached container for `tunnel login`.** Earlier foreground attempts were killed when the SSH exec_command hit its 9-minute timeout; the detached container (`cf-login`) persists independently of SSH.

- **Didn't grey-cloud DNS** (the quick-but-ugly fix); the tunnel is a permanent architectural solution that survives future Cox BGP flaps.

### Problems encountered and resolutions

| Problem | Resolution |
|---|---|
| Cloudflare token (Full DNS) lacks Zone Settings + Analytics permissions; couldn't read SSL/TLS mode or per-PoP origin status | Used pfSense-side diagnostics (TCP probes + traceroute + state table) instead; conclusive without needing Analytics |
| `mkdir: no space left on device` on `/mnt/user/appdata/cloudflared` despite cache showing 181GB free | shfs (Unraid FUSE overlay) appeared to be overly strict near 81% cache usage; bypassed by writing directly to `/mnt/cache/appdata/cloudflared` (raw cache pool, same physical SSD, skips shfs) |
| `cert.pem: permission denied` writing to bind-mount volume | Container runs as UID 65532 (`nonroot`); host dir was owned by `nobody:users` (99:100). Chowned host dir to 65532:65532 before retry |
| `--user root` workaround wrote cert to `/root/.cloudflared`, outside the mount | Dropped `--user` override after fixing host UID ownership |
| Foreground `docker run --rm` for login got killed by SSH exec timeout after 9 min | Used `docker run -d --name cf-login` (detached); container persists after the SSH session ends |
| Tailscale was stopped mid-session (user moved to a different network); lost all 172.16.x routes | User reconnected to local net; resumed |
| WordPress 301 redirect loop on community/radio after tunnel cutover | Switched tunnel origin from `http://172.16.3.10:80` → `https://172.16.3.10:443` with `originServerName` per ingress + `noTLSVerify: true` |
| Cox ticket draft initially saved under Dataforth project folder (wrong place) | User flagged; moved to `clients/internal-infrastructure/vendor-tickets/` |
| `clients/ix-server/` existed as a separate folder when IX is internal infra | Merged `clients/ix-server/` (2 session logs) into `clients/internal-infrastructure/session-logs/`, removed empty folder, fixed 4 path references in 2 files |

---

## Credentials

### Cloudflare API tokens (from 1Password)

- **Full DNS token:** `DRRGkHS33pxAUjQfRDzDeVPtt6wwUU6FwtXqOzNj`
  - Permissions: Zone:Read, DNS:Read/Edit (confirmed; actual scope is narrower than the 1Password note implies: lacks Zone Settings, Analytics, Tunnel)
  - Token ID: `48607a8ba656e02050e97ae4b1b8fcdf`
- **Legacy token:** `U1UTbBOWA4a69eWEBiqIbYh0etCGzrpTU4XaKp7w`
  - Token ID: `162711358e386f178d81bb09ca800148`
  - Same limited scope (analytics.read also denied)
- **Account:** `Mike@azcomputerguru.com's Account`, Pro Website plan
- **Zone:** `azcomputerguru.com`, zone ID `1beb9917c22b54be32e5215df2c227ce`
- **Vault entry:** `services/cloudflare.sops.yaml` (contains metadata only; token values are in 1Password, not yet in the SOPS vault)

### Jupiter (Unraid primary)

- SSH: `root / Th1nk3r^99##` on 172.16.3.20:22
- Vault: `infrastructure/jupiter-unraid-primary.sops.yaml`
- iDRAC: 172.16.1.73, `root / Window123!@#-idrac`

### IX Server (origin)

- SSH: `root / Gptf*77ttb!@#!@#` on 172.16.3.10:22 (internal) / 72.194.62.5 (public)
- OS: CloudLinux 9.7 (RHEL 9 family), WHM/cPanel, Apache
- WHM: port 2087, cPanel: 2083
- Vault: `infrastructure/ix-server.sops.yaml`

### pfSense Firewall

- SSH: `admin / r3tr0gradE99!!` on 172.16.0.1:2248
- OS: pfSense 2.8.1 (FreeBSD 15.0-CURRENT)
- WAN: 98.181.90.163/31, public IP block 72.194.62.2-.10 (all bound to igc0)
- Vault: `infrastructure/pfsense-firewall.sops.yaml`
- Note: no IDS/IPS installed (no suricata/snort/pfBlockerNG), firewalld disabled, 5706 states at time of diag

---

## Infrastructure & Servers

### Tunnel deployment

| Component | Value |
|---|---|
| Tunnel name | `acg-origin` |
| Tunnel UUID | `78d3e58f-1979-4f0e-a28b-98d6b3c3d867` |
| Tunnel target hostname | `78d3e58f-1979-4f0e-a28b-98d6b3c3d867.cfargotunnel.com` |
| Host | Jupiter (172.16.3.20) |
| Docker container name | `cloudflared` (restart=unless-stopped) |
| Docker image | `cloudflare/cloudflared:latest` |
| Host volume | `/mnt/cache/appdata/cloudflared/` (direct cache SSD, chowned 65532:65532) |
| Config file | `/mnt/cache/appdata/cloudflared/config.yml` |
| Cert file | `/mnt/cache/appdata/cloudflared/cert.pem` |
| Credentials file | `/mnt/cache/appdata/cloudflared/78d3e58f-1979-4f0e-a28b-98d6b3c3d867.json` |
| Active CF PoPs | phx01 ×2, lax11 (4 tunnel connections) |

### DNS records updated (all proxied, zone azcomputerguru.com)

| Hostname | Before | After |
|---|---|---|
| azcomputerguru.com | A 72.194.62.5, `proxied=False` (a misconfiguration; the replacement is proxied) | CNAME `78d3e58f-...cfargotunnel.com` proxied |
| analytics.azcomputerguru.com | A 72.194.62.5 proxied | CNAME `78d3e58f-...cfargotunnel.com` proxied |
| community.azcomputerguru.com | A 72.194.62.5 proxied | CNAME `78d3e58f-...cfargotunnel.com` proxied |
| radio.azcomputerguru.com | A 72.194.62.5 proxied | CNAME `78d3e58f-...cfargotunnel.com` proxied |

Note: `azcomputerguru.com` was `proxied=False` before the cutover (record ID `c865ce7849e3567383433d74e5845f99`). That's odd: the site was still being served through CF (the 521 responses can only come from CF), yet the A record's proxied flag was False. Possibly it was reached via the www CNAME plus some Cloudflare-side handling; the mechanism wasn't confirmed. Replaced with a proper proxied CNAME.

### Paths this session

- Local: `D:\claudetools\clients\internal-infrastructure\` (new target after reorg)
- Local (old, removed): `D:\claudetools\clients\ix-server\`
- Local scripts: `D:\claudetools\projects\dataforth-dos\datasheet-pipeline\implementation\jupiter_tunnel_*.py` (should eventually move; they're tunnel-setup helpers, not Dataforth)
- Jupiter: `/mnt/cache/appdata/cloudflared/` (tunnel config/cert)
- IX: No changes persisted (`cloudflared` briefly installed via dnf then removed; `/root/.cloudflared/` deleted)

---

## Commands & Outputs

### Diagnostic cascade (definitive answer)

From pfSense (172.16.0.1):

```
$ for ip in 104.16.0.1 104.17.0.1 104.26.0.1 162.158.0.1 162.158.100.1 172.64.0.1 172.67.0.1 173.245.48.1 141.101.64.1; do
  printf "%-16s " $ip; nc -z -v -w 2 $ip 443 2>&1 | head -1
done
104.16.0.1 OK Connection succeeded
104.17.0.1 OK Connection succeeded
104.26.0.1 OK Connection succeeded
162.158.0.1 FAIL Operation timed out
162.158.100.1 FAIL Operation timed out
172.64.0.1 FAIL Operation timed out
172.67.0.1 FAIL Operation timed out
173.245.48.1 FAIL Operation timed out
141.101.64.1 FAIL Operation timed out

$ pfctl -s states | grep "172.16.3.10:443" | wc -l
285 # non-CF users reaching origin fine

$ pfctl -s states | egrep "^[^|]*(104\.(2[6-9])|162\.(158|159)|172\.(64|67))" | head
# 0 results for 162.158.x inbound; 162.159.x outbound-only (initiated from LAN)
```
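
A per-prefix rollup of the probe output above is what the Cox ticket's TCP matrix needs. Minimal Python sketch, assuming the `nc` results are transcribed into a dict of IP → reachable; `CF_PREFIXES` holds only the prefixes from the status table in this log (not Cloudflare's full published list), and `summarize_by_prefix` is a hypothetical helper, not one of the `pfsense_diag*.py` scripts.

```python
import ipaddress
from collections import defaultdict

# Assumption: only the prefixes exercised in this session, as listed in the status table.
CF_PREFIXES = [ipaddress.ip_network(p) for p in (
    "104.16.0.0/13", "104.24.0.0/14", "162.158.0.0/16",
    "172.64.0.0/13", "173.245.48.0/20", "141.101.64.0/18",
)]

def summarize_by_prefix(results):
    """Group TCP:443 probe results (ip -> True/False) by the prefix containing each IP."""
    summary = defaultdict(lambda: {"ok": 0, "fail": 0})
    for ip, ok in results.items():
        addr = ipaddress.ip_address(ip)
        prefix = next((str(n) for n in CF_PREFIXES if addr in n), "unmatched")
        summary[prefix]["ok" if ok else "fail"] += 1
    return dict(summary)

# Results transcribed from the pfSense run above.
probes = {
    "104.16.0.1": True, "104.17.0.1": True, "104.26.0.1": True,
    "162.158.0.1": False, "162.158.100.1": False,
    "172.64.0.1": False, "172.67.0.1": False,
    "173.245.48.1": False, "141.101.64.1": False,
}

for prefix, counts in summarize_by_prefix(probes).items():
    status = "OK" if counts["fail"] == 0 else "FAIL"
    print(f"{prefix:<18} {status:<5} ok={counts['ok']} fail={counts['fail']}")
```

Grouping by prefix rather than by probe IP keeps the ticket focused on routes, which is what Cox can act on.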

### Tunnel completion (final state)

```
=== [2] create tunnel acg-origin ===
Created tunnel acg-origin with id 78d3e58f-1979-4f0e-a28b-98d6b3c3d867

=== [4] DNS cutover (A -> CNAME) ===
[azcomputerguru.com] current: type=A content=72.194.62.5 proxied=False id=c865ce7849e3567383433d74e5845f99
[OK] -> CNAME 78d3e58f-1979-4f0e-a28b-98d6b3c3d867.cfargotunnel.com proxied
[analytics.azcomputerguru.com] ... [OK]
[community.azcomputerguru.com] ... [OK]
[radio.azcomputerguru.com] ... [OK]

=== [6] wait for tunnel connections ===
[try 14] connections registered: 4

=== after HTTPS backend switch ===
azcomputerguru.com: HTTP 200 Server=cloudflare
analytics.azcomputerguru.com: HTTP 200 Server=cloudflare
community.azcomputerguru.com: HTTP 200 Server=cloudflare
radio.azcomputerguru.com: HTTP 200 Server=cloudflare
```

### Cloudflare auth URLs issued (4 rounds before success)

Only the final one mattered (a fresh container after the chown fix):

```
https://dash.cloudflare.com/argotunnel?aud=&callback=https%3A%2F%2Flogin.cloudflareaccess.org%2F7RFAWDCIvWpHtiq0TsoMGEjV9zALX0xwmy1HZssO7mk%3D
```

---

## Configuration Changes

### On Jupiter (172.16.3.20)

**New:** `/mnt/cache/appdata/cloudflared/config.yml`

```yaml
tunnel: 78d3e58f-1979-4f0e-a28b-98d6b3c3d867
credentials-file: /home/nonroot/.cloudflared/78d3e58f-1979-4f0e-a28b-98d6b3c3d867.json
ingress:
  - hostname: azcomputerguru.com
    service: https://172.16.3.10:443
    originRequest:
      originServerName: azcomputerguru.com
      noTLSVerify: true
  - hostname: analytics.azcomputerguru.com
    service: https://172.16.3.10:443
    originRequest:
      originServerName: analytics.azcomputerguru.com
      noTLSVerify: true
  - hostname: community.azcomputerguru.com
    service: https://172.16.3.10:443
    originRequest:
      originServerName: community.azcomputerguru.com
      noTLSVerify: true
  - hostname: radio.azcomputerguru.com
    service: https://172.16.3.10:443
    originRequest:
      originServerName: radio.azcomputerguru.com
      noTLSVerify: true
  - service: http_status:404
```
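
Two properties of this file matter. cloudflared evaluates ingress rules top to bottom and requires the last rule to be a catch-all with no `hostname` (`cloudflared tunnel ingress validate` checks this against the file). Separately, every HTTPS rule here sets `originServerName` to its own hostname so IX Apache selects the right SNI vhost; that part is this deployment's convention, not something cloudflared enforces. A Python sketch of both checks, assuming the rules were already loaded from YAML (e.g. with PyYAML, not shown) into a list of dicts; `check_ingress` is a hypothetical helper.

```python
def check_ingress(rules):
    """Return a list of problems found in a cloudflared ingress rule list (list of dicts)."""
    if not rules:
        return ["no ingress rules"]
    problems = []
    *named, last = rules
    # cloudflared requirement: the final rule must match everything.
    if "hostname" in last:
        problems.append("last rule must be a catch-all without 'hostname'")
    for i, rule in enumerate(named):
        host = rule.get("hostname")
        if not host:
            problems.append(f"rule {i}: only the last rule may omit 'hostname'")
            continue
        service = rule.get("service", "")
        sni = rule.get("originRequest", {}).get("originServerName")
        # Deployment convention (assumption): HTTPS backends on a shared IP need SNI = public hostname.
        if service.startswith("https://") and sni != host:
            problems.append(f"{host}: originServerName is {sni!r}, expected {host!r}")
    return problems

# The four IX rules from config.yml above, plus the 404 catch-all.
rules = [
    {"hostname": h, "service": "https://172.16.3.10:443",
     "originRequest": {"originServerName": h, "noTLSVerify": True}}
    for h in ("azcomputerguru.com", "analytics.azcomputerguru.com",
              "community.azcomputerguru.com", "radio.azcomputerguru.com")
] + [{"service": "http_status:404"}]

print(check_ingress(rules) or "ingress OK")
```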

**New container:** `cloudflared` (auto-restart via `--restart=unless-stopped`). Run command:

```
docker run -d --name cloudflared --restart=unless-stopped \
  -v /mnt/cache/appdata/cloudflared:/home/nonroot/.cloudflared \
  cloudflare/cloudflared:latest \
  tunnel --config /home/nonroot/.cloudflared/config.yml run
```

### Repo reorganization

| Action | From | To |
|---|---|---|
| Moved | `projects/dataforth-dos/datasheet-pipeline/implementation/cox-bgp-ticket-draft.md` | `clients/internal-infrastructure/vendor-tickets/2026-04-13-cox-bgp-cloudflare-routing.md` |
| Moved | `clients/ix-server/session-logs/2026-03-16-ix-account-cleanup.md` | `clients/internal-infrastructure/session-logs/` |
| Moved | `clients/ix-server/session-logs/2026-04-11-smart-slider-security-scan.md` | `clients/internal-infrastructure/session-logs/` |
| Removed | `clients/ix-server/` (empty after moves) | n/a |
| Edited | `session-logs/2026-04-11-session.md` | 3x `clients/ix-server/` → `clients/internal-infrastructure/` |
| Edited | `clients/internal-infrastructure/session-logs/2026-04-11-smart-slider-security-scan.md` | 1x path update |

Scripts in `projects/dataforth-dos/datasheet-pipeline/implementation/` that relate to tunnel setup but haven't been moved yet (decision deferred to next session):

- `jupiter_tunnel_login5.py`, `jupiter_tunnel_login4.py`, `jupiter_tunnel_login3.py`, `jupiter_tunnel_login2.py`, `jupiter_tunnel_login.py` (multiple login attempts; keep only the detached one)
- `jupiter_tunnel_complete.py` -- the one that did the full cutover
- `jupiter_tunnel_fix_https.py` -- the HTTPS backend switchover
- `ix_install_cloudflared.py`, `ix_tunnel_login.py` (IX-side, abandoned)
- `cf_analytics.py` -- GraphQL probe (showed analytics.read permission missing)
- `pfsense_diag.py`, `pfsense_diag2.py`, `pfsense_trace.py` -- the diagnostic cascade
- `cox-bgp-ticket-draft.md` -- already moved

---

## Pending / Incomplete / Open Items

### Action items for user

1. **Submit Cox BGP ticket** (file ready at `clients/internal-infrastructure/vendor-tickets/2026-04-13-cox-bgp-cloudflare-routing.md`). Fixing their routing is the permanent root-cause fix; until then the tunnel is the mitigation. No SLA for this.

2. **Populate Cloudflare token in SOPS vault.** Currently `services/cloudflare.sops.yaml` has metadata only (no `credentials:` block); the token values live in 1Password. For pipeline automation it would be better to have them in SOPS like everything else:

   ```
   bash D:/vault/scripts/vault.sh edit services/cloudflare.sops.yaml
   # add credentials: { api_token_full_dns: DRRGkHS33pxAUjQfRDzDeVPtt6wwUU6FwtXqOzNj, api_token_legacy: U1UTbBOWA4a69eWEBiqIbYh0etCGzrpTU4XaKp7w, dns_zone_id: 1beb9917c22b54be32e5215df2c227ce }
   ```

3. **Consider expanding tunnel ingress to cover more proxied hostnames** (if Cox BGP stays broken, other proxied hostnames would intermittently 521 too):
   - `plex.azcomputerguru.com` → 72.194.62.4 (Jupiter NPM): could route through the tunnel to `https://172.16.3.20:18443` (NPM is already on Jupiter, so this could bypass the public IP entirely)
   - `plexrequest.azcomputerguru.com`, `rustdesk.`, `sync.`, `secure.`, `backups.`, `enterpriseenrollment.`, `enterpriseregistration.`, `info.`, `mail.`, `store.`, `ui.`: most are external-proxied CNAMEs and don't need the tunnel; a few that point to Jupiter (.4) could benefit
   - Not urgent unless 521 recurs on one of them

4. **Script cleanup:** move the tunnel-setup helper scripts out of `projects/dataforth-dos/datasheet-pipeline/implementation/` (wrong project). Candidate targets: `clients/internal-infrastructure/scripts/cloudflared/` or similar. Not touched today.

5. **Commit this work.** The tunnel DNS changes are already live. Local file changes (moves, log, ticket draft) are not yet committed.

### Vault hygiene (from earlier today, still pending)

- `clients/dataforth/ad2.sops.yaml`: stale shell-escape backslash in `credentials.password` (stores `Paper123\!@#`; real is `Paper123!@#`).

### Dataforth follow-ups (unrelated to today but still open)

- Verify `C:\Shares\test\scripts\Sync-FromNAS-rsync.ps1` includes the `VASLOG - Engineering Tested` subfolder for ongoing Engineering-tested .txt ingestion.

---

## Reference Information

### Cloudflare Tunnel management

To view logs:

```
ssh root@172.16.3.20 'docker logs cloudflared --tail 30'
```

To list tunnels:

```
docker run --rm -v /mnt/cache/appdata/cloudflared:/home/nonroot/.cloudflared cloudflare/cloudflared:latest tunnel list
```

To restart after a config change:

```
docker restart cloudflared
# or stop + start for a fresh container state
```

To rotate the tunnel (delete + recreate):

```
docker run --rm -v /mnt/cache/appdata/cloudflared:/home/nonroot/.cloudflared cloudflare/cloudflared:latest tunnel delete -f acg-origin
# then re-run create + config steps
```

### Cloudflare API one-liners

List DNS records for a hostname:

```
ZONE=1beb9917c22b54be32e5215df2c227ce   # azcomputerguru.com; CF_TOKEN must also be exported
curl -H "Authorization: Bearer $CF_TOKEN" "https://api.cloudflare.com/client/v4/zones/$ZONE/dns_records?name=azcomputerguru.com"
```
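
The A → CNAME cutover for each hostname is one record overwrite against the same API, using the record `id` returned by the lookup above. Stdlib-only Python sketch that builds, but does not send, a `PUT /zones/{zone}/dns_records/{id}` with a proxied CNAME to the tunnel target. It assumes `ZONE` and `CF_TOKEN` are exported as in the `curl` example; `build_cutover_request` is a hypothetical helper, not the code in `jupiter_tunnel_complete.py`. `ttl: 1` is the API's value for "automatic".

```python
import json
import os
import urllib.request

API = "https://api.cloudflare.com/client/v4"
TUNNEL_TARGET = "78d3e58f-1979-4f0e-a28b-98d6b3c3d867.cfargotunnel.com"

def build_cutover_request(zone_id, record_id, hostname, token):
    """Build (not send) a PUT that overwrites an existing record with a proxied CNAME to the tunnel."""
    body = {
        "type": "CNAME",
        "name": hostname,
        "content": TUNNEL_TARGET,
        "proxied": True,
        "ttl": 1,  # 1 = automatic TTL in the Cloudflare API
    }
    return urllib.request.Request(
        f"{API}/zones/{zone_id}/dns_records/{record_id}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )

# Placeholders stand in when the env vars aren't set; record ID comes from the lookup above.
req = build_cutover_request(os.environ.get("ZONE", "<zone-id>"), "<record-id>",
                            "radio.azcomputerguru.com", os.environ.get("CF_TOKEN", "<token>"))
print(req.get_method(), req.full_url)
# To apply: with urllib.request.urlopen(req) as resp: print(json.load(resp)["success"])
```

Building the request separately from sending it makes a dry run (print method + URL + body) cheap before touching live DNS.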

Quick site probe:

```
curl -sI -A "Mozilla/5.0 Chrome/120.0" https://azcomputerguru.com/
# Expect: a 200 status line (HTTP/2 200 or HTTP/1.1 200 OK, depending on negotiation) and "server: cloudflare"
```
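
The same check across all tunneled hostnames, as in the final verification outputs, sketched in stdlib Python. `probe` sends a HEAD with the same User-Agent as the `curl` line; `verify` is a hypothetical helper that takes the fetch function as a parameter so it can be exercised offline. Connection-level failures (timeouts, DNS) raise `urllib.error.URLError` rather than being reported in the result.

```python
import urllib.error
import urllib.request

UA = "Mozilla/5.0 Chrome/120.0"

def probe(url, timeout=10):
    """HEAD a URL; return (status, Server header). HTTP error statuses are returned, not raised."""
    req = urllib.request.Request(url, method="HEAD", headers={"User-Agent": UA})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Server", "")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Server", "")

def verify(hosts, fetch=probe):
    """Return {host: (status, server)} for every host that did not answer 200 via Cloudflare."""
    failures = {}
    for host in hosts:
        status, server = fetch(f"https://{host}/")
        if status != 200 or server.lower() != "cloudflare":
            failures[host] = (status, server)
    return failures
```

Usage: `verify(["azcomputerguru.com", "git.azcomputerguru.com"])` returns `{}` when every host answers 200 via Cloudflare.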

### Useful paths and ports

| Resource | Value |
|---|---|
| Jupiter appdata | `/mnt/cache/appdata/cloudflared/` |
| IX internal | `http://172.16.3.10:80`, `https://172.16.3.10:443` |
| pfSense SSH | `ssh admin@172.16.0.1 -p 2248` |
| Cloudflare API base | `https://api.cloudflare.com/client/v4/zones/1beb9917c22b54be32e5215df2c227ce` |

### Cloudflare-IP prefix status (as of 2026-04-13 ~08:30)

| Prefix | Route via Cox | TCP:443 from pfSense |
|---|---|---|
| 104.16.0.0/13 | local/short path | **OK** |
| 104.24.0.0/14 | local/short path | **OK** |
| 162.158.0.0/16 | distant/broken | **FAIL (timeout)** |
| 172.64.0.0/13 | distant/broken | **FAIL (timeout)** |
| 173.245.48.0/20 | distant/broken | **FAIL (timeout)** |
| 141.101.64.0/18 | distant/broken | **FAIL (timeout)** |

---

## Related Logs

- Earlier today: `projects/dataforth-dos/session-logs/2026-04-12-session.md` (SCMVAS deploy finish + git merge conflict resolution)
- Earlier related: `session-logs/2026-04-06-session.md` (ScreenConnect redirect + UniFi OS VM); shows public IP block context
- Earlier related: `clients/internal-infrastructure/session-logs/2026-04-11-smart-slider-security-scan.md` (IX WP audit, originally at `clients/ix-server/`)
- Remote (pulled today): commit `499fd5d` ("Session log: Gitea recovery (Jupiter cache full)") explains the earlier intermittent Gitea 502s and the Jupiter cache pressure seen today

---

**Last Updated:** 2026-04-13
**Next Actions:** submit Cox ticket; consider populating Cloudflare vault entry; monitor tunnel for 24h; clean up misplaced helper scripts.

---

## Update: 15:56 - Tunnel expansion audit + ix.azcomputerguru.com grey-cloud revert

Post-initial-deploy work to assess which other proxied records in the zone would benefit from the tunnel, then fix a regression in WHM access.

### Work done

1. **Audit of all 25 proxied zone records** (`audit_proxied.py`). Classified each by origin:
   - Tunneled (4): azcomputerguru.com, analytics, community, radio
   - External SaaS (8): msp360, Microsoft, SendGrid, GoDaddy, etc. (not eligible)
   - Our-origin not-yet-tunneled (9): ix, git, plex, plexrequest, rmm, rmm-api, sync, rustdesk, secure
   - Of those 9, 4 were actively broken (ix=521, plex=525, rustdesk=525, secure=ERR) and 5 working (git/plexrequest/rmm/rmm-api/sync=200)

2. **Mapped NAT rules and NPM backends** (`discover_backends.py`):
   - pfSense `pfctl -s nat` shows `.4`, `.9`, `.10` all rdr to `172.16.3.20:18443` (Jupiter NPM)
   - `.5 -> 172.16.3.10:443` (IX Apache)
   - `.2 -> 172.16.1.16:443` (different subnet; no route from Jupiter)
   - The NPM_Server pfSense alias resolves to `172.16.3.20` only (single member)
   - Jupiter NPM active config dir: `/mnt/user/appdata/npm/nginx/proxy_host/` (separate from `NginxProxyManager/`, which is a stale v1 copy; there's also an empty `NginxProxyManager-v3/`)
   - NPM has proxy_host entries for: emby, plexrequest, unifi, git, rmm-api+rmm, sync, connect
   - NPM has **NO** entries for plex, rustdesk, or secure. Routing plex and rustdesk to `https://172.16.3.20:18443` with their Host header therefore returned `tls: unrecognized name` (default cert fallback)

3. **Expanded tunnel to 13 hostnames** (`expand_tunnel.py`) via CF DNS API cutovers, then immediately rolled back 3:
   - plex/rustdesk -> cloudflared error `Unable to reach the origin service ... remote error: tls: unrecognized name` (NPM returned its default cert because no vhost matched). 502 to users.
   - secure -> cloudflared error `no route to host` (Jupiter can't reach 172.16.1.16 on 172.16.1.0/24). 502 to users.
   - All 3 were already broken BEFORE the tunnel (525/525/ERR). No user-visible regression, but not a *fix* either, so their DNS was reverted to the original A records.

4. **Final state after `revert_broken.py`: 10 hostnames tunneled, all HTTP 200**:
   - azcomputerguru.com, analytics, community, radio, ix, git, plexrequest, rmm, rmm-api, sync

5. **User reported "IX generated blank screen"** -> root cause: `https://ix.azcomputerguru.com:2087/` is the WHM admin URL. Cloudflare Tunnel is hostname-bound, not port-bound: ingress rules match on hostname only, so traffic for `ix.` reaching the Cloudflare edge on any port was sent to the single backend specified (`https://172.16.3.10:443`). So `:2087` landed at Apache:443, not WHM:2087, and Apache returned the default vhost redirect instead of WHM.

**Fix: grey-clouded `ix.azcomputerguru.com`** (proxied=False) pointing directly to A `72.194.62.5`. pfSense NAT rules for 2087/2083 are intact and route the traffic to IX. Verified:

- `ix.azcomputerguru.com:443` -> 200 (default vhost redirect, fine)
- `ix.azcomputerguru.com:2087` -> 200 (WHM)
- `ix.azcomputerguru.com:2083` -> 200 (cPanel)

Trade-off: `ix.` no longer benefits from CF's DDoS protection and caching, but it's admin-only access. If the Cox BGP issue resurfaces specifically for traffic to 72.194.62.5 from certain geographies, `ix.azcomputerguru.com:2087` would fail for users in those regions; admin access typically comes from our own network, which works fine.
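
Simplified model of why `:2087` ended up on Apache: lookup is first-match on hostname alone, so the port in the public URL never influences which `service` is chosen. Python sketch; the rule table is an illustrative subset of the ingress while `ix.` was still tunneled, and `route` is a hypothetical function modeling the observed behavior, not cloudflared's code.

```python
from urllib.parse import urlsplit

# Illustrative subset of the ingress while ix. was still tunneled.
RULES = [
    ("ix.azcomputerguru.com", "https://172.16.3.10:443"),
    ("git.azcomputerguru.com", "https://172.16.3.20:18443"),
    (None, "http_status:404"),  # catch-all, must be last
]

def route(url):
    """First-match lookup on hostname only; the port in the public URL plays no part."""
    host = urlsplit(url).hostname  # ':2087' is dropped here
    for hostname, service in RULES:
        if hostname is None or hostname == host:
            return service
    raise LookupError("no catch-all rule")

for url in ("https://ix.azcomputerguru.com/", "https://ix.azcomputerguru.com:2087/"):
    print(url, "->", route(url))
```

With `ix.` DNS-only, port selection happens at the pfSense NAT rules instead, which is why 2087/2083 reach WHM/cPanel again.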

### Key decisions & rationale

- **Tunnel ingress reconfigured to 9 hostnames** (dropped `ix.` after the WHM issue surfaced; plex/rustdesk/secure stay out after the earlier revert). All 9 serve via the tunnel and were verified 200.

- **Grey-cloud (DNS-only) rather than tunnel** for `ix.`, because the port 2087/2083 admin access can't be satisfied by the tunnel.

- **Not investigated further**: the 3 still-broken hostnames (plex, rustdesk, secure) require NPM vhost additions and/or Jupiter routing changes, beyond today's tunnel scope. Captured as follow-ups.

### Problems encountered and resolutions

| Problem | Resolution |
|---|---|
| plex/rustdesk = 502 (`tls: unrecognized name`) | NPM has no vhost for these hostnames; it returned its default cert. Reverted DNS to original A records (no worse than pre-tunnel state). |
| secure = 502 (`no route to host`) | Jupiter (172.16.3.20) can't route to 172.16.1.16 (different subnet). Reverted DNS. |
| WHM blank screen (`:2087`) | Tunnel is hostname-only and can't preserve non-standard ports. Grey-clouded `ix.` so direct NAT handles the admin ports. |
| Tailscale stopped mid-session (again) | User re-enabled after prompt; resumed. |
| Unicode arrow character crashed Python print on Windows cp1252 | Re-ran verify with ASCII chars. Harmless: DNS/tunnel changes had already succeeded. |
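
Both 502 causes could be caught before a DNS cutover by checking each candidate's backend first. Pre-flight sketch in Python, assuming `pfctl -s nat` rdr lines of the form shown in the sample, NPM `proxy_host/*.conf` files with `server_name` directives, that `:18443` means "Jupiter NPM", and that 172.16.3.0/24 is the subnet cloudflared can reach; the samples and helper names are illustrative, not output from `discover_backends.py`.

```python
import ipaddress
import re

# Assumed rdr line shape: "rdr on igc0 ... to <public> port = <p> -> <internal> port <p>"
RDR = re.compile(r"^rdr .* to (?P<pub>[\d.]+) port = (?P<pport>\S+) -> (?P<dst>[\d.]+) port (?P<dport>\d+)")
SERVER_NAME = re.compile(r"^\s*server_name\s+([^;]+);", re.M)

def nat_targets(pfctl_output):
    """Map public IP -> internal 'ip:port' from pfctl rdr lines (first rule per IP wins)."""
    targets = {}
    for line in pfctl_output.splitlines():
        m = RDR.match(line.strip())
        if m:
            targets.setdefault(m["pub"], f"{m['dst']}:{m['dport']}")
    return targets

def npm_hosts(conf_texts):
    """Collect every server_name listed in NPM proxy_host config files."""
    names = set()
    for text in conf_texts:
        for group in SERVER_NAME.findall(text):
            names.update(group.split())
    return names

def preflight(hostname, public_ip, nat, npm_names, reachable):
    """Return why a hostname is not safe to tunnel yet, or None if it looks fine."""
    backend = nat.get(public_ip)
    if backend is None:
        return "no NAT rule for public IP"
    ip = ipaddress.ip_address(backend.split(":")[0])
    if not any(ip in net for net in reachable):
        return f"{ip} not routable from the cloudflared host"
    if backend.endswith(":18443") and hostname not in npm_names:  # assumption: 18443 = NPM
        return "no NPM proxy_host with this server_name"
    return None

# Illustrative samples, not captured output.
nat = nat_targets("""
rdr on igc0 inet proto tcp from any to 72.194.62.4 port = 443 -> 172.16.3.20 port 18443
rdr on igc0 inet proto tcp from any to 72.194.62.2 port = 443 -> 172.16.1.16 port 443
""")
npm = npm_hosts(["server_name git.azcomputerguru.com;",
                 "server_name rmm.azcomputerguru.com rmm-api.azcomputerguru.com;"])
reachable = [ipaddress.ip_network("172.16.3.0/24")]  # assumption

for host, pub in [("git.azcomputerguru.com", "72.194.62.4"),
                  ("plex.azcomputerguru.com", "72.194.62.4"),
                  ("secure.azcomputerguru.com", "72.194.62.2")]:
    print(f"{host:<28} {preflight(host, pub, nat, npm, reachable) or 'OK'}")
```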

---

## Credentials (unchanged from earlier in this session)

Same set as the earlier 2026-04-13 entry above:

- Cloudflare Full DNS token: `DRRGkHS33pxAUjQfRDzDeVPtt6wwUU6FwtXqOzNj`
- Cloudflare Legacy token: `U1UTbBOWA4a69eWEBiqIbYh0etCGzrpTU4XaKp7w`
- Zone ID: `1beb9917c22b54be32e5215df2c227ce`
- Jupiter: `root / Th1nk3r^99##` at 172.16.3.20:22
- IX: `root / Gptf*77ttb!@#!@#` at 172.16.3.10:22 (public 72.194.62.5)
- pfSense: `admin / r3tr0gradE99!!` at 172.16.0.1:2248

---

## DNS changes summary (all of 2026-04-13)

| Hostname | Before session | After session |
|---|---|---|
| azcomputerguru.com | A 72.194.62.5 (misconfigured as proxied=False) | CNAME tunnel proxied |
| analytics.azcomputerguru.com | A 72.194.62.5 proxied | CNAME tunnel proxied |
| community.azcomputerguru.com | A 72.194.62.5 proxied | CNAME tunnel proxied |
| radio.azcomputerguru.com | A 72.194.62.5 proxied | CNAME tunnel proxied |
| ix.azcomputerguru.com | A 72.194.62.5 proxied | **A 72.194.62.5 DNS-only (grey cloud)** (supports :2087/:2083) |
| git.azcomputerguru.com | A 72.194.62.4 proxied | CNAME tunnel proxied |
| plex.azcomputerguru.com | A 72.194.62.4 proxied | A 72.194.62.4 proxied (unchanged net effect) |
| plexrequest.azcomputerguru.com | A 72.194.62.4 proxied | CNAME tunnel proxied |
| rmm.azcomputerguru.com | A 72.194.62.4 proxied | CNAME tunnel proxied |
| rmm-api.azcomputerguru.com | A 72.194.62.4 proxied | CNAME tunnel proxied |
| sync.azcomputerguru.com | A 72.194.62.9 proxied | CNAME tunnel proxied |
| rustdesk.azcomputerguru.com | A 72.194.62.10 proxied | A 72.194.62.10 proxied (unchanged net effect) |
| secure.azcomputerguru.com | A 72.194.62.2 proxied | A 72.194.62.2 proxied (unchanged net effect) |
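
The audit's buckets follow mechanically from each record's type and content. Python sketch, assuming records shaped like the Cloudflare API's `dns_records` results (`type`, `name`, `content`) and treating 72.194.62.2-.10 as our public block (per the pfSense notes); `classify` is a hypothetical helper, not the logic inside `audit_proxied.py`, and the `example-saas` record is made up for illustration.

```python
import ipaddress

TUNNEL_SUFFIX = ".cfargotunnel.com"
# Assumption: our public block is 72.194.62.2-.10 (from the pfSense WAN notes).
OUR_FIRST = ipaddress.ip_address("72.194.62.2")
OUR_LAST = ipaddress.ip_address("72.194.62.10")

def classify(record):
    """Bucket one DNS record as 'tunneled', 'our-origin', or 'external'."""
    rtype, content = record["type"], record["content"]
    if rtype == "CNAME" and content.endswith(TUNNEL_SUFFIX):
        return "tunneled"
    if rtype == "A" and OUR_FIRST <= ipaddress.ip_address(content) <= OUR_LAST:
        return "our-origin"
    return "external"

records = [
    {"type": "CNAME", "name": "git.azcomputerguru.com",
     "content": "78d3e58f-1979-4f0e-a28b-98d6b3c3d867.cfargotunnel.com"},
    {"type": "A", "name": "ix.azcomputerguru.com", "content": "72.194.62.5"},
    {"type": "CNAME", "name": "example-saas.azcomputerguru.com", "content": "saas.example.net"},  # hypothetical
]

for r in records:
    print(f"{r['name']:<34} {classify(r)}")
```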

---

## Current tunnel ingress (9 hostnames -- /mnt/cache/appdata/cloudflared/config.yml)

Tunnel: `78d3e58f-1979-4f0e-a28b-98d6b3c3d867` (name `acg-origin`)

- azcomputerguru.com -> https://172.16.3.10:443 (SNI + noTLSVerify)
- analytics.azcomputerguru.com -> https://172.16.3.10:443
- community.azcomputerguru.com -> https://172.16.3.10:443
- radio.azcomputerguru.com -> https://172.16.3.10:443
- git.azcomputerguru.com -> https://172.16.3.20:18443
- plexrequest.azcomputerguru.com -> https://172.16.3.20:18443
- rmm.azcomputerguru.com -> https://172.16.3.20:18443
- rmm-api.azcomputerguru.com -> https://172.16.3.20:18443
- sync.azcomputerguru.com -> https://172.16.3.20:18443
- catch-all -> http_status:404

Backups of config.yml are kept as `config.yml.bak-YYYYMMDD-HHMMSS` in the same dir.
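
Adding a hostname later (e.g. plex once NPM has a vhost) means regenerating `config.yml` and restarting the container. Stdlib-only Python sketch that renders the ingress in the same shape as the `config.yml` shown under Configuration Changes and keeps a timestamped backup first; `render_config` and `write_with_backup` are hypothetical helpers, and the sketch assumes every HTTPS backend wants SNI set to its public hostname plus `noTLSVerify: true`.

```python
import shutil
import time
from pathlib import Path

TUNNEL_ID = "78d3e58f-1979-4f0e-a28b-98d6b3c3d867"

def render_config(routes):
    """Render config.yml text from {hostname: service_url}; HTTPS rules get SNI + noTLSVerify."""
    lines = [f"tunnel: {TUNNEL_ID}",
             f"credentials-file: /home/nonroot/.cloudflared/{TUNNEL_ID}.json",
             "ingress:"]
    for host, service in routes.items():
        lines += [f"  - hostname: {host}", f"    service: {service}"]
        if service.startswith("https://"):
            lines += ["    originRequest:",
                      f"      originServerName: {host}",
                      "      noTLSVerify: true"]
    lines.append("  - service: http_status:404")  # catch-all must stay last
    return "\n".join(lines) + "\n"

def write_with_backup(path, text):
    """Copy the current file to <name>.bak-YYYYMMDD-HHMMSS, then write the new text."""
    path = Path(path)
    if path.exists():
        stamp = time.strftime("%Y%m%d-%H%M%S")
        shutil.copy2(path, path.with_name(f"{path.name}.bak-{stamp}"))
    path.write_text(text)

print(render_config({"git.azcomputerguru.com": "https://172.16.3.20:18443"}))
```

Run `docker restart cloudflared` afterwards so the container picks up the new file.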

---

## Final verification outputs

```
azcomputerguru.com HTTP 200 cloudflare (tunnel -> IX)
analytics.azcomputerguru.com HTTP 200 cloudflare (tunnel -> IX)
community.azcomputerguru.com HTTP 200 cloudflare (tunnel -> IX)
radio.azcomputerguru.com HTTP 200 cloudflare (tunnel -> IX)
git.azcomputerguru.com HTTP 200 cloudflare (tunnel -> Jupiter NPM)
plexrequest.azcomputerguru.com HTTP 200 cloudflare (tunnel -> Jupiter NPM)
rmm.azcomputerguru.com HTTP 200 cloudflare (tunnel -> Jupiter NPM)
rmm-api.azcomputerguru.com HTTP 200 cloudflare (tunnel -> Jupiter NPM)
sync.azcomputerguru.com HTTP 200 cloudflare (tunnel -> Jupiter NPM)

ix.azcomputerguru.com:443 HTTP 200 (direct, default vhost)
ix.azcomputerguru.com:2087 HTTP 200 (direct, WHM)
ix.azcomputerguru.com:2083 HTTP 200 (direct, cPanel)
```

---

## Scripts created (in clients/internal-infrastructure/scripts/cloudflared-tunnel-setup/)

- `audit_proxied.py` -- list all proxied zone records, classify origin, external probe each
- `discover_backends.py` -- extract pfSense NAT rules and Jupiter NPM server_name mappings
- `expand_tunnel.py` -- extend tunnel ingress to 13 hostnames + DNS cutover
- `revert_broken.py` -- remove plex/rustdesk/secure from tunnel and restore their A records

All have been sanitized to use the SOPS vault for credentials and an env var for the CF token.
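
One way a helper can follow that rule: take the token from an environment variable and fall back to the vault. Python sketch, assuming the `credentials.api_token_full_dns` key proposed under the pending items has been added to `services/cloudflare.sops.yaml` (it holds metadata only today), that `sops` is on PATH, and that the script runs from the vault root; `load_cf_token` is a hypothetical name.

```python
import os
import subprocess

VAULT_FILE = "services/cloudflare.sops.yaml"         # relative to the vault root (assumption)
EXTRACT = '["credentials"]["api_token_full_dns"]'    # assumed key; not in the vault yet

def load_cf_token(env_var="CF_TOKEN"):
    """Prefer the environment; otherwise decrypt just the one value from the SOPS vault."""
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    result = subprocess.run(
        ["sops", "--decrypt", "--extract", EXTRACT, VAULT_FILE],
        capture_output=True, text=True, check=True,  # raises CalledProcessError if sops fails
    )
    return result.stdout.strip()
```

`--extract` keeps the rest of the decrypted file out of the process's memory and output, which matters if the helper's stdout ends up in a session log.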

---

## Pending / Incomplete / Open Items

Additions to the list from the earlier 2026-04-13 entry:

1. **`plex.azcomputerguru.com` is still broken** (525): requires an NPM proxy_host entry on Jupiter. Likely target: `binhex-plexpass` container at `172.16.3.20:32400` (or whatever internal IP Plex uses with `network_mode: host`). Once NPM has the vhost, it can be added to the tunnel with a single config.yml change.

2. **`rustdesk.azcomputerguru.com` is still broken** (525): requires
   - Finding where the rustdesk server is actually running (no `rustdesk` container visible in `docker ps` on Jupiter; it may be on a different host, or decommissioned)
   - Adding an NPM vhost for it
   - Then tunnel ingress

3. **`secure.azcomputerguru.com` is still broken** (ERR): requires one of
   - A static route on Jupiter to 172.16.1.0/24 so cloudflared can reach 172.16.1.16
   - Or moving the service behind Jupiter NPM
   - Or grey-clouding to DNS-only like we did for `ix.` (bypassing CF entirely)

4. **Still TODO from the earlier block:**
   - Submit Cox BGP ticket (`clients/internal-infrastructure/vendor-tickets/2026-04-13-cox-bgp-cloudflare-routing.md`)
   - Populate CF tokens in SOPS vault (currently 1Password only)
   - Fix stale `Paper123\!@#` in Dataforth AD2 vault entry
   - Verify rsync covers Dataforth `VASLOG - Engineering Tested` subfolder

---

**Last Updated:** 2026-04-13 15:56
**Next Actions:** consider adding NPM vhost for plex, investigate rustdesk host, commit today's additions.