
Session Log: 2026-04-21

User

  • User: Mike Swanson (mike)
  • Machine: DESKTOP-0O8A1RL
  • Role: admin

Session Summary

This session completed the M365 multi-tenant onboarding initiative. The goal was to onboard all 41 CIPP-managed partner tenants to the ComputerGuru app suite (Security Investigator, Exchange Operator, User Manager, Tenant Admin, Defender Add-on) with minimal customer interaction — customers click one URL (Tenant Admin consent), then the onboard-tenant.sh script handles all remaining programmatic consent and role assignments automatically.

Accomplishments

  1. Tenant Admin manifest fix (from previous session): Added AppRoleAssignment.ReadWrite.All (GUID: 06b708a9-e830-4db3-a914-8e69da51d44f) to the Tenant Admin app. The script needs this permission to programmatically grant appRoleAssignments to the other apps' service principals (SPs) in customer tenants. Applied with a PATCH to the app registration via the Management app.

  2. Re-onboarded martylryan.com and grabblaw.com: Both tenants had consented before the manifest fix, so their Tenant Admin grant lacked the new permission. Each needed Tenant Admin re-consent (done by Mike) followed by a script re-run. Both are now fully onboarded with all apps and directory roles.

    • martylryan.com: All 4 apps + Exchange Admin + User Admin + Auth Admin assigned
    • grabblaw.com: 3 apps (no MDE) + Exchange Admin + User Admin + Auth Admin assigned; Defender skipped (no MDE license)
  3. Cascades Tucson GoDaddy admin account (from previous session):

    • Found disabled account admin@NETORGFT4257522.onmicrosoft.com
    • Renamed UPN to admin@cascadestucson.com (domain was verified default)
    • Enabled account, reset password to Gptf*ttb123!@#-cs
    • Vaulted at D:/vault/clients/cascades-tucson/m365-admin.sops.yaml
  4. Batch tenant sweep: Ran onboard-tenant.sh against all 40 pending tenants. 17 were already fully consented and onboarded successfully. 23 still need initial Tenant Admin consent.

  5. tenant-consent.html: Updated to show only remaining pending tenants. 19 tenants now marked done (including martylryan + grabblaw post re-consent). 22 still pending.
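The batch sweep in item 4 amounts to running onboard-tenant.sh once per domain and tallying the results. Below is a minimal sketch under a few assumptions. It reads a hypothetical plain-text list with one domain per line. It treats any non-zero exit as "needs attention", although the script can fail for reasons other than missing consent. ONBOARD_CMD is a hypothetical override hook and is not part of the real script.

```bash
# Sketch: run onboard-tenant.sh for each domain read from stdin and tally results.
# ONBOARD_CMD is a hypothetical override (useful for dry runs); by default it
# calls the script at its repo-relative path.
sweep_tenants() {
  local cmd="${ONBOARD_CMD:-bash .claude/skills/remediation-tool/scripts/onboard-tenant.sh}"
  local ok=0 failed=0 domain
  while IFS= read -r domain; do
    # Skip blank lines and comment lines in the domain list.
    [[ -z "$domain" || "$domain" == \#* ]] && continue
    # </dev/null keeps the onboarding script from consuming the domain list.
    if $cmd "$domain" </dev/null >/dev/null 2>&1; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
      echo "failed (check Tenant Admin consent): $domain" >&2
    fi
  done
  echo "ok=$ok failed=$failed"
}
```

Example usage: `sweep_tenants < pending-domains.txt`. The file name is hypothetical, and the failures printed on stderr are the candidates for a consent link.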

Files Modified This Session

| File | Change |
|---|---|
| .claude/skills/remediation-tool/scripts/onboard-tenant.sh | Major rewrite: programmatic consent for all 4 non-admin apps after Tenant Admin consent |
| .claude/skills/remediation-tool/references/tenants.md | NEW: full 41-tenant list with display names, domains, tenant IDs, onboarding status, consent URLs |
| .claude/skills/remediation-tool/references/tenant-consent.html | NEW + updated: dark-theme HTML page with clickable consent links; 19 tenants marked done |
| .claude/skills/remediation-tool/references/gotchas.md | Updated: Grabblaw and martylryan marked fully onboarded with dates |
| D:/vault/clients/cascades-tucson/m365-admin.sops.yaml | NEW: SOPS-encrypted admin credentials for Cascades Tucson |

Credentials

Cascades Tucson M365 Admin


onboard-tenant.sh Architecture

Flow

  1. Resolve domain → tenant GUID (openid-configuration)
  2. Acquire Tenant Admin token (client_credentials) to verify consent
  3. Locate resource SPs in tenant: Microsoft Graph, Exchange Online, Defender ATP
  4. For each app (Security Investigator, Exchange Operator, User Manager, Defender Add-on):
    • Create SP if missing (POST /servicePrincipals) — sleep 5 after creation for replication
    • Grant all appRoleAssignments idempotently
  5. Assign directory roles (Exchange Admin to the Security Investigator SP; User Admin + Auth Admin to the User Manager SP)
  6. Print status table
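Steps 1, 2 and 4 of the flow map onto a small number of Microsoft identity platform and Graph calls. The sketch below uses curl and jq to illustrate them. It is not the contents of onboard-tenant.sh. CLIENT_ID and CLIENT_SECRET are hypothetical variables that stand in for however the script loads the Tenant Admin credentials. Graph paging (@odata.nextLink) is ignored.

```bash
GRAPH="https://graph.microsoft.com/v1.0"

# Step 1: domain -> tenant GUID. The GUID is the first path segment of the
# token_endpoint in the tenant's OpenID discovery document.
tenant_id_from_oidc() {
  jq -r '.token_endpoint' | cut -d/ -f4
}
resolve_tenant_id() {
  curl -fsS "https://login.microsoftonline.com/$1/v2.0/.well-known/openid-configuration" \
    | tenant_id_from_oidc
}

# Step 2: app-only Graph token via client_credentials. CLIENT_ID/CLIENT_SECRET
# are hypothetical. A failure here usually means the tenant has not consented.
get_graph_token() {
  local tenant_id="$1"
  curl -fsS "https://login.microsoftonline.com/${tenant_id}/oauth2/v2.0/token" \
    --data-urlencode "client_id=${CLIENT_ID}" \
    --data-urlencode "client_secret=${CLIENT_SECRET}" \
    --data-urlencode "scope=https://graph.microsoft.com/.default" \
    --data-urlencode "grant_type=client_credentials" \
    | jq -r '.access_token'
}

# Step 4 (idempotency check): does the client SP already hold this app role on
# the resource SP? `.value[]?` tolerates a null value array.
has_assignment() {
  local resource_sp="$1" app_role_id="$2"
  jq -e --arg r "$resource_sp" --arg a "$app_role_id" \
    '[.value[]? | select(.resourceId == $r and .appRoleId == $a)] | length > 0' >/dev/null
}

# Step 4 (grant): POST to the resource SP's appRoleAssignedTo only when missing.
grant_app_role() {
  local token="$1" client_sp="$2" resource_sp="$3" app_role_id="$4"
  if curl -fsS -H "Authorization: Bearer $token" \
       "$GRAPH/servicePrincipals/$client_sp/appRoleAssignments" \
     | has_assignment "$resource_sp" "$app_role_id"; then
    echo "already granted: $app_role_id" >&2
    return 0
  fi
  curl -fsS -X POST -H "Authorization: Bearer $token" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg p "$client_sp" --arg r "$resource_sp" --arg a "$app_role_id" \
          '{principalId: $p, resourceId: $r, appRoleId: $a}')" \
    "$GRAPH/servicePrincipals/$resource_sp/appRoleAssignedTo" >/dev/null
}
```

Example usage: `tid=$(resolve_tenant_id grabblaw.com)` followed by `token=$(get_graph_token "$tid")`. Running the check before every POST is what lets a re-run (as with martylryan.com and grabblaw.com) be safe.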

Key GUIDs

Permission resource app IDs:

  • Microsoft Graph: 00000003-0000-0000-c000-000000000000
  • Exchange Online: 00000002-0000-0ff1-ce00-000000000000
  • Defender ATP: fc780465-2017-40d4-a0c5-307022471b92

App IDs:

  • Security Investigator: bfbc12a4-f0dd-4e12-b06d-997e7271e10c
  • Exchange Operator: b43e7342-5b4b-492f-890f-bb5a4f7f40e9
  • User Manager: 64fac46b-8b44-41ad-93ee-7da03927576c
  • Tenant Admin: 709e6eed-0711-4875-9c44-2d3518c47063
  • Defender Add-on: dbf8ad1a-54f4-4bb8-8a9e-ea5b9634635b

Tenant Admin manifest permissions required:

  • AppRoleAssignment.ReadWrite.All: 06b708a9-e830-4db3-a914-8e69da51d44f
  • Application.ReadWrite.All: 1bfefb4e-e0b5-418b-a88f-73c46d2cc8e9
  • Directory.ReadWrite.All: 19dbc75e-c2e2-444c-a770-ec69d8559fc7
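Adding one of these permissions to the Tenant Admin manifest means adding an entry under the Microsoft Graph resource in the app's requiredResourceAccess. Graph's PATCH /applications/{id} replaces that whole array, so the safe pattern is read, merge, write back. Below is a minimal sketch, assuming a token that is allowed to update the app registration. Variable names are hypothetical. Editing the manifest does not grant anything in customer tenants, which is why already-consented tenants have to re-consent.

```bash
GRAPH_MS="00000003-0000-0000-c000-000000000000"

# Merge one application permission (type "Role") into a requiredResourceAccess
# array read from stdin. Leaves the array unchanged if the role is already present.
add_required_role() {
  local resource_app="$1" role_id="$2"
  jq --arg res "$resource_app" --arg id "$role_id" '
    if any(.[]; .resourceAppId == $res) then
      map(if .resourceAppId == $res and (any(.resourceAccess[]; .id == $id) | not)
          then .resourceAccess += [{id: $id, type: "Role"}]
          else . end)
    else
      . + [{resourceAppId: $res, resourceAccess: [{id: $id, type: "Role"}]}]
    end'
}

# Read-modify-write. app_object_id is the application's object ID (not its appId).
patch_manifest() {
  local token="$1" app_object_id="$2" role_id="$3" current merged
  current=$(curl -fsS -H "Authorization: Bearer $token" \
    "https://graph.microsoft.com/v1.0/applications/$app_object_id?\$select=requiredResourceAccess" \
    | jq '.requiredResourceAccess')
  merged=$(printf '%s' "$current" | add_required_role "$GRAPH_MS" "$role_id")
  curl -fsS -X PATCH -H "Authorization: Bearer $token" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --argjson rra "$merged" '{requiredResourceAccess: $rra}')" \
    "https://graph.microsoft.com/v1.0/applications/$app_object_id"
}
```

Example usage: `patch_manifest "$token" "$APP_OBJECT_ID" 06b708a9-e830-4db3-a914-8e69da51d44f` would add AppRoleAssignment.ReadWrite.All. Here APP_OBJECT_ID is hypothetical.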

Bugs Fixed During Development

  1. stdout/stderr pollution in create_sp_if_missing: Human-readable status lines were going to stdout, corrupting sp_oid=$(create_sp_if_missing ...). Fix: all status echoes changed to >&2.
  2. Graph replication delay: Newly created SPs need ~5s before appRoleAssignments can be granted. Fix: sleep 5 after successful SP creation.
  3. jq null iterator: [.value[] | select(...)] failed with "Cannot iterate over null" on fresh SPs with null appRoleAssignments. Fix: [.value[]? | select(...)].
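The first two fixes combine into one convention. Anything meant for a human goes to stderr, so `$(...)` captures only the object ID, and a short pause follows creation. The real function body isn't shown in this log, so the Graph calls and argument order below are assumptions. This is a sketch of create_sp_if_missing under that convention.

```bash
# Prints ONLY the SP object ID on stdout; every status message goes to stderr,
# so sp_oid=$(create_sp_if_missing ...) captures just the ID.
create_sp_if_missing() {
  local token="$1" app_id="$2" existing created
  existing=$(curl -fsS -H "Authorization: Bearer $token" \
    "https://graph.microsoft.com/v1.0/servicePrincipals?\$filter=appId%20eq%20'$app_id'" \
    | jq -r '.value[0].id // empty')
  if [[ -n "$existing" ]]; then
    echo "SP for $app_id already exists: $existing" >&2
    printf '%s\n' "$existing"
    return 0
  fi
  echo "Creating SP for $app_id" >&2
  created=$(curl -fsS -X POST -H "Authorization: Bearer $token" \
    -H "Content-Type: application/json" \
    -d "{\"appId\":\"$app_id\"}" \
    "https://graph.microsoft.com/v1.0/servicePrincipals" | jq -r '.id // empty')
  if [[ -z "$created" ]]; then
    echo "SP creation failed for $app_id" >&2
    return 1
  fi
  # New SPs need a few seconds to replicate before app roles can be granted.
  sleep 5
  printf '%s\n' "$created"
}
```

A fixed `sleep 5` matches what the log describes. A retry loop around the grant would be an alternative if 5 seconds proves too short in some tenants.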

Onboarding Status (as of 2026-04-21)

Done (19 tenants)

andysmobilefuel.com, tedards.net, cascadestucson.com, cclac.net, cobaltfinearts.com, dataforth.com, glaztech.com, heieck.org, jemaenterprises.com, mvan.onmicrosoft.com, bestmassageintucson.com, rednourlaw.com, reliantpump.services, ridgetopgroup.com, safesitellc.com, sonorangreenllc.com, valleywideplastering.com, martylryan.com, grabblaw.com

Pending (awaiting initial Tenant Admin consent)

Brian Kahn (briankahn.onmicrosoft.com), cuadro.design, Curtis Plumbing (cparizona.onmicrosoft.com), cwconcretellc.com, Feline Ltd (felineltd.onmicrosoft.com), ICE INC (iceinc.us.com), Instrumental Music (instrumentalmusic.onmicrosoft.com), JR Kennedy (jrkco.com), Khalsa Montessori (khalsamontessorischools.onmicrosoft.com), Kittle Design (kittlearizona.com), LeeAnn Parkinson (lamaddux.com), Patient Care Advocates (pcatucson.com), Putt Land Surveying (puttsurveying.com), Rincon Vista Vet (rinconvistavet.onmicrosoft.com), Russo Law (rrs-law.com), SANDTEKO (SANDTEKOMACHINERY.com), Shave Kevin (az2son.com), Starr Pass Realty (starrpass.com), The Dumpster Guys (dumpsterguys.onmicrosoft.com), The Prairie Schooner (theprairieschooner.onmicrosoft.com), Tucson Golden Corral (tucsongoldencorral.onmicrosoft.com), Tucson Mountain Motors (tucsonmountainmotors.com), Von's Carstar (vonscarstar.com)

Not in CIPP (needs investigation)


Pending / Next Steps

  1. 22 tenants need initial Tenant Admin consent — use tenant-consent.html to send links or open directly; after each consent, run onboard-tenant.sh <domain>
  2. Len's Auto Brokerage — check if in CIPP, add if not, then onboard
  3. Brian Kahn — needs Brian Kahn's own Global Admin to click consent URL (not admin@lensautobrokerage.onmicrosoft.com)
  4. Tenant-consent.html identifiers: three entries show GUIDs rather than domains (f5f86b40 and dfee2224 among them), and several others (cparizona, felineltd, etc.) use onmicrosoft.com domains. Verify that the display names in tenants.md match.

Reference

  • Consent HTML: D:/claudetools/.claude/skills/remediation-tool/references/tenant-consent.html
  • Tenant list: D:/claudetools/.claude/skills/remediation-tool/references/tenants.md
  • Onboarding script: D:/claudetools/.claude/skills/remediation-tool/scripts/onboard-tenant.sh
  • Gotchas: D:/claudetools/.claude/skills/remediation-tool/references/gotchas.md
  • Cascades vault: D:/vault/clients/cascades-tucson/m365-admin.sops.yaml

Update: 07:26 — Cloudflare Tunnel Decommission + pfSense Audit

Summary

Decommissioned the Cloudflare tunnel (cloudflared Docker container on Jupiter), migrated all 9 tunneled services to direct Cloudflare proxy, and conducted a comprehensive pfSense audit removing ~40 stale config objects (NAT rules, filter rules, outbound NAT, IPsec, and aliases).


Background: Why the Tunnel Was Created

A Cox routing issue caused Cloudflare-proxied services to route inefficiently (Cox → Cloudflare PoP → back to Cox WAN). The cloudflared tunnel was created as a workaround — it establishes an outbound connection from Jupiter to Cloudflare PoPs, so all proxied traffic flows through the tunnel rather than requiring port forwards.


Cloudflared Container — DNS Fix

Problem: the cloudflared container had no DNS servers configured ([]), so it fell back to Docker's default resolver, which couldn't reach region1.v2.argotunnel.com. This produced a "Failed to refresh DNS local resolver" timeout every 5 minutes, causing intermittent slowness.

Fix: Recreated container with explicit DNS:

--dns=1.1.1.1 --dns=1.0.0.1

Container startup confirmed clean after DNS fix.
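The `[]` in the problem statement is what `docker inspect` reports for HostConfig.Dns when a container was created without any `--dns` flags. A small check like the following would surface missing resolvers before periodic timeouts do. It is a sketch, and the container name defaults to cloudflared.

```bash
# Success when a `docker inspect` Dns value lists at least one server.
dns_is_set() {
  [[ -n "$1" && "$1" != "[]" ]]
}

# Print the DNS servers a container was created with; non-zero if none are set.
check_container_dns() {
  local name="${1:-cloudflared}" dns
  dns=$(docker inspect --format '{{.HostConfig.Dns}}' "$name") || return 2
  if dns_is_set "$dns"; then
    echo "$name DNS servers: $dns"
  else
    echo "$name has no explicit DNS servers ($dns); it falls back to Docker's default resolver" >&2
    return 1
  fi
}
```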

Tunnel ID: 78d3e58f-1979-4f0e-a28b-98d6b3c3d867
Config location on Jupiter: /mnt/cache/appdata/cloudflared/config.yml


Cloudflare DNS Migration

Key discovery: pfSense has NO NAT rule for port 443 on the primary Cox WAN IP (98.181.90.163); all port 443 rules are bound to specific 72.194.62.x IPs. Pointing the proxied records directly at 98.181.90.163 therefore produced Cloudflare 522 (connection timed out) errors.

Solution: Use 72.194.62.10 (which has an existing 443 → NPM:18443 NAT rule) as the target for NPM-backed services.

Services migrated from tunnel CNAME → direct Cloudflare proxy A records:

| Hostname | Old Target | New Target | Backend |
|---|---|---|---|
| git.azcomputerguru.com | tunnel CNAME | 72.194.62.10 | NPM → Jupiter:18443 |
| rmm.azcomputerguru.com | tunnel CNAME | 72.194.62.10 | NPM → Jupiter:18443 |
| rmm-api.azcomputerguru.com | tunnel CNAME | 72.194.62.10 | NPM → Jupiter:18443 |
| plexrequest.azcomputerguru.com | tunnel CNAME | 72.194.62.10 | NPM → Jupiter:18443 |
| sync.azcomputerguru.com | tunnel CNAME | 72.194.62.10 | NPM → Jupiter:18443 |
| azcomputerguru.com | tunnel CNAME | 72.194.62.5 | IX Web Hosting:443 |
| analytics.azcomputerguru.com | tunnel CNAME | 72.194.62.5 | IX Web Hosting:443 |
| community.azcomputerguru.com | tunnel CNAME | 72.194.62.5 | IX Web Hosting:443 |
| radio.azcomputerguru.com | tunnel CNAME | 72.194.62.5 | IX Web Hosting:443 |

All 9 services tested and confirmed working. Container then stopped and removed.

Public IP layout (relevant):

  • 72.194.62.5 → IX Web Hosting server (172.16.3.10) via NAT
  • 72.194.62.10 → NPM on Jupiter (172.16.3.20:18443) via NAT
  • 98.181.90.163/31 — Primary Cox WAN, NO port 443 NAT rule
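The 522 finding can be reproduced without Cloudflare by requesting a hostname straight from a candidate origin IP. curl's `--resolve` pins the name to that IP while keeping the correct SNI and Host header. The helper below is a sketch. Run it from outside the LAN, since hairpin NAT on pfSense can change the result. `-k` skips certificate checks because the origin may present a certificate that only Cloudflare trusts.

```bash
# Print the HTTP status an origin IP returns for a hostname, bypassing Cloudflare.
# "000" means no TCP/TLS connection was made, roughly the condition Cloudflare
# reports as 522 when the record is proxied.
origin_status() {
  local host="$1" ip="$2"
  curl -sk -o /dev/null -w '%{http_code}' --connect-timeout 5 \
    --resolve "${host}:443:${ip}" "https://${host}/" || true
}
```

Example usage: compare `origin_status git.azcomputerguru.com 72.194.62.10` with `origin_status git.azcomputerguru.com 98.181.90.163`. Per the findings above, only the first should reach NPM.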

pfSense SSH Access Fix

pfSense SSH was failing non-interactively with "Too many authentication failures" (SSH client tried multiple keys, hit MaxAuthTries before reaching id_ed25519).

Fix: Added id_ed25519 public key to pfSense admin user via web GUI (port 4433). Had to include webguicss=pfSense.css and dashboardcolumns=2 fields in the form POST to avoid theme validation errors.

SSH command: ssh -o StrictHostKeyChecking=no -i C:/Users/guru/.ssh/id_ed25519 -p 2248 admin@172.16.0.1

Vault updated: D:/vault/infrastructure/pfsense-firewall.sops.yaml — added web_port, ssh_key, ssh_cmd fields.
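Adding the key fixed authentication. The "Too many authentication failures" symptom itself comes from ssh offering additional identities, such as keys loaded in ssh-agent, alongside the `-i` key. OpenSSH's `IdentitiesOnly=yes` restricts ssh to the configured identity files. The wrapper below is a sketch. PFSENSE_KEY is a hypothetical override, and the original command's `StrictHostKeyChecking=no` is intentionally left out.

```bash
# Offer only the named key so extra agent keys can't exhaust MaxAuthTries.
pfsense_ssh() {
  ssh -o IdentitiesOnly=yes \
      -i "${PFSENSE_KEY:-$HOME/.ssh/id_ed25519}" \
      -p 2248 admin@172.16.0.1 "$@"
}
```

Example usage: `pfsense_ssh 'ls /cf/conf'`.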


pfSense Audit — Rules Removed

All removals were done by uploading PHP scripts via SCP, executing them on pfSense, then reloading the filter with pfSsh.php playback svc restart filter.

Config backup pattern: /cf/conf/config.xml.bak-<description>-<timestamp>
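The backup, upload, run and reload cycle used for each round can be wrapped in one function. This is a sketch with several assumptions. The timestamp format is hypothetical. The cleanup scripts are assumed to be run with pfSense's PHP CLI (`php -f`), which the log does not specify. The scripts themselves are not shown here.

```bash
PF_SSH=(ssh -o IdentitiesOnly=yes -i "$HOME/.ssh/id_ed25519" -p 2248 admin@172.16.0.1)
PF_SCP=(scp -o IdentitiesOnly=yes -i "$HOME/.ssh/id_ed25519" -P 2248)

# Build a backup path: /cf/conf/config.xml.bak-<description>-<timestamp>.
backup_path() {
  local description="$1" ts="${2:-$(date +%Y%m%d-%H%M%S)}"
  printf '/cf/conf/config.xml.bak-%s-%s\n' "$description" "$ts"
}

# Back up config.xml, upload and run one cleanup script, then reload the filter.
run_pf_cleanup() {
  local script="$1" description="$2" bak remote
  bak=$(backup_path "$description")
  remote="/tmp/$(basename "$script")"
  "${PF_SSH[@]}" "cp /cf/conf/config.xml '$bak'" || return 1
  "${PF_SCP[@]}" "$script" "admin@172.16.0.1:$remote" || return 1
  "${PF_SSH[@]}" "php -f '$remote'" || return 1
  "${PF_SSH[@]}" "pfSsh.php playback svc restart filter"
}
```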

Round 1 — TSM Network (dead server):

  • NAT: TSM Network HTTP forward (72.194.62.x → TSM)
  • NAT: TSM Network HTTPS forward
  • NAT: LDAP to DC16
  • FILTER: Associated pass rules

Round 2 — Neptune, IPsec, Gitea SSH, orphans:

  • NAT: Neptune Exchange HTTP/HTTPS forwards
  • NAT: 172.16.3.25 wildcard forward
  • NAT: 172.16.3.25 HTTP/HTTPS forwards
  • NAT: Gitea SSH forward (72.194.62.x:22 → Jupiter) — superseded by Cloudflare proxy
  • FILTER: All associated pass rules
  • FILTER: Orphaned LDAP filter rule
  • FILTER: Neptune pass rules
  • IPSEC: Phase 1 + Phase 2 for 184.182.208.116 (Mike's house — no longer needed)

Round 3 — Seafile:

  • NAT: 72.194.62.9 Seafile/Sync forward — Seafile desktop client uses sync.azcomputerguru.com (now via NPM on .10), not a dedicated IP; .9 rule was orphaned
  • FILTER: Associated pass rule

Round 4 — Neptune outbound NAT:

  • OUTBOUND NAT: NEPTUNE_Internal → 72.194.62.7 masquerade rule

Round 5 — Neptune Exchange filter (missed in Round 2):

  • FILTER: Rule with destination NEPTUNE_Internal:Exchange_Ports (was a filter rule, not NAT — earlier script only checked NAT)

Total rules removed: ~22 NAT/filter/IPsec rules
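Round 5 was needed because the earlier check only looked at NAT rules. Searching the whole config.xml for an object's name also catches references in filter rules, outbound NAT and other sections. Below is a rough grep-based sketch. It matches whole words in text rather than parsing XML, so treat hits as leads rather than proof. Run it against a local copy of /cf/conf/config.xml.

```bash
# Count config.xml lines that reference an alias, excluding the alias's own
# <name> definition. Anything above 0 means some rule or object still uses it.
alias_refs() {
  local config="$1" alias="$2"
  grep -w -- "$alias" "$config" | grep -vc -- "<name>${alias}</name>" || true
}

# Show the referencing lines with line numbers, for manual review.
show_alias_refs() {
  local config="$1" alias="$2"
  grep -nw -- "$alias" "$config" | grep -v -- "<name>${alias}</name>"
}
```

Example usage: `alias_refs config.xml NEPTUNE_Internal` would have flagged the Exchange_Ports filter rule before Round 2 finished.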


pfSense Audit — Aliases Removed (22)

All_Ports, EX1_Internal, Emby_Ports, Exchange_Ports, Exchange_VIP,
MailProtector_LDAP, NEPTUNE_Internal, Nextcloud_Local, NPM_Ports,
OwnCloud_Ports, RNAT_Webhost, RustDesk_Server, RustDesk_Server_Internal,
SpamIssue, Syslog, UNMS, Unifi_SSL, Unraid_Jupiter, Unraid_Sync,
VIP_NO_AUTODISCOVER, VPN_Ports, Webhost_Internal

Remaining aliases (all active/valid): Cloudflare, FiberGW, HTTP_HTTPS, ICE_Users, NPM_Server, Unifi_Server, Unifi_TCP, Unifi_UDP, Webhost_TCP, Webhost_UDP, Tailscale, TFTP Server, WireGuard


pfSense Items Investigated — Left Alone

| Item | Decision |
|---|---|
| Golden Corral (72.194.62.6 → 172.16.1.6, HTTP_HTTPS) | Leave as-is: live client, working, no RDP exposed (80/443 only) |
| 72.194.62.7 VIP ("MAIL/NEPTUNE") | Unused IP; no rules reference it. Could remove the VIP or reassign it |
| Cloudflare alias | Unused; could be applied to restrict WAN access to CF IPs only |
| Broad pass tcp/udp any→any WAN rule | Noted, not yet addressed |
| 72.194.62.4 → NPM:18443 ("Emby on Fiber") | Verified pointing to NPM, labeled correctly |
| OwnCloud VM (172.16.3.22) | NAT rule still valid: cloud.acghosting.com lives there |

Infrastructure Reference

| Asset | Detail |
|---|---|
| pfSense | 172.16.0.1, SSH port 2248, HTTPS port 4433, admin user |
| pfSense config | /cf/conf/config.xml |
| Jupiter (Unraid) | 172.16.3.20 |
| NPM (Nginx Proxy Manager) | Jupiter:18443 (HTTPS), Jupiter:1880 (HTTP) |
| cloudflared | Stopped/removed; tunnel decommissioned |
| Primary Cox WAN | 98.181.90.163/31, no port 443 NAT |
| Additional public IPs | 72.194.62.210, 70.175.28.5157 |

Pending / Next Steps (Infrastructure)

  1. 72.194.62.7 VIP — decide: remove (Neptune gone) or repurpose
  2. Cloudflare alias — consider applying to WAN rules to restrict to CF IPs only (security hardening)
  3. Broad WAN pass rule — review and tighten if possible
  4. 22 M365 tenants — still need initial Tenant Admin consent (unchanged from earlier session)
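For item 2, the Cloudflare alias can be checked against Cloudflare's published ranges at https://www.cloudflare.com/ips-v4 (IPv6: https://www.cloudflare.com/ips-v6). The sketch below fetches the IPv4 list and keeps only well-formed CIDRs, so nothing malformed ends up in a firewall alias. How the list is loaded into pfSense (for example a URL table alias or manual entry) is left open.

```bash
# True if $1 is an IPv4 CIDR: a.b.c.d/nn with octets 0-255 and prefix 0-32.
is_ipv4_cidr() {
  local octet
  [[ "$1" =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})/([0-9]{1,2})$ ]] || return 1
  for octet in "${BASH_REMATCH[@]:1:4}"; do
    # 10# forces base 10 so values like "08" aren't read as octal.
    (( 10#$octet <= 255 )) || return 1
  done
  (( 10#${BASH_REMATCH[5]} <= 32 ))
}

# Fetch Cloudflare's published IPv4 ranges and print only valid CIDRs.
fetch_cf_ipv4() {
  local line
  curl -fsSL https://www.cloudflare.com/ips-v4 | while IFS= read -r line || [[ -n "$line" ]]; do
    if is_ipv4_cidr "$line"; then printf '%s\n' "$line"; fi
  done
}
```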

Note for Howard

Vault + SOPS age key setup required on ACG-Tech03L before remediation-tool will work.

1. Clone the vault repo

Run in Git Bash (real terminal, not Claude Code shell):

git clone http://azcomputerguru@172.16.3.20:3000/azcomputerguru/vault.git D:/vault

Password: Gptf*77ttb123!@#-git

2. Install the SOPS age key

Create this file: C:\Users\howard\.config\sops\age\keys.txt

Content (copy exactly):

# created: 2026-03-30T13:53:19-07:00
# public key: age1qz7ct84m50u06h97artqddkj3c8se2yu4nxu59clq8rhj945jc0s5excpr
AGE-SECRET-KEY-1DE3V6V0ZLLZ45A7GA77M79CTN4LZQMTRCURP8VRGNLV6T2FSZEEQXUW2EU

3. Add vault_path to identity.json

Edit .claude/identity.json in your ClaudeTools folder, add:

"vault_path": "D:/vault"

4. Test

bash C:/claudetools/.claude/skills/remediation-tool/scripts/get-token.sh grabblaw.com investigator

Expected: JWT token starting with eyJ...
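"Starts with eyJ" confirms the output is a JWT, but not that it is for the right tenant or carries the right roles. The sketch below decodes the payload segment for a closer look. JWT segments are unpadded base64url, so the helper translates the alphabet and restores padding first. It relies on `base64 -d` (GNU coreutils and Git Bash; some older macOS releases only accept `-D`) and on jq.

```bash
# Decode the payload (middle segment) of a JWT.
jwt_payload() {
  local seg
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url strips.
  while (( ${#seg} % 4 )); do seg+='='; done
  printf '%s' "$seg" | base64 -d
}

# Rough sanity check: looks like a JWT, then print tenant, audience and roles.
check_token() {
  local token="$1"
  [[ "$token" == eyJ*.*.* ]] || { echo "not a JWT" >&2; return 1; }
  jwt_payload "$token" | jq -r '"tid=\(.tid)  aud=\(.aud)  roles=\(.roles // [] | join(","))"'
}
```

Example usage: `check_token "$(bash C:/claudetools/.claude/skills/remediation-tool/scripts/get-token.sh grabblaw.com investigator)"`.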


Note for Mike (Mac)

Vault + SOPS age key setup required on Mikes-MacBook-Air before remediation-tool will work.

1. Clone the vault repo

Run in a real terminal (not Claude Code shell):

git clone http://azcomputerguru@172.16.3.20:3000/azcomputerguru/vault.git ~/vault

Password: Gptf*77ttb123!@#-git

2. Install the SOPS age key

mkdir -p ~/.config/sops/age
cat > ~/.config/sops/age/keys.txt << 'AGEEOF'
# created: 2026-03-30T13:53:19-07:00
# public key: age1qz7ct84m50u06h97artqddkj3c8se2yu4nxu59clq8rhj945jc0s5excpr
AGE-SECRET-KEY-1DE3V6V0ZLLZ45A7GA77M79CTN4LZQMTRCURP8VRGNLV6T2FSZEEQXUW2EU
AGEEOF
chmod 600 ~/.config/sops/age/keys.txt

3. Add vault_path to identity.json

Edit /Users/azcomputerguru/ClaudeTools/.claude/identity.json, add:

"vault_path": "/Users/azcomputerguru/vault"

4. Test

bash ~/ClaudeTools/.claude/skills/remediation-tool/scripts/get-token.sh grabblaw.com investigator

Expected: JWT token starting with eyJ...
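If the token test fails, the pieces set up in steps 1 and 2 can be checked one at a time. The sketch below assumes the paths used in this note. sops's default age key location varies by OS. Exporting the standard SOPS_AGE_KEY_FILE variable to point at keys.txt sidesteps that lookup, and the check honors it if set. The sample file argument is optional and hypothetical.

```bash
# Print a file's permission bits (GNU stat first, then BSD/macOS stat).
file_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Check the age key, its permissions, the vault clone, and optionally that
# sops can decrypt one vault file with that key.
check_vault_setup() {
  local key="${SOPS_AGE_KEY_FILE:-$HOME/.config/sops/age/keys.txt}"
  local vault="${1:-$HOME/vault}" sample="$2"
  [[ -f "$key" ]] || { echo "missing age key: $key" >&2; return 1; }
  [[ "$(file_mode "$key")" == "600" ]] || echo "warning: $key is not mode 600" >&2
  [[ -d "$vault/.git" ]] || { echo "vault not cloned at $vault" >&2; return 1; }
  if [[ -n "$sample" ]]; then
    SOPS_AGE_KEY_FILE="$key" sops -d "$sample" >/dev/null \
      || { echo "sops could not decrypt $sample" >&2; return 1; }
  fi
  echo "vault setup looks OK"
}
```

Example usage: `check_vault_setup ~/vault ~/vault/clients/cascades-tucson/m365-admin.sops.yaml`.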