Commit Graph

25 Commits

Author SHA1 Message Date
da87d314c5 harness: fleet-wide functional-error + correction + friction logging
Add .claude/scripts/log-skill-error.sh — the canonical agent error log helper
(writes errorlog.md in DATE | MACHINE | skill | [type] error format, soft-fails).
Three categories: execution failures (default), user corrections (--correction),
and preventable self-inflicted friction (--friction; cite ref= when it repeats a
documented gotcha). Goal: stop paying tokens twice for the same avoidable mistake.

- CLAUDE.md: make logging mandatory for all skills + corrections + friction.
- skill-creator: new skills must wire in the helper (guidance + checklist).
- Retrofit every skill script's genuine failure branches to call the helper
  (b2/bitdefender/mailprotector/packetdial/coord python CLIs; remediation-tool
  + onboard365 bash; vault, rmm-auth, post-bot-alert, agy, grok, 1password,
  run-onboarding-diagnostic). Handled conditions + self-tests left alone.
- errorlog.md: broaden header to cover skills + harness + corrections; seed this
  session's corrections (INKY, Mail.Send token-audience, omnibox-strictness) and
  friction (git-bash /tmp, env-persistence, argv-limit, PowerShell var-case).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 11:40:25 -07:00
2f312af41e sync: auto-sync from GURU-5070 at 2026-06-14 20:04:14
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-14 20:04:14
2026-06-14 20:05:02 -07:00
b4fcea91dc fix(remediation): close the recurring Exchange-Admin-role gap fleet-wide
EXO email-cleanup tasks (Search-UnifiedAuditLog, Get-MessageTrace, inbox rules) kept
401/403-ing per tenant because the Exchange Operator SP was missing the Exchange Admin
directory role — admin consent grants Exchange.ManageAsApp but never the directory role.
onboard-tenant.sh assigns it, but tenants consented before that step / by hand never got
it, and nothing audited for it. Hence the recurring 'next onboarding will fix it' (false
for already-onboarded tenants).

- NEW assign-exchange-role.sh: idempotent role assignment via the authoritative
  roleManagement/directory/roleAssignments API (the legacy directoryRoles/members list
  reads back unreliably). <domain|--all> + --verify/--dry-run.
- Backfilled the whole fleet (--all): 13 stragglers ASSIGNED, 12 already OK, 20 skipped
  (tenant-admin not consented), 0 errors. Safe Site included.
- Standing audit documented (assign-exchange-role.sh --all --verify) + memory so no future
  session repeats the empty promise.
- Adds wiki/clients/safesite.md (tenant + 4-source endpoint inventory + investigation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:07:28 -07:00
f49f632c7d sync: auto-sync from GURU-5070 at 2026-06-08 19:51:00
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-08 19:51:00
2026-06-08 19:51:46 -07:00
c3a3f99b91 sync: auto-sync from GURU-5070 at 2026-06-08 08:34:06
Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-08 08:34:06
2026-06-08 08:34:11 -07:00
332bd1388b fix(remediation): URL-encode role_assigned() Graph $filter
role_assigned() sent an unencoded space in the OData $filter
(principalId eq '...'), so the query always failed and the function
always returned false -> onboard-tenant.sh always printed
"MISSING -> ASSIGNING" and relied on the conflict-tolerant POST for
idempotency. Fixed to %20; corrected the stale PIM-misdiagnosis comment.
Verified live against the ACG tenant. Roles still assign correctly;
PRESENT/MISSING reporting is now accurate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:03:27 -07:00
895b1805d1 Session log: GuruRMM 4-bug fix + MSP360 backup integration 2026-05-19
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 16:54:06 -07:00
3db6c2c1dd remediation-tool: add cert-auth (client_assertion JWT) to get-token.sh
Auth selection logic:
- Default: prefer cert when cert_thumbprint_b64url + cert_private_key_pem_b64
  are present in the vault entry's credentials block; fall back to client_secret.
- REMEDIATION_AUTH=secret  -> force client_secret flow.
- REMEDIATION_AUTH=cert    -> force cert flow; error if cert fields missing.
- Logs [INFO] auth=cert/secret to stderr so users see which path was taken.

Cert flow signs an RS256 JWT (header includes x5t) via inline Python (PyJWT
+ cryptography), POSTs client_assertion_type +
client_assertion=<jwt> in place of client_secret. Same scope, same cache, same
error handling (AADSTS7000229 still emits the consent URL).

Single sops -d to a mktemp file feeds both field reads to avoid repeated
~1s decrypt invocations on Windows; trap removes plaintext on exit.

Verified end-to-end against tedards.net for all three modes after wiping
/tmp/remediation-tool/.
2026-05-01 16:52:12 -07:00
ec18802e76 remediation-tool: flag PIM role_assigned gap for Howard
role_assigned() only checks direct/permanent roleAssignments.
PIM-managed assignments are in roleAssignmentSchedules and won't
be found, producing noisy (non-blocking) output on re-runs against
tenants with PIM-assigned roles (e.g. Cascades).

TODO comment added at the helper — Howard to implement the fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 09:11:40 -07:00
f21745eff9 cascades: CA unblock + Phase B buildout + onboard-tenant.sh CA Admin backfill
Day-long session unblocking the Cascades CA reconciliation that was paused on
the Tenant Admin SP directory-role gap. Discovered Microsoft also tightened
the OAuth scope for /identity/conditionalAccess/* reads (Policy.Read.All now
required, Policy.ReadWrite.ConditionalAccess no longer accepted for reads).
Patched Tenant Admin manifest accordingly and re-consented in Cascades.

Phase B Intune state turned out to be far more built than the 4/20 log
suggested -- compliance policy, Wi-Fi, device restrictions, both SDM app
configs (Authenticator + Teams), and 7 of 8 apps were already deployed and
assigned. PATCHed device restrictions to block camera/Bluetooth/roaming
and enabled Managed Home Screen multi-app kiosk (ALIS + Teams visible,
10-min auto-signout). PATCHed Cascades named location to add primary WAN
(184.191.143.62/32). Howard added Outlook from Managed Play; SMB encryption
enabled on \CS-SERVER\homes.

CA bypass design corrected -- original §5 plan in user-account-rollout-plan.md
called for "block off-site + MFA on-site" which doesn't match the actual goal
of bypass when network + device assurance present. Reshaped to three policies
that produce on-site-compliant = password only, anything else = MFA or block.

onboard-tenant.sh patched to:
  1. Backfill Policy.Read.All on Tenant Admin SP if missing (idempotent --
     for tenants consented before the 2026-04-29 manifest update).
  2. Assign Conditional Access Administrator directory role to Tenant Admin
     SP at onboard time. Mirrors the Exchange Operator fix Mike landed in
     5da9804.

Validated with --dry-run against Cascades. Customer-facing tenants already
onboarded should be re-run with this script to backfill both items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 07:32:23 -07:00
5da9804aa2 fix(onboard): auto-assign Exchange Admin to Exchange Operator SP; mark Sandteko fully onboarded
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 20:13:20 -07:00
a44599a169 remediation-tool: fix tenant-sweep tier name; mark Kittle partially onboarded
- tenant-sweep.sh line 12: renamed tier `graph` to `investigator` to match
  the valid tier name expected by get-token.sh
- tenants.md: updated Kittle Design & Construction consent status from NO
  to PARTIAL with notes on what was consented and what remains pending

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:13:16 -07:00
4d469010ed feat: add intune-manager tier to get-token.sh 2026-04-21 20:02:19 -07:00
430c99a76a fix: two bugs in get-token.sh vault path resolution
1. Variable name collision: VAULT_PATH was used for both the SOPS file
   relative path (set by case statement) and the vault root override env
   var. Renamed env var override to VAULT_ROOT_ENV to avoid collision.

2. Wrong directory depth: CLAUDETOOLS_ROOT was navigating 3 levels up
   from scripts/ landing at .claude/ instead of repo root. Fixed to 4
   levels (scripts -> remediation-tool -> skills -> .claude -> repo root).

Also added jq as primary vault_path reader (handles Unix paths on Windows),
with cygpath-converted Python fallback.

Bugs discovered during Mac testing 2026-04-21. Windows worked only because
tokens were served from /tmp cache after first acquisition.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 19:12:15 -07:00
afe8ce9b41 sync: auto-sync from DESKTOP-0O8A1RL at 2026-04-21 19:10:13
Author: Mike Swanson
Machine: DESKTOP-0O8A1RL
Timestamp: 2026-04-21 19:10:13
2026-04-21 19:10:25 -07:00
00f4722772 sync: auto-sync from Mikes-MacBook-Air.local at 2026-04-21 19:02:07
Author: Mike Swanson
Machine: Mikes-MacBook-Air.local
Timestamp: 2026-04-21 19:02:07
2026-04-21 19:02:09 -07:00
fcb5bebf5b fix: vault path from per-machine identity.json, not hardcoded paths
- Add .claude/scripts/vault.sh wrapper (reads vault_path from identity.json)
- get-token.sh + patch-tenant-admin-manifest.sh read identity.json for vault root
- syncro.md uses wrapper via CLAUDETOOLS_ROOT
- CLAUDE.md + ONBOARDING.md document the pattern and prompt for vault_path on onboarding
- identity.json now includes vault_path (D:/vault on DESKTOP-0O8A1RL)

Howard and Mac need vault_path added to their identity.json after pulling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 19:01:27 -07:00
da96c5a8ee fix: portable vault path resolution across Windows/Mac/Linux
Replace hardcoded D:/vault references with candidate-list pattern
that also checks $HOME/vault, ~/.vault, and respects VAULT_PATH
env var override. Fixes vault.sh lookup failures on Mac and
Howard's machine.

Affected: CLAUDE.md, syncro.md, get-token.sh, patch-tenant-admin-manifest.sh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:58:43 -07:00
f9950f889f fix: add sleep after SP creation + handle null appRoleAssignments in jq
New SPs need ~5s to replicate before appRoleAssignments can be granted.
Also fixes jq null iterator error when SP has no existing assignments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:51:48 -07:00
24a0bb4f0a feat: onboard-tenant.sh now programmatically consents full app suite
After Tenant Admin is consented by customer admin, the script automatically:
- Creates SPs for Security Investigator, Exchange Operator, User Manager,
  and Defender Add-on (programmatic consent, no extra customer clicks needed)
- Grants all required Graph, Exchange Online, and Defender ATP appRoleAssignments
- Idempotent: skips any permissions already granted

Also added AppRoleAssignment.ReadWrite.All to Tenant Admin manifest so
fresh consents include this permission. Existing tenants (martylryan.com,
grabblaw.com) need a one-time Tenant Admin re-consent to pick it up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 17:33:50 -07:00
265097d752 fix: remediation tool onboarding — add RoleManagement.ReadWrite.Directory + auto role assignment
Root cause: app-only Graph operations (password reset, Exchange REST) require
directory roles on each SP in the customer tenant, not just admin consent.
RoleManagement.ReadWrite.Directory was missing from all app manifests, making
role assignment impossible without manual portal work that was never being done.

Changes:
- patch-tenant-admin-manifest.sh: adds RoleManagement.ReadWrite.Directory to
  Tenant Admin app manifest via Management app, grants home-tenant consent
- onboard-tenant.sh: new script — resolves tenant, acquires Tenant Admin token,
  assigns Exchange Administrator to Security Investigator SP and User/Auth
  Administrator to User Manager SP; --dry-run supported; idempotent
- get-token.sh: detects AADSTS7000229, emits consent URL + onboard-tenant.sh
  reminder instead of silent failure
- gotchas.md: onboarding steps at top, tenant table expanded with role columns,
  all known tenants updated including martylryan.com (first fully onboarded)

Verified: martylryan.com fully onboarded, password reset to MLR2026!! succeeded

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 16:56:47 -07:00
0a70aad0d2 sync: auto-sync from ACG-TECH03L at 2026-04-20 14:15:01
Author: Howard Enos
Machine: ACG-TECH03L
Timestamp: 2026-04-20 14:15:01
2026-04-20 14:15:07 -07:00
391178ef02 fix: replace python3 with py/jq throughout scripts and docs
Windows Store python3 stub returns exit 49 instead of running Python.
Replace with: py (Windows launcher) for actual Python code, jq for
simple JSON extraction. Reorder fallback loops to try py first.
Add Bash(py:*) to settings.local.json allowlist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 12:14:43 -07:00
2eb2d2f9dc Session log: remediation skill rewrite (5-app tiered arch) + Cascades breach check John Trozzi
- Rewrote get-token.sh: tiered app system (investigator/exchange-op/user-manager/tenant-admin/defender)
- Updated SKILL.md, command, gotchas, checklist, graph-endpoints for new app suite
- Cascades breach check: mailbox clean, inbound phishing received by John, DMARC gap noted

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 11:35:18 -07:00
1c7df5018e Session log: multi-user setup, audit + gap fixes, Howard onboarding package
Two session logs:
- session-logs/2026-04-16-session.md: cross-cutting (multi-user, audit, infrastructure)
- guru-rmm session log appended: MSI installer, Len's Auto Brokerage, Uranus, migration drift

Gap fixes: GrepAI initialized + MCP server added, Ollama models pulling,
settings.json created (bypassPermissions), MCP_SERVERS.md written.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:56:26 -07:00