diff --git a/.claude/scripts/rmm-search.sh b/.claude/scripts/rmm-search.sh
index ba14b3f7..fba4f208 100644
--- a/.claude/scripts/rmm-search.sh
+++ b/.claude/scripts/rmm-search.sh
@@ -51,4 +51,4 @@ fi
 
 # Pipe agents on stdin (payload too large for argv on Windows); flags via env.
 printf '%s' "$AGENTS" | QUERY="$QUERY" CLIENT="$CLIENT" ONLINE="$ONLINE" JSON="$JSON" LISTC="$LISTC" LIMIT="$LIMIT" \
-    python3 "$ROOT/.claude/scripts/rmm-search.py"
+    bash "$ROOT/.claude/scripts/py.sh" "$ROOT/.claude/scripts/rmm-search.py"
diff --git a/errorlog.md b/errorlog.md
index 8c6411ca..d93c785f 100644
--- a/errorlog.md
+++ b/errorlog.md
@@ -17,6 +17,8 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
+2026-06-19 | GURU-BEAST-ROG | rmm-search | [friction] rmm-search.sh invoked bare python3 -> MS Store stub on Windows; fixed to use py.sh resolver [ctx: ref=py.sh-broadcast-9b1c5c39]
+
 2026-06-19 | GURU-KALI | git/submodules | [friction] fresh claudetools re-clone: 'git submodule update --init --recursive' failed with 'could not read Username / terminal prompts disabled' for all https://git.azcomputerguru.com submodules; fix = set credential.helper=store GLOBALLY (local-on-superproject does NOT propagate to per-submodule child clone processes). ~/.git-credentials already had the cred. [ctx: ref=reclone-submodule-creds event=2026-06-18-restructure]
 
 2026-06-18 | GURU-5070 | agy/search | gemini CLI threw ineligible/projectId setup error (throwIneligibleOrProjectIdError), empty response after 3 attempts [ctx: mode=search host=GURU-5070]
diff --git a/session-logs/2026-06/2026-06-19-mike-reclone-recovery-and-vwp-time-sync.md b/session-logs/2026-06/2026-06-19-mike-reclone-recovery-and-vwp-time-sync.md
new file mode 100644
index 00000000..388e95d5
--- /dev/null
+++ b/session-logs/2026-06/2026-06-19-mike-reclone-recovery-and-vwp-time-sync.md
@@ -0,0 +1,148 @@
+## User
+- **User:** Mike Swanson (mike)
+- **Machine:** GURU-BEAST-ROG
+- **Role:** admin
+
+## Session Summary
+
+Two major bodies of work. First, completed the ClaudeTools re-clone migration on GURU-BEAST-ROG
+per the 2026-06-18 CUTOVER (history rewritten + force-pushed, all projects split into git
+submodules, `.git` 3.2GB->80MB). The fresh clone was healthy but, as designed, did not carry
+gitignored per-machine state. Recovered that state from `ClaudeTools.old`: `identity.json` (whose
+absence had made the first `/sync` run as "unknown" and skip vault), git author config, `.claude/settings.local.json`,
+`.mcp.json`, grepai (`grepai.exe` + `.grepai/`), and the discord-bot runtime (`.venv`/`.env`/`.attachments`),
+whose absence had left the `ClaudeToolsDiscordBot` nssm service in `SERVICE_STOPPED`. Also rescued a +174-line
+`RMM_THOUGHTS.md` block (the 2026-06-08 re-grounding pass) that was staged-but-uncommitted in `.old`'s
+guru-rmm submodule and would have been lost.
+
+Built a compliance-gated migration script (`.claude/scripts/migrate-to-submodules.sh`) so other fleet
+machines self-heal: it detects whether a clone shares history with the rewritten origin (merge-base
+test, with an offline RECLONE.md/submodule fallback), leaves compliant clones untouched, and otherwise
+re-clones and recovers the same gitignored per-machine state the manual RECLONE.md steps omit. Pushed
+it (dafcec5), updated RECLONE.md to point at it, broadcast to the fleet (coord 249feb4c) and filed a
+backstop todo (e318a929). Swept `.old` for stranded gitignored data (192 entries): recovered the small
+per-machine config, and began preserving the large deliberately-purged client data (18GB valleywide
+app-modernization, datto BSOD case, radio-show test-data) to `\\172.16.3.20\Backups\Perm\` since it is
+in no git history or bundle. Removed a stale `/autotask` entry from the self-check baseline manifest
+(cd478ca); GURU-5070 independently moved it to capability_commands (b668430), converging on next sync.
+
+Second, diagnosed and fixed a Valley Wide (VWP.US) issue: Bart Graffin's estimating workstation
+(DESKTOP-EST) had lost its O: and P: drive mappings to `\\WINFILESVR`. Root cause was Kerberos time
+skew (System error 2457) — DESKTOP-EST's clock was free-running, and the whole VWP domain had no real
+time authority (the DCs were pulling time from the Hyper-V host / free-running). Per Mike's directive
+("all VWP machines sync to DCs, all DCs to an outside source"), configured both DCs (VWP-DC1 the PDC,
+VWP_ADSRVR) to external NTP with the Hyper-V VM time provider disabled — both now synchronized (Leap 0)
+— then set all 26 VWP members to domhier (sync-from-DCs) + VM-provider-disabled + resync. DESKTOP-EST
+synced to VWP-DC1, and O: and P: were re-mapped and verified OK/reachable. The remaining members are
+configured correctly and their source resolves to VWP-DC1; full convergence is expected over the next
+w32time poll cycles as the DCs' freshly-established NTP discipline settles.
+
+## Key Decisions
+
+- **Recover per-machine state from `.old`, do not re-onboard.** A fresh clone never carries gitignored
+  per-machine files; copying the valid `.old` copies (identity, settings.local, .mcp.json, grepai,
+  discord-bot venv/.env) is faster and exact. This is the "recover uncommitted work from .old" step the
+  CUTOVER broadcast prescribed.
+- **RMM_THOUGHTS.md rescue: reconstruct deterministically, never 3-way merge.** The file is binary to
+  git (non-UTF8 char) and origin/main had advanced (3347eb8). Appended the rescued tail (bytes 29622+)
+  onto the current origin/main file so both the newer thought and the +174 rescued lines survive
+  (44137 bytes). Pushed as 2e469f1.
+- **Migration script is compliance-gated and standalone.** Old clones can't pull it, so it is distributed
+  by raw URL; it leaves compliant clones untouched (per Mike) and only acts on non-compliant ones.
+- **Bucket-2 large data is preserved to a file share, never re-added to git.** The 18GB valleywide
+  app-modernization etc. were deliberately purged; they are gitignored and therefore absent from the Jupiter bundle.
+  Being copied to `Perm\claudetools-purged-2026-06\` so `.old` can be deleted without data loss once the
+  copies are verified.
+- **All VWP DCs on external NTP (not just the PDC).** Followed Mike's explicit directive over the
+  PDC-only best-practice default; also avoided a domhier dependency loop on the secondary DC.
+- **Disable the Hyper-V VM time provider on the DCs/members.** The VMICTimeProvider was overriding NTP
+  (VWP-DC1 showed "VM IC Time Synchronization Provider" as source despite configured NTP peers);
+  disabling it let external NTP / domhier actually take effect.
+- **Stop the per-machine convergence chase.** Once config is correct on all 26 and the DCs are good,
+  members converge on their own; mass-polling fights the RMM connection-rate limit for little value.
+
+## Problems Encountered
+
+- **First `/sync` ran as "unknown", skipped vault** — `identity.json` was missing (gitignored, not in the
+  fresh clone). Restored from `.old`; re-ran sync (vault then synced clean).
+- **`ClaudeToolsDiscordBot` SERVICE_STOPPED** — nssm pointed at `.venv\Scripts\python.exe` that the fresh
+  clone lacked. Recovered `.venv`/`.env`/`.attachments` + created `logs/` from `.old`; service now Running.
+- **guru-rmm submodule detached-HEAD + origin advanced** — commit landed on detached HEAD; origin/main had
+  moved to 3347eb8. Reconstructed onto origin/main and pushed (2e469f1).
+- **DESKTOP-EST O:/P: failed with System error 2457 (clock not synced with PDC)** — Kerberos time skew.
+  Root cause traced to the whole VWP domain lacking external time authority.
+- **`net use` got System error 67 (network name not found)** — a `\\` in the UNC collapsed to `\` through
+  the bash->jq->agent->PowerShell pipeline. Rebuilt the UNC from `[char]92` so it can't be mangled.
+- **PowerShell parse error** — a double-quoted string ending in `\"` (`"$($d):\"`) broke the parser. Used
+  single-quoted concatenation (`$d + ':\'`) / `[char]92` instead.
+- **RMM API drops bursty connections (HTTP 000)** — rapid/looped POSTs and background-task POSTs all
+  returned 000, while spaced single foreground POSTs returned 200. Dispatched the 26-member rollout in
+  foreground chunks ~4s apart. (`--next` single-connection and background bash both failed.)
+- **`rmm-search.sh` hit the Windows MS-Store python stub** — it called bare `python3`. Fixed to use
+  `.claude/scripts/py.sh` (the fleet resolver).
+- **w32time "no time data available" on first member resync** — transient post-restart; a second resync
+  (and DC dispersion settling) converges them.
+
+## Configuration Changes
+
+- Created: `.claude/scripts/migrate-to-submodules.sh` (compliance-gated re-clone + recovery). Committed dafcec5.
+- Modified: `RECLONE.md` — added the automated path + raw-URL. Committed dafcec5.
+- Modified: `.claude/skills/self-check/baseline/manifest.json` — removed `/autotask`. Committed cd478ca.
+- Modified: `.claude/scripts/rmm-search.sh` — `python3` -> `py.sh` resolver (uncommitted at log time; swept by this save).
+- Recovered (gitignored, per-machine, from `.old`): `.claude/identity.json`, `.claude/settings.local.json`,
+  `.mcp.json`, `grepai.exe`, `.grepai/`, `projects/discord-bot/{.venv,.env,.attachments,logs}`.
+- Submodule guru-rmm: `docs/RMM_THOUGHTS.md` +174 lines, committed + pushed 2e469f1 (gururmm main); aside backup deleted.
+- Set repo git config user.name="Mike Swanson", user.email="mike@azcomputerguru.com".
+- VWP DCs (VWP-DC1, VWP_ADSRVR): w32time external NTP manualpeerlist + `/reliable:yes`, VMICTimeProvider Enabled=0.
+- VWP 26 members: w32time `/syncfromflags:domhier`, VMICTimeProvider Enabled=0, resync.
+- DESKTOP-EST: O: and P: persistent mappings re-created (had been deleted during troubleshooting).
+
+## Credentials & Secrets
+
+- No new credentials created. Discord bot token verified already vaulted at
+  `projects/discord-bot/bot-token.sops.yaml` (field `credentials.bot_token`; matches live `.env`,
+  last4 `RKTo`). Anthropic key at `projects/discord-bot/anthropic-api.sops.yaml`. RMM API admin creds
+  at `infrastructure/gururmm-server.sops.yaml` (used via rmm-auth.sh).
+
+## Infrastructure & Servers
+
+- ClaudeTools repo: `git.azcomputerguru.com/azcomputerguru/claudetools.git` (rewritten history, submodules).
+- guru-rmm submodule: `git.azcomputerguru.com/azcomputerguru/gururmm.git`, main advanced 3347eb8 -> 2e469f1.
+- Jupiter backups share: `\\172.16.3.20\Backups` (Perm = legacy-app archives; pre-split bundle in Gitea-Storage).
+- VWP domain `VWP.US`: PDC emulator **VWP-DC1** (172.16.9.2), DC2 **VWP_ADSRVR** (192.168.0.25), both Server 2019.
+  File server **WINFILESVR** = 192.168.0.35 (`Office_Archive` -> O:, `Estimating Archive` -> P:). Not a DC.
+- VWP drive map (per user, persistent/manual — no GPP Drives.xml in SYSVOL): G:=\\VWP-FILES\G-drive,
+  O:=\\WINFILESVR\Office_Archive, P:=\\WINFILESVR\Estimating Archive, Q:=\\VWP-QBS\Quickbooks.
+- Bart Graffin = `VWP\Estimating` account on **DESKTOP-EST** (id 023ecbb3-55ca-487a-836d-d4a30b3da656).
+- GuruRMM API: `http://172.16.3.30:3001` (front end drops bursty connections — dispatch paced from foreground).
+- NTP sources used: time.windows.com, time.nist.gov, 0.pool.ntp.org (0x9).
+
+## Commands & Outputs
+
+- `migrate-to-submodules.sh --check`: compliant clone -> GREEN exit 0; `.old` -> NON-COMPLIANT exit 1 (verified both).
+- self-check progression: RED (identity missing + /autotask) -> AMBER (0 FAIL) after identity restore + manifest fix.
+- DESKTOP-EST before: `net use` O:/P: = Disconnected, Test-Path False; `net use O: \\WINFILESVR\... ` -> System error 2457.
+- VWP-DC1 after VMIC disable: `Leap Indicator: 0`, `Source: 0.pool.ntp.org`. VWP_ADSRVR: Source time.nist.gov, Leap 0.
+- DESKTOP-EST after fix: O: and P: `OK`, `reachable=True`.
+- Member dispatch reliability: looped/`--next`/background POSTs = HTTP 000; foreground chunks 4s apart = HTTP 200.
+- Bot alert message_id 1517568177666785370 (#dev-alerts).
+
+## Pending / Incomplete Tasks
+
+- **VWP member time convergence** — all 26 configured + source resolving to VWP-DC1; several already Leap 0,
+  others settling. Re-verify the fleet in ~45-60 min and nudge any straggler (VWP-QBS was still Local CMOS;
+  DESKTOP-S4GNL8O offline = queued). Burst limit: poll/dispatch paced from foreground.
+- **Bucket-2 preservation copies to `\\172.16.3.20\Backups\Perm\claudetools-purged-2026-06\`** — 18GB
+  valleywide (background) + datto + radio-show; verify counts/sizes complete BEFORE deleting `ClaudeTools.old`.
+- **`.old` deletion** — safe only after the Perm copies are verified (it is the only copy of the purged data).
+- **rmm-search.sh py.sh fix** — committed via this save's auto-sync; confirm it landed.
+- Optional: re-verify self-check is GREEN after pulling GURU-5070's b668430 (capability_commands move).
+
+## Reference Information
+
+- Commits: dafcec5 (migration script + RECLONE.md), cd478ca (manifest /autotask), gururmm 2e469f1 (RMM_THOUGHTS rescue).
+- Coord: broadcast 249feb4c, todo e318a929; incoming b668430 (self-check baseline, GURU-5070).
+- Migration script raw URL: `https://git.azcomputerguru.com/azcomputerguru/claudetools/raw/branch/main/.claude/scripts/migrate-to-submodules.sh`
+- RMM agent ids: VWP-DC1 8eefbba6-28cf-4e3f-8fda-11900e0ac302, VWP_ADSRVR bd2f2f86-ea33-4202-828f-b378e459e891,
+  DESKTOP-EST 023ecbb3-55ca-487a-836d-d4a30b3da656, Estimator1 7f03cbe4-6335-429a-bbc8-7cbf99f1a979.
+- Key files: `.claude/scripts/migrate-to-submodules.sh`, `RECLONE.md`, `.claude/skills/self-check/baseline/manifest.json`,
+  `.claude/tmp/vwp-time-payload.json` (w32time member config, scratch).
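The `rmm-search.sh` hunk in this diff replaces a bare `python3` call with the fleet's `py.sh` resolver. On Windows, `python3` can resolve to the Microsoft Store app-execution-alias stub in `%LOCALAPPDATA%\Microsoft\WindowsApps`, which opens the Store instead of running the script. `py.sh` itself is not part of this diff. The code below is therefore only a hypothetical Python sketch of the idea, assuming the resolver takes an ordered list of candidate interpreter paths and returns the first one that is not such a stub. `is_store_stub` and `pick_interpreter` are illustrative names, not fleet code.

```python
from pathlib import PureWindowsPath
from typing import Iterable, Optional

# Folder holding the Microsoft Store "app execution alias" stubs
# (python.exe / python3.exe); running one opens the Store, not Python.
STUB_DIR = "microsoft\\windowsapps"


def is_store_stub(path: str) -> bool:
    """True if `path` lives in the WindowsApps alias folder (case-insensitive)."""
    return STUB_DIR in str(PureWindowsPath(path)).lower()


def pick_interpreter(candidates: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first found candidate that is not a Store stub, else None.

    Hypothetical stand-in for py.sh: `candidates` are paths in priority
    order, as shutil.which() returns them (None means "not on PATH").
    """
    for path in candidates:
        if path and not is_store_stub(path):
            return path
    return None


if __name__ == "__main__":
    import shutil

    choice = pick_interpreter(shutil.which(name) for name in ("python3", "python", "py"))
    print(choice or "no usable interpreter found")
```

Matching on the `Microsoft\WindowsApps` folder is a deliberately simple heuristic. A stricter resolver could also run each candidate with `--version` and reject any that fail.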
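The session log records that `migrate-to-submodules.sh --check` returned GREEN/exit 0 for the fresh clone and NON-COMPLIANT/exit 1 for `.old`. The deciding question was whether the clone shares history with the rewritten origin. One way to express that merge-base test is sketched below in Python. `git merge-base HEAD origin/main` exits 0 when a common ancestor exists and 1 when there is none, so per the log a pre-rewrite clone shows no merge base against the new `origin/main`. The sketch assumes `origin/main` has already been fetched and leaves out the script's offline RECLONE.md/submodule fallback. `shares_history` and `compliance_status` are hypothetical names, not the script's own.

```python
import subprocess
from pathlib import Path
from typing import Tuple


def shares_history(repo: Path, upstream: str = "origin/main") -> bool:
    """True if HEAD and `upstream` have at least one common ancestor.

    `git merge-base` exits 0 (and prints a commit) when a merge base exists
    and 1 when there is none; any other code (e.g. 128 for an unknown ref
    or a non-repo) is raised as an error rather than read as non-compliant.
    """
    proc = subprocess.run(
        ["git", "-C", str(repo), "merge-base", "HEAD", upstream],
        capture_output=True,
        text=True,
    )
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip() or f"git merge-base exited {proc.returncode}")
    return proc.returncode == 0


def compliance_status(shared: bool) -> Tuple[str, int]:
    """Map the history check to the (label, exit code) pair the log reports."""
    return ("GREEN", 0) if shared else ("NON-COMPLIANT", 1)
```

A caller would print the label and exit with the code, for example `label, code = compliance_status(shares_history(Path("ClaudeTools.old")))`. Treating unexpected git exit codes as errors keeps a missing remote ref from silently triggering a re-clone.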
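The `RMM_THOUGHTS.md` rescue avoided a 3-way merge because git treats the file as binary. Instead, the rescued tail of the `.old` copy ("bytes 29622+") was appended to the current origin/main version. The following is a minimal Python sketch of that byte-level splice, assuming both versions are available as local files. `splice_tail` and `splice_files` are hypothetical helpers; only the offset comes from the log. Working on `bytes` rather than `str` keeps the non-UTF-8 character intact.

```python
from pathlib import Path


def splice_tail(current: bytes, rescued: bytes, offset: int) -> bytes:
    """Return `current` followed by `rescued[offset:]`.

    Operates on raw bytes so a non-UTF-8 character in either version is
    carried through untouched instead of failing a text decode.
    """
    if not 0 <= offset <= len(rescued):
        raise ValueError(f"offset {offset} is outside a {len(rescued)}-byte file")
    return current + rescued[offset:]


def splice_files(current_path: Path, rescued_path: Path, offset: int, out_path: Path) -> int:
    """Write the spliced file to `out_path` and return its size in bytes."""
    merged = splice_tail(current_path.read_bytes(), rescued_path.read_bytes(), offset)
    out_path.write_bytes(merged)
    return len(merged)
```

The offset here is 0-based, which is an assumption. `tail -c +29622` would instead start at the 29622nd byte (offset 29621), so one way to catch an off-by-one is to compare the returned size against the 44137 bytes recorded in the log.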
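System error 2457 on DESKTOP-EST came down to clock skew. Under the stock Active Directory Kerberos policy, "Maximum tolerance for computer clock synchronization" is 5 minutes, and a free-running client eventually drifts past it. The sketch below is a minimal check of that rule, assuming the client and DC times have already been collected as timezone-aware values. How they are read (for example via `w32tm /stripchart`) is out of scope. VWP's actual policy value is not recorded in the log, so 5 minutes is the default, not a measured setting.

```python
from datetime import datetime, timedelta, timezone

# Stock AD default for "Maximum tolerance for computer clock synchronization".
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def now_utc() -> datetime:
    """Current time as a timezone-aware UTC reading."""
    return datetime.now(timezone.utc)


def clock_skew(client: datetime, dc: datetime) -> timedelta:
    """Absolute difference between two timezone-aware clock readings."""
    if client.tzinfo is None or dc.tzinfo is None:
        raise ValueError("clock readings must be timezone-aware")
    return abs(client - dc)


def kerberos_skew_ok(client: datetime, dc: datetime, max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """True if the skew is within the tolerance Kerberos will accept."""
    return clock_skew(client, dc) <= max_skew
```

Requiring timezone-aware values avoids a false alarm when one reading is local time and the other is UTC.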
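The VWP time fix used two w32time configurations, and both disable the Hyper-V `VMICTimeProvider`:

- DCs point at external NTP peers with `/reliable:yes`.
- Members follow the domain hierarchy.

The sketch below renders those steps as command lines per role. It uses the peer list recorded in the log with flag 0x9 (0x1 SpecialInterval plus 0x8 Client). The `w32tm` and `reg` syntax is standard. However, the actual payload (`.claude/tmp/vwp-time-payload.json`) is not shown in the log, so several details are assumptions:

- `/syncfromflags:manual` on the DCs.
- The service restart.
- The command order.

`w32time_commands` is a hypothetical helper.

```python
from typing import List, Sequence

# Peers and flag value as recorded in the session log (0x9 = 0x1 SpecialInterval | 0x8 Client).
DEFAULT_PEERS = ("time.windows.com", "time.nist.gov", "0.pool.ntp.org")
VMIC_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\VMICTimeProvider"


def w32time_commands(role: str, peers: Sequence[str] = DEFAULT_PEERS, flags: str = "0x9") -> List[str]:
    """Return the command lines for one machine, in the order they would run.

    role="dc":     external NTP peers, advertised as a reliable time source.
    role="member": follow the domain hierarchy (i.e. sync from the DCs).
    Both roles disable the Hyper-V VM IC time provider so it cannot override NTP.
    """
    disable_vmic = f'reg add "{VMIC_KEY}" /v Enabled /t REG_DWORD /d 0 /f'
    if role == "dc":
        peerlist = " ".join(f"{peer},{flags}" for peer in peers)
        config = (
            f'w32tm /config /manualpeerlist:"{peerlist}" '
            "/syncfromflags:manual /reliable:yes /update"
        )
    elif role == "member":
        config = "w32tm /config /syncfromflags:domhier /update"
    else:
        raise ValueError(f"unknown role: {role!r}")
    # Restart so the service reloads its time providers, then force a sync.
    return [disable_vmic, config, "net stop w32time", "net start w32time", "w32tm /resync"]
```

Keeping the VMIC step identical for both roles reflects the log's finding: the Hyper-V provider overrode configured peers on the DC, and could do the same on members.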
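The workaround for the GuruRMM API's burst limit was to send one foreground POST at a time, about 4 s apart, after looped, `--next`, and backgrounded requests all came back as HTTP 000. A minimal Python sketch of that pacing policy follows. `send` is a placeholder for whatever issues the real POST; the endpoint, payload, and `rmm-auth.sh` token handling are not shown in the log. The 4-second default mirrors the log, and each "chunk" is modelled as a single request, which is an assumption. The sleep function is injectable so the schedule can be checked without waiting.

```python
import time
from typing import Callable, Dict, List, Sequence


def dispatch_paced(
    agent_ids: Sequence[str],
    send: Callable[[str], int],
    gap_s: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Call send() for one agent at a time, waiting gap_s between calls.

    send() stands in for a single foreground POST that returns the HTTP
    status (0 standing in for curl's "000", i.e. no response). Requests are
    never overlapped or backgrounded.
    """
    results: Dict[str, int] = {}
    for i, agent_id in enumerate(agent_ids):
        if i:
            sleep(gap_s)  # space out connections; no wait before the first
        results[agent_id] = send(agent_id)
    return results


def needs_retry(results: Dict[str, int]) -> List[str]:
    """Agents whose dispatch did not return HTTP 200."""
    return [agent_id for agent_id, status in results.items() if status != 200]
```

Collecting failures with `needs_retry` lets stragglers be re-sent later at the same pace, rather than retried immediately in a burst.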