sync: auto-sync from GURU-KALI at 2026-06-01 20:34:26
Author: Mike Swanson Machine: GURU-KALI Timestamp: 2026-06-01 20:34:26
@@ -0,0 +1,177 @@
# GURU-KALI Ghost-Churn Fix, BUG-016/017 Filing, Memory Dream + Consolidation Collision
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-KALI
- **Role:** admin
## Session Summary
Four substantive threads on GURU-KALI today, two of them tightly intertwined with parallel work happening on other workstations.
**Thread 1 — GURU-KALI ghost-agent churn (full diagnosis + remediation + upstream fix lifecycle in one day).** Coord message from GURU-5070 reported that GURU-KALI was minting ~10 ghost agent rows on the gururmm server, one ~daily. The initial diagnosis blamed a read-only root filesystem. Local check disproved that — `findmnt -no OPTIONS /` showed `rw,relatime,errors=remount-ro` on the host, no ext4 errors in the kernel log, no ro/rw transitions since the normal boot-time remount. The actual cause turned out to be `gururmm-agent.service` running with `ProtectSystem=strict`, which creates a private mount namespace where `/` is mounted ro for the service. The unit declared `ReadWritePaths=/var/log /usr/local/bin /etc/gururmm` but omitted `/var/lib/gururmm` where `device_id.rs:get_persist_path()` writes `.device-id`. Inside the agent's namespace, every persist attempt returned EROFS. Combined with a second bug (the agent regenerating a fresh UUID on every persist failure instead of caching in memory), this produced the ghost-row blizzard. Workaround applied: drop-in override at `/etc/systemd/system/gururmm-agent.service.d/override.conf` adding `ReadWritePaths=/var/lib/gururmm`. After `daemon-reload` + restart, the new agent persisted a stable device-id `ec975630-d297-4df9-bcb5-a445c65b648d` and zero EROFS warnings have logged since. Coord reply sent to GURU-5070 (`d91406ce-c4ab-4914-b479-c1f4a948096f`) — they purged the 11 ghost rows down to 1 keeper (agent_id `9bca5090-2d0e-40ad-9078-c11af8a435c0`).
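The host-versus-service discrepancy described above can be checked mechanically. The following is a minimal sketch, not the tooling used in this session: it reads `mountinfo`-formatted text on stdin and reports whether the root mount is `ro` or `rw`. The function names `root_mount_options` and `root_mount_mode` are hypothetical, and taking the last entry whose mount point is `/` is a simplifying assumption.

```bash
# Print the per-mount options (field 6) of the entry whose mount point
# (field 5) is "/". Expects /proc/<pid>/mountinfo-formatted text on stdin.
# Assumption: if several entries are stacked on "/", the last one wins.
root_mount_options() {
    awk '$5 == "/" { opts = $6 } END { if (opts != "") print opts }'
}

# Reduce the option list to "ro", "rw", or "unknown".
root_mount_mode() {
    case ",$(root_mount_options)," in
        *,ro,*) echo ro ;;
        *,rw,*) echo rw ;;
        *) echo unknown ;;
    esac
}

# Example (not run here): compare the host view with the agent's namespace.
#   root_mount_mode < /proc/self/mountinfo
#   sudo cat "/proc/$AGENT_PID/mountinfo" | root_mount_mode
```

If the two calls disagree, the service is running in its own mount namespace, which is the situation `ProtectSystem=strict` produced here.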
**Thread 2 — Filed BUG-016 and BUG-017 in the gururmm roadmap, then both fixed upstream same-day.** Wrote both bug entries into `projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md` with full root-cause, suggested fixes, and the GURU-KALI workaround. Notified Howard via coord (`99162698-5439-4fcb-9c27-719a569a717c`). Mike picked up both fixes on another workstation later in the day — `30da053 fix(agent): resolve Linux device_id persistence issues (BUG-016, BUG-017)` shipped to gururmm/main, then `2089e89 docs(roadmap): mark BUG-016 and BUG-017 as fixed`. Fix shape matched the spec recommendations exactly: unit template gained `StateDirectory=gururmm` (preferred over appending to `ReadWritePaths`), and `device_id.rs:get_device_id()` now uses `OnceLock<String>` to cache the first generated UUID even when persistence fails. Toward end of session, refreshed the GURU-KALI base unit to match the upstream-fixed template (replaced `gururmm-agent.service` with the new shape, removed the override drop-in, restarted) — backup of pre-fix unit saved as `gururmm-agent.service.pre-bug016-fix`. Verified device-id unchanged after restart, mountinfo line shows `/var/lib/gururmm` rw-bound via StateDirectory. The auto-update earlier in the day had refreshed the agent binary at 20:24 but NOT the unit file, so removing the override without refreshing the unit would have regressed BUG-016 on this box — caught that before acting.
**Thread 3 — sync.sh hardening, three rounds across one day, and submodule identity reconcile.** First round (dead-submodule-ref tolerance): a routine `/sync` failed because `git fetch` recursed into submodules and hit a transient dead ref in `guru-connect` history. Fix added `--no-recurse-submodules` to the parent fetch + pull and made the post-rebase `git submodule update` tolerant of per-submodule failures. Second round (`coord_api` lifted to identity.json): the hardcoded LAN IP `http://172.16.3.30:8001` was identified in three scripts (sync.sh, check-messages.sh, check-ksteen-smartbadge.sh) — silently breaks off-LAN/VPN workstations. Lifted into `.claude/identity.json` as `coord_api` with the existing IP as fallback default; `migrate-identity.sh` updated to populate the field for any machine missing it. Broadcast `1d93052f-aa79-4ac3-a0e9-99f04a4695c9` told the team to run `migrate-identity.sh`. Dead Windows-path repo-root fallback loop at sync.sh:102 deleted. Third round (submodule identity reconcile): two youtube-sync-docker commits were authored as `ComputerGuru <guru@GURU-KALI.lan>` because sync.sh's `reconcile_git_identity` only ran on the parent repo. Wrote `docs/specifications/SUBMODULE-IDENTITY-RECONCILE-SPEC.md`, implemented the spec (10-line addition to Phase 1a — `(cd "$ppath" && reconcile_git_identity ...)` for each submodule). Empirically verified: caught real drift on this box's `guru-connect` submodule (unset identity → Mike Swanson), idempotent on re-runs, forced-drift test on youtube-sync-docker passed. Coord todo `a176100c` opened and closed in the same session.
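The `coord_api` lookup with a LAN fallback, described in the second round, might look roughly like the sketch below. It is an assumption about the shape, not the code in `sync.sh`: the function name `resolve_coord_api` is hypothetical, and the `sed` extraction only handles a key and value that sit on one line (the real scripts may well use a JSON parser instead).

```bash
# Print the coord API base URL for this machine.
# $1: path to identity.json (hypothetical argument; the scripts in this
#     log read .claude/identity.json).
# Falls back to the existing LAN address when the file or field is missing.
resolve_coord_api() {
    local identity_file="$1"
    local fallback="http://172.16.3.30:8001"
    local value=""

    if [ -r "$identity_file" ]; then
        # Take the first "coord_api": "<value>" pair found in the file.
        value=$(sed -n 's/.*"coord_api"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
            "$identity_file" | head -n 1)
    fi

    printf '%s\n' "${value:-$fallback}"
}

# Example (not run here):
#   COORD_API=$(resolve_coord_api .claude/identity.json)
```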
**Thread 4 — Memory dream skill collision with Mike's parallel consolidation.** Tried the new memory-dream skill (landed via `/sync` earlier in the day). Default report-only run produced a clean report: 104 memory files, 17 orphan files needing index lines, 12 broken backlinks, 12 overlap clusters (biggest: 19 `feedback_syncro_*` files), 1 stale dated fact, 0 profile/repo conflicts. Ran `--apply-safe` to additively append the 17 orphan index lines to `MEMORY.md`. At nearly the same moment, Mike on GURU-BEAST-ROG had completed a thoughtful consolidation pass (`0c00010` "chore(memory): consolidate scattered feedback/project/reference files") that took the store from 104 → 71 files: 19 syncro files into 3 rule files + 1 history file, per-cluster RULE/STATE/HISTORY split for GuruConnect/Dataforth/Cascades/GuruRMM, new `reference_resource_map.md` cheatsheet, MEMORY.md fully rewritten. Pull-rebase produced a merge conflict in MEMORY.md. Resolved by taking Mike's consolidated version (`git checkout --ours .claude/memory/MEMORY.md`) and discarding my orphan-fix index adds — every file my adds pointed at had been consolidated away on his side. Set-diff verified zero original lines lost. Re-ran dream against the consolidated state: 71 files, 0 orphans, 7 broken backlinks, **5 overlap clusters down from 12**. Skill confirmed working against the new layout but with a false-positive that needs fixing — it flags the new intentional `_history.md` companion files as merge candidates against their rule-file siblings. Broadcast `6c559209-a0bb-4007-ad01-cbf07deead1a` told the fleet about the consolidation, instructed each machine to `/sync` + re-dream locally, and warned about the false-positive merge proposals to ignore. Filed coord todo `5ad05d03-74ca-491d-9e72-3a699fcd1150` to refine the cluster heuristic.
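One way to suppress the `_history.md` false positive is to drop any cluster that contains a history companion before proposing a merge. The sketch below illustrates that filter only; `cluster_verdict` is a hypothetical name, and the skill's real clustering code is not shown in this log.

```bash
# Decide whether a cluster of memory files should be proposed for merging.
# Arguments: the file names that make up one overlap cluster.
# Prints "skip" if any member is a *_history.md companion, else "candidate".
cluster_verdict() {
    local f
    for f in "$@"; do
        case "$f" in
            *_history.md)
                echo skip
                return 0
                ;;
        esac
    done
    echo candidate
}
```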
**Side threads (smaller scope but real work):**
- **Rednour Law M365 onboarding + Emma → Carla rename** earlier in the day (the session started late yesterday and ran past UTC midnight into today). Bootstrapped the full ComputerGuru MSP app suite for `rednourlaw.com` via Tenant Admin consent + `onboard-tenant.sh`; renamed `emma@` → `carla@rednourlaw.com` (Carla Skinner) with mail aliases preserved; added `smtp:nick@` alias on Nick Pafford's existing `npafford@` mailbox; Syncro ticket #32343 updated + 0.5h billed + marked Resolved.
- **youtube-sync-docker pickup**: Mike asked to pull up the YouTube downloader project. Found it as a personal Gitea repo, cloned as a submodule. Read the codebase, found a real bug (Settings page wrote to `settings.json` but nothing downstream read it), fixed it with `apply_schedule()` helper + sync.sh/entrypoint.sh changes + 9 pytest cases across two commits. Code-reviewed both rounds.
## Key Decisions
- **Override removal: only after unit refresh.** Mike said "remove the override now that upstream is fixed", but inspection showed the agent binary was auto-updated today while the unit file on disk was still the buggy 2026-05-24 version. Removing the override alone would have regressed BUG-016 on this box. Caught that before acting and proposed refreshing the unit file first; Mike's intent was preserved by doing both steps together.
- **Took `--ours` (Mike's side) on the MEMORY.md merge conflict.** During a rebase, `--ours` refers to the branch being rebased onto, so it selects Mike's `0c00010` consolidation rather than my local commit. My `--apply-safe` orphan-fix additions were stale by then (every file they referenced had been consolidated away), so I took his version and discarded my adds rather than reconciling line by line. A set-diff confirmed that no original content was lost.
- **`StateDirectory=gururmm` is the right systemd directive (preferred over `ReadWritePaths=/var/lib/gururmm`).** It auto-creates the dir with correct ownership, binds it rw in the unit's namespace, documents intent ("this service has persistent state"), and handles uninstall/reinstall cleanly. Spec recommended both options; upstream picked `StateDirectory` which matched my own preference.
- **Cache device_id in `OnceLock<String>`, not `/etc/machine-id`.** The existing comment at `device_id.rs:7-10` explicitly rejected hardware IDs because OEMs ship machines with identical hardware IDs (un-sysprepped factory images). The OnceLock approach is the right shape — survives persist failure, doesn't depend on hardware ID.
- **Memory-dream merge proposals stay advisory, never auto-applied.** The skill's `_history.md` false positives confirm the design choice that merges always go through human approval. Filed a heuristic-refinement todo so future reports stay actionable, but the skill is functionally correct as-is.
- **Submodule identity reconcile uses Option A from the spec** (extend the existing init while-loop with `(cd ... && reconcile_git_identity ...)`) over Option B (inline duplicate logic in `submodule foreach`) or Option C (factor into a sourceable library). Empirically verified the heuristic catches real drift and is idempotent.
- **Two youtube-sync-docker commits with wrong author** (`ef903c8`, `fdff0a7` authored as `ComputerGuru`) left as-is — rewriting history would need force-push to shared remote. The reconcile fix prevents recurrence on this and every other machine.
- **Override at GURU-KALI removed cleanly at end of session**, replaced by the upstream-fixed base unit. Future agent reinstall would write this same shape — no drift.
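The first decision above (refresh the unit before removing the override) can be turned into a pre-flight check. This is a sketch under assumptions: `unit_declares_state_dir` is a hypothetical helper, and it only inspects the text of a unit file rather than asking systemd for the effective configuration.

```bash
# Succeed if the unit file declares StateDirectory= with the given name.
# $1: path to a unit file, $2: state directory name (for example gururmm).
# StateDirectory= accepts a space-separated list, so the name may appear
# anywhere in that list.
unit_declares_state_dir() {
    local unit_file="$1" name="$2"
    grep -Eq "^[[:space:]]*StateDirectory=(.*[[:space:]])?${name}([[:space:]].*)?\$" "$unit_file"
}

# Example guard (commented out; the path is the one from this log):
# if unit_declares_state_dir /etc/systemd/system/gururmm-agent.service gururmm; then
#     echo "base unit carries the fix; the override can be removed"
# else
#     echo "base unit predates the fix; keep the override" >&2
# fi
```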
## Problems Encountered
- **Initial Graph PATCH for Emma rename failed with `Property 'proxyAddresses' is read-only`.** Graph does not allow writes to `proxyAddresses` on a user object, even with `Directory.ReadWrite.All`. Split the rename into two tiers: identity via Graph, mail aliases via Exchange REST.
- **Exchange REST returned HTTP 403** even though the SP was consented. The Exchange Operator SP lacked Exchange Administrator role in the rednourlaw tenant. Resolved by running the full onboarding flow.
- **Stale read-after-write on Exchange Set-Mailbox and Graph PATCH.** Both writes returned success codes immediately, but verification reads kept returning the old data for about 45 s. Polled until the UPN converged, which took one or two attempts.
- **sync.sh dead-submodule-ref failure** on routine pull. Manual workaround was `git -c submodule.recurse=false pull --rebase` etc.; fix made `--no-recurse-submodules` the default behavior.
- **Coding Agent ran sync.sh as a verification step** during the submodule reconcile implementation, which auto-committed + pushed the dirty edit pre-Code-Review. Disclosed honestly by the agent. Code Review on the committed state came back CLEAN; accepted as-is.
- **MEMORY.md merge conflict** during the memory dream collision with Mike's consolidation pass. Resolved by taking `--ours`, which during a rebase is the upstream side (Mike's intentional change), and discarding my now-stale orphan-fix adds.
- **Auto-update refreshed agent binary but NOT systemd unit file.** Discovered when planning the override removal — the binary on disk was dated 20:24 today (auto-updated with the OnceLock fix) but the unit file was still dated 2026-05-24 (pre-fix template). Without manually refreshing the unit, the override removal would have re-broken BUG-016. Refreshed the unit explicitly before removing.
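The read-after-write lag noted in the list above is usually handled with a bounded poll. The helper below is an illustrative sketch, not the code used during the rename: `poll_until` and its argument order are hypothetical, and the probe command is whatever read returns the current value.

```bash
# poll_until EXPECTED ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Returns 0 as soon as COMMAND's stdout equals EXPECTED, 1 otherwise.
poll_until() {
    local expected="$1" attempts="$2" delay="$3" i=1 actual
    shift 3
    while [ "$i" -le "$attempts" ]; do
        # A failing probe is treated the same as a stale read.
        actual=$("$@" 2>/dev/null) || actual=""
        if [ "$actual" = "$expected" ]; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (not run here; get_upn is a hypothetical probe function):
#   poll_until "carla@rednourlaw.com" 6 10 get_upn "$USER_OBJECT_ID"
```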
## Configuration Changes
**ClaudeTools repo (committed across session):**
- `.claude/scripts/sync.sh` — dead-submodule-ref tolerance, deleted dead Windows-path fallbacks, submodule identity reconcile in Phase 1a, coord_api read from identity.json with fallback. Multiple commits: `c89f22c`, `973e9db`, `4c49b85`.
- `.claude/scripts/migrate-identity.sh` — populates `coord_api` for any machine missing the field (commit `973e9db`).
- `.claude/scripts/check-messages.sh`, `check-ksteen-smartbadge.sh` — read `coord_api` from identity.json with fallback (commit `973e9db`).
- `.claude/skills/remediation-tool/references/tenants.md` — rednourlaw.com row flipped NO → YES with role summary.
- `clients/rednour/reports/2026-05-31-onboard-and-rename-emma-to-carla.md` — full M365 remediation audit report.
- `docs/specifications/SUBMODULE-IDENTITY-RECONCILE-SPEC.md` — planning artifact.
- `.gitmodules` — registered new submodule `projects/youtube-sync-docker`.
- `.claude/memory/_reports/` — two dream reports (`2026-06-01-1525-dream.md`, `2026-06-01-1526-dream.md`).
- Submodule pointers advanced: guru-rmm (BUG-016/017 fixes), guru-connect (multiple SPEC-004 tasks), youtube-sync-docker (settings fix + tests at `fdff0a7`).
**ClaudeTools machine-local (not committed; gitignored):**
- `.claude/identity.json` — added `coord_api: "http://172.16.3.30:8001"` field, bumped `last_updated`.
- `.claude/current-mode` — set to `dev` during youtube-sync-docker work.
- All three submodules' local `.git/config` user.name/user.email reconciled to `Mike Swanson / mike@azcomputerguru.com`. `guru-connect` was previously unset (real drift case fixed by the new Phase 1a reconcile).
**gururmm repo (commits by Mike):**
- `e3d6a46` — BUG-016 + BUG-017 entries in `docs/FEATURE_ROADMAP.md` (filed by me).
- `30da053` — BUG-016 + BUG-017 fixes shipped (by Mike on another machine).
- `2089e89` — bug roadmap status marked fixed.
**youtube-sync-docker repo (commits by Mike on this machine via Gitea Agent):**
- `ef903c8` — settings-not-applied fix + 3 tests (note: authored as `ComputerGuru` due to pre-reconcile drift).
- `fdff0a7` — apply_schedule tests + `.gitignore` python exclusions.
**GURU-KALI system (not version controlled):**
- `/etc/systemd/system/gururmm-agent.service` — replaced with upstream-fixed template (gained `StateDirectory=gururmm`). Old version backed up as `gururmm-agent.service.pre-bug016-fix`.
- `/etc/systemd/system/gururmm-agent.service.d/` — directory + `override.conf` removed (no longer needed).
## Credentials & Secrets
**rednourlaw.com (4a4ca18a-f516-478b-99da-2e0722c5dc18):**
- Tenant Admin SP `671a2ace-be9e-440c-a7d6-5ff982e4500c` — Conditional Access Administrator
- Security Investigator SP `704da463-7f4e-484c-b1da-40e447615d52` — Exchange Administrator
- Exchange Operator SP `59a68ba9-5e1e-4a56-92ae-507a9a669a79` — Exchange Administrator
- User Manager SP `dc3b79a2-638b-42fe-8ecb-51592db7d40f` — User Administrator + Authentication Administrator
- Defender Add-on SP `052da8aa-1ca5-4f60-b9c5-7aafcb74264b` — no roles (no MDE in tenant)
**Users renamed/touched:**
- `93074d1a-6db2-4794-8f7d-c84a619e4494`: emma@ → carla@rednourlaw.com (Carla Skinner). Sessions revoked, password unchanged.
- `fe859088-bcbc-49dc-aaea-4c6e68f7d5bb`: npafford@ (Nick Pafford); added `smtp:nick@rednourlaw.com` alias.
**Syncro:**
- Ticket #32343 (id 111409967): comments `415513323` (internal) + `415514647` (customer-visible); line item `42654682` (0.5h remote, $75.00, attributed to Mike user_id 1735). Status: Resolved.
## Infrastructure & Servers
- **GURU-KALI gururmm agent** post-fix state: PID `686646`, device_id `ec975630-d297-4df9-bcb5-a445c65b648d`, base unit `/etc/systemd/system/gururmm-agent.service` (refreshed today), no override drop-ins, mountinfo line 535 shows `/var/lib/gururmm` rw-bound via `StateDirectory=gururmm`.
- **Coord API** still at `http://172.16.3.30:8001/api/coord` — now configurable per machine via `identity.json` `coord_api` field.
- **rednourlaw.com tenant**: Global Admin is Carrie Rednour (also reachable via `sysadmin@rednourlaw.com`).
- **gururmm server-side ghost-row purge complete** — 11 rows → 1 keeper (`agent_id 9bca5090-2d0e-40ad-9078-c11af8a435c0`).
## Commands & Outputs
```bash
# Diagnostic that revealed the process-scoped read-only root
grep ' / ' "/proc/$AGENT_PID/mountinfo"
# 447 404 259:3 / / ro,nosuid,relatime ... <- agent ns
# Host's /proc/mounts and findmnt showed rw the whole time.

# Workaround applied early
# (mkdir -p is a no-op if the drop-in directory already exists)
sudo mkdir -p /etc/systemd/system/gururmm-agent.service.d
sudo tee /etc/systemd/system/gururmm-agent.service.d/override.conf > /dev/null <<'EOF'
[Service]
ReadWritePaths=/var/lib/gururmm
EOF
sudo systemctl daemon-reload && sudo systemctl restart gururmm-agent

# End-of-session: unit file refreshed to upstream-fixed template, override removed
sudo cp -a /etc/systemd/system/gururmm-agent.service{,.pre-bug016-fix}
# (wrote new unit with StateDirectory=gururmm)
sudo rm -f /etc/systemd/system/gururmm-agent.service.d/override.conf
sudo rmdir /etc/systemd/system/gururmm-agent.service.d
sudo systemctl daemon-reload && sudo systemctl restart gururmm-agent

# Sync.sh runs
bash .claude/scripts/sync.sh # multiple times, each pulling Mike's parallel work
```
## Pending / Incomplete Tasks
- **Memory-dream cluster heuristic refinement** — coord todo `5ad05d03-74ca-491d-9e72-3a699fcd1150`, open. Either skip clusters containing `_history.md` files or honor frontmatter `merge_locked: true`.
- **Shared-drive access for Nick Pafford** on Rednour ticket #32343 — deferred to a separate workflow per Mike's instruction.
- **Other workstations need `migrate-identity.sh`** to pick up the new `coord_api` field. Broadcast sent; on-LAN machines work without it.
- **Other workstations' submodule git identities** will auto-correct on next `/sync` (one-time warning per drifted submodule).
- **Two youtube-sync-docker commits authored as `ComputerGuru`** — leaving history alone.
- **TZ change via Settings UI still requires container restart on youtube-sync-docker** — tzdata locked in at process start. Not in scope to fix.
- **Sync.sh's Phase 1a now skips submodule advance by default** (per Mike's later change on another machine); pass `--with-submodules` to fetch+advance. Already worked into the new sync.sh by Mike — no action.
## Reference Information
**Commits on the main ClaudeTools branch from this session (Mike, GURU-KALI):**
- `c89f22c` — sync: dead-submodule-ref tolerance in sync.sh
- `973e9db` — coord_api lift + identity.json + migrate-identity update + Windows-path cleanup
- `4c49b85` — submodule identity reconcile in sync.sh Phase 1a
- `14341d1` (or `c37fd11` post-rebase) — bundle: tenants.md flip + Rednour report + submodule reg + spec doc
- `805b902` — youtube-sync-docker submodule pointer at `fdff0a7`
- `633c3fc` — session log + final state
- `805b902` (post-rebase to current HEAD) — completed
**Submodule HEADs at end of session:**
- gururmm: `2089e89` (BUG-016/017 marked fixed; latest)
- guru-connect: at the SPEC-004 Task 9 TOFU provisioning spec point
- youtube-sync-docker: `fdff0a7` (settings fix + apply_schedule tests)
**Coord messages I sent today (GURU-KALI/claude-main):**
- `1d93052f` — broadcast: alert routing change (initiated by GURU-5070, I just re-echoed)
- (deprecated) coord-message about migrate-identity.sh
- `99162698` — to Howard-Home/claude-main: BUG-016 + BUG-017 filed
- `d91406ce` — to GURU-5070/claude-main: ghost-fix complete with stable device-id
- `6c559209` — broadcast: memory consolidation + re-dream + ignore _history.md merge proposals
**Coord todos I created today:**
- `a176100c-6de5-4e3b-8c1c-8291a2aa6ff0` — submodule identity reconcile in sync.sh (DONE)
- `5ad05d03-74ca-491d-9e72-3a699fcd1150` — refine memory-dream cluster heuristic (open)
**M365 stable identifiers:**
- rednourlaw tenant: `4a4ca18a-f516-478b-99da-2e0722c5dc18`
- Carla user object: `93074d1a-6db2-4794-8f7d-c84a619e4494`
- Nick user object: `fe859088-bcbc-49dc-aaea-4c6e68f7d5bb`
**GuruRMM stable identifiers:**
- GURU-KALI agent (post-fix keeper): `agent_id 9bca5090-2d0e-40ad-9078-c11af8a435c0`, `device_id ec975630-d297-4df9-bcb5-a445c65b648d`
**Files of interest left for future sessions:**
- `clients/rednour/reports/2026-05-31-onboard-and-rename-emma-to-carla.md` — full Rednour audit
- `docs/specifications/SUBMODULE-IDENTITY-RECONCILE-SPEC.md` — written spec (now implemented)
- `.claude/memory/_reports/2026-06-01-1525-dream.md` and `2026-06-01-1526-dream.md` — dream reports
- `/etc/systemd/system/gururmm-agent.service.pre-bug016-fix` — backup of pre-fix unit on this machine (not in repo)
**Raw API artifacts (machine-local, not in repo):**
- `/tmp/remediation-tool/4a4ca18a-f516-478b-99da-2e0722c5dc18/rednour-rename/` — pre/post Set-Mailbox + Get-Mailbox JSON for both Carla rename and Nick alias add