claudetools/session-logs/2026-06-01-mike-guru-kali-ghost-fix-and-memory-dream.md
Mike Swanson 6dee6406c0 sync: auto-sync from GURU-KALI at 2026-06-01 20:34:26
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-06-01 20:34:26
2026-06-01 20:34:27 -07:00


GURU-KALI Ghost-Churn Fix, BUG-016/017 Filing, Memory Dream + Consolidation Collision

User

  • User: Mike Swanson (mike)
  • Machine: GURU-KALI
  • Role: admin

Session Summary

Four substantive threads ran on GURU-KALI today; two were tightly intertwined with parallel work on other workstations.

Thread 1: GURU-KALI ghost-agent churn (full diagnosis + remediation + upstream fix lifecycle in one day). A coord message from GURU-5070 reported that GURU-KALI was minting ghost agent rows on the gururmm server at roughly one per day (~10 so far). The initial diagnosis blamed a read-only root filesystem. A local check disproved that: findmnt -no OPTIONS / showed rw,relatime,errors=remount-ro on the host, the kernel log had no ext4 errors, and there were no ro/rw transitions beyond the normal boot-time remount. The actual cause was gururmm-agent.service running with ProtectSystem=strict, which gives the service a private mount namespace in which / is mounted read-only. The unit declared ReadWritePaths=/var/log /usr/local/bin /etc/gururmm but omitted /var/lib/gururmm, where device_id.rs:get_persist_path() writes .device-id, so every persist attempt inside the agent's namespace returned EROFS. Combined with a second bug (the agent generated a fresh UUID on every persist failure instead of caching one in memory), this produced the ghost-row blizzard. Workaround applied: a drop-in override at /etc/systemd/system/gururmm-agent.service.d/override.conf adding ReadWritePaths=/var/lib/gururmm. After daemon-reload + restart, the agent persisted a stable device-id ec975630-d297-4df9-bcb5-a445c65b648d, and no EROFS warnings have been logged since. Coord reply sent to GURU-5070 (d91406ce-c4ab-4914-b479-c1f4a948096f); they purged the 11 agent rows down to 1 keeper (agent_id 9bca5090-2d0e-40ad-9078-c11af8a435c0).
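
The host-versus-namespace comparison behind this diagnosis can be scripted. This is a minimal bash sketch that relies only on the mountinfo layout documented in proc(5), where field 5 is the mount point and field 6 holds the per-mount options. root_mount_opts is a hypothetical helper name. It is not part of the agent or sync.sh.

```shell
#!/usr/bin/env bash
# Print the mount options of "/" as recorded in a mountinfo file.
# proc(5): field 5 = mount point, field 6 = per-mount options.
# root_mount_opts is a hypothetical helper, not part of the agent or sync.sh.
root_mount_opts() {
  local mountinfo="${1:-/proc/self/mountinfo}"
  # If several mounts sit on "/", keep the last one listed.
  awk '$5 == "/" { opts = $6 } END { if (opts != "") print opts }' "$mountinfo"
}
```

Usage: compare `root_mount_opts /proc/self/mountinfo` with `root_mount_opts "/proc/$AGENT_PID/mountinfo"`. Seeing rw on the host and ro in the agent's view is the ProtectSystem=strict signature found here. A filesystem that is genuinely read-only would show ro in both.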

Thread 2: Filed BUG-016 and BUG-017 in the gururmm roadmap, and both were fixed upstream the same day. Wrote both bug entries into projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md with the full root cause, suggested fixes, and the GURU-KALI workaround, then notified Howard via coord (99162698-5439-4fcb-9c27-719a569a717c). Mike picked up both fixes on another workstation later in the day. First, 30da053 fix(agent): resolve Linux device_id persistence issues (BUG-016, BUG-017) shipped to gururmm/main, followed by 2089e89 docs(roadmap): mark BUG-016 and BUG-017 as fixed. The fix followed the spec's recommendations. The unit template gained StateDirectory=gururmm (preferred over appending to ReadWritePaths), and device_id.rs:get_device_id() now uses OnceLock<String> to cache the first generated UUID even when persistence fails. Toward the end of the session, I refreshed the GURU-KALI base unit to match the upstream-fixed template (replaced gururmm-agent.service with the new shape, removed the override drop-in, and restarted). The pre-fix unit was saved as gururmm-agent.service.pre-bug016-fix. Verified that the device-id was unchanged after the restart and that the mountinfo line shows /var/lib/gururmm bound rw via StateDirectory. The 20:24 auto-update had refreshed the agent binary but NOT the unit file. Removing the override without refreshing the unit would therefore have regressed BUG-016 on this box; caught that before acting.
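
One regression was avoided at the end of this thread: removing the override while the base unit still lacked StateDirectory=. A check on the unit text can guard against it. This bash sketch works under stated assumptions. unit_grants_state_dir is a hypothetical helper, and it only understands plain StateDirectory= and ReadWritePaths= lines as described in systemd.exec(5).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the gururmm repo): succeed if the unit text
# on stdin grants write access to /var/lib/<name>, either via
# StateDirectory=<name> or via a ReadWritePaths= entry covering that path.
# Limitation: it does not model list resets (an empty "StateDirectory=" line).
unit_grants_state_dir() {
  local name="$1" target="/var/lib/$1" line key value entry dir
  local -a entries
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line#"${line%%[![:space:]]*}"}"   # trim leading whitespace
    case "$line" in
      StateDirectory=*|ReadWritePaths=*) ;;
      *) continue ;;                           # comments, sections, other keys
    esac
    key="${line%%=*}"
    value="${line#*=}"
    read -r -a entries <<< "$value"            # space-separated list, no globbing
    for entry in "${entries[@]}"; do
      entry="${entry#[-+]}"                    # drop "-" (ignore if missing) / "+" (RootDirectory-relative)
      if [ "$key" = StateDirectory ]; then
        [ "${entry%%:*}" = "$name" ] && return 0
      else
        dir="${entry%/}"
        [ "$dir" = "$target" ] && return 0                 # exact match
        [ "${target#"$dir"/}" != "$target" ] && return 0   # parent dir, e.g. /var/lib
      fi
    done
  done
  return 1
}
```

To check whether the override is still needed, test the base file alone: `unit_grants_state_dir gururmm < /etc/systemd/system/gururmm-agent.service`. To check the effective unit including drop-ins, pipe in `systemctl cat gururmm-agent` instead.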

Thread 3: sync.sh hardening in three rounds, the last of which reconciles submodule git identities. First round (dead-submodule-ref tolerance): a routine /sync failed because git fetch recursed into submodules and hit a transient dead ref in guru-connect history. The fix added --no-recurse-submodules to the parent fetch + pull and made the post-rebase git submodule update tolerant of per-submodule failures. Second round (coord_api lifted to identity.json): the hardcoded LAN address http://172.16.3.30:8001 turned up in three scripts (sync.sh, check-messages.sh, check-ksteen-smartbadge.sh), which silently breaks off-LAN/VPN workstations. It was lifted into .claude/identity.json as coord_api, with the existing IP as the fallback default. migrate-identity.sh was updated to populate the field on any machine missing it, and broadcast 1d93052f-aa79-4ac3-a0e9-99f04a4695c9 told the team to run it. The dead Windows-path repo-root fallback loop at sync.sh:102 was deleted. Third round (submodule identity reconcile): two youtube-sync-docker commits were authored as ComputerGuru <guru@GURU-KALI.lan> because sync.sh's reconcile_git_identity only ran on the parent repo. Wrote docs/specifications/SUBMODULE-IDENTITY-RECONCILE-SPEC.md and implemented it as a 10-line addition to Phase 1a that runs (cd "$ppath" && reconcile_git_identity ...) for each submodule. Empirically verified: it caught real drift on this box's guru-connect submodule (unset identity → Mike Swanson), it is idempotent on re-runs, and it passed a forced-drift test on youtube-sync-docker. Coord todo a176100c was opened and closed in the same session.
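
The Option A shape of the third round runs the reconcile inside each submodule from a subshell. It might look roughly like the bash sketch below. reconcile_identity_here and reconcile_submodule_identities are hypothetical stand-ins, because the log elides the real arguments of reconcile_git_identity. Submodule paths are read from .gitmodules with `git config -f`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for sync.sh's reconcile_git_identity: set the
# repo-local identity only when it differs, warning once per drifted repo.
reconcile_identity_here() {
  local want_name="$1" want_email="$2" cur_name cur_email
  cur_name=$(git config --local --get user.name || true)
  cur_email=$(git config --local --get user.email || true)
  if [ "$cur_name" != "$want_name" ] || [ "$cur_email" != "$want_email" ]; then
    echo "[WARN] $(pwd): identity '$cur_name <$cur_email>' -> '$want_name <$want_email>'" >&2
    git config --local user.name "$want_name"
    git config --local user.email "$want_email"
  fi
}

# Option A shape: loop over the submodule paths and reconcile each one in a
# subshell, so the cd never leaks into the caller.
reconcile_submodule_identities() {
  local name="$1" email="$2" key ppath
  [ -f .gitmodules ] || return 0
  git config -f .gitmodules --get-regexp '^submodule\..*\.path$' |
    while read -r key ppath; do
      [ -e "$ppath/.git" ] || continue   # skip submodules not checked out
      (cd "$ppath" && reconcile_identity_here "$name" "$email")
    done
}
```

Because the check only writes when the identity differs, a second run prints nothing. That matches the idempotence reported above.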

Thread 4: memory-dream skill collision with Mike's parallel consolidation. Tried the new memory-dream skill (landed via /sync earlier in the day). The default report-only run completed cleanly and reported 104 memory files, 17 orphan files needing index lines, 12 broken backlinks, 12 overlap clusters (biggest: 19 feedback_syncro_* files), 1 stale dated fact, and 0 profile/repo conflicts. Ran --apply-safe to additively append the 17 orphan index lines to MEMORY.md. At nearly the same moment, Mike on GURU-BEAST-ROG had completed a thoughtful consolidation pass (0c00010 "chore(memory): consolidate scattered feedback/project/reference files") that took the store from 104 → 71 files. The pass folded the 19 syncro files into 3 rule files + 1 history file, added a per-cluster RULE/STATE/HISTORY split for GuruConnect/Dataforth/Cascades/GuruRMM, created a new reference_resource_map.md cheatsheet, and fully rewrote MEMORY.md. The pull-rebase produced a merge conflict in MEMORY.md. Resolved it by taking Mike's consolidated version with git checkout --ours .claude/memory/MEMORY.md (during a rebase, "ours" is the upstream side being rebased onto). My orphan-fix index adds were discarded, since every file they pointed at had been consolidated away on his side. A set-diff verified zero original lines lost. Re-ran dream against the consolidated state: 71 files, 0 orphans, 7 broken backlinks, 5 overlap clusters (down from 12). The skill works against the new layout but has a false positive that needs fixing: it flags the new, intentional _history.md companion files as merge candidates against their rule-file siblings. Broadcast 6c559209-a0bb-4007-ad01-cbf07deead1a told the fleet about the consolidation, instructed each machine to /sync and re-dream locally, and warned them to ignore the false-positive merge proposals. Filed coord todo 5ad05d03-74ca-491d-9e72-3a699fcd1150 to refine the cluster heuristic.
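
The "set-diff" check used to confirm that no original MEMORY.md lines were lost can be reproduced with sort and comm. This is a bash sketch. lost_lines is a hypothetical name, and `<old-rev>` in the usage below is a placeholder for whichever revision is treated as the original.

```shell
#!/usr/bin/env bash
# Print every distinct line of OLD that appears nowhere in NEW; return 1 if
# there is at least one such line. lost_lines is a hypothetical helper.
lost_lines() {
  local old="$1" new="$2" missing
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C.
  missing=$(LC_ALL=C comm -23 <(LC_ALL=C sort -u "$old") <(LC_ALL=C sort -u "$new"))
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}
```

For example: `lost_lines <(git show <old-rev>:.claude/memory/MEMORY.md) .claude/memory/MEMORY.md`. As a set comparison, it proves that no distinct line vanished. It says nothing about line order or duplicate lines.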

Side threads (smaller scope but real work):

  • Rednour Law M365 onboarding + Emma → Carla rename earlier in the day (this session started late yesterday and crossed UTC midnight into today). Bootstrapped the full ComputerGuru MSP app suite for rednourlaw.com via Tenant Admin consent + onboard-tenant.sh; renamed emma@ → carla@rednourlaw.com (Carla Skinner) with mail aliases preserved; added an smtp:nick@ alias on Nick Pafford's existing npafford@ mailbox; Syncro ticket #32343 updated, 0.5h billed, and marked Resolved.
  • youtube-sync-docker pickup: Mike asked to pull up the YouTube downloader project. Found it as a personal Gitea repo and added it as a submodule. While reading the codebase, found a real bug: the Settings page wrote to settings.json, but nothing downstream read it. Fixed it with an apply_schedule() helper, sync.sh/entrypoint.sh changes, and 9 pytest cases across two commits. Code-reviewed both rounds.

Key Decisions

  • Override removal: only after unit refresh. Mike said "remove the override now that upstream is fixed", but inspection showed the agent binary was auto-updated today while the unit file on disk was still the buggy 2026-05-24 version. Removing the override alone would have regressed BUG-016 on this box. Caught that before acting and proposed refreshing the unit file first; Mike's intent was preserved by doing both steps together.
  • Took ours on the MEMORY.md merge conflict. During the rebase against Mike's 0c00010 consolidation, my --apply-safe orphan-fix additions had gone stale (every file they referenced had been consolidated away). Took his version and discarded my adds rather than trying to reconcile line by line. A set-diff showed zero original content lost.
  • StateDirectory=gururmm is the right systemd directive (preferred over ReadWritePaths=/var/lib/gururmm). It auto-creates the dir with correct ownership, binds it rw in the unit's namespace, documents intent ("this service has persistent state"), and handles uninstall/reinstall cleanly. Spec recommended both options; upstream picked StateDirectory which matched my own preference.
  • Cache device_id in OnceLock<String>, not /etc/machine-id. The existing comment at device_id.rs:7-10 explicitly rejected hardware IDs because OEMs ship machines with identical hardware IDs (un-sysprepped factory images). The OnceLock approach is the right shape — survives persist failure, doesn't depend on hardware ID.
  • Memory-dream merge proposals stay advisory, never auto-applied. The skill's _history.md false positives confirm the design choice that merges always go through human approval. Filed a heuristic-refinement todo so future reports stay actionable, but the skill is functionally correct as-is.
  • Submodule identity reconcile uses Option A from the spec (extend the existing init while-loop with (cd ... && reconcile_git_identity ...)) over Option B (inline duplicate logic in submodule foreach) or Option C (factor into a sourceable library). Empirically verified the heuristic catches real drift and is idempotent.
  • Two youtube-sync-docker commits with the wrong author (ef903c8, fdff0a7, authored as ComputerGuru) were left as-is, because rewriting history would require a force-push to a shared remote. The reconcile fix prevents recurrence on this and every other machine.
  • Override at GURU-KALI removed cleanly at end of session, replaced by the upstream-fixed base unit. Future agent reinstall would write this same shape — no drift.
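
The OnceLock decision is easiest to see as control flow. The real fix is Rust in device_id.rs. What follows is only a bash analogue of the same idea, with hypothetical names (DEVICE_ID, cached_device_id). It reuses a persisted ID if one exists. Otherwise it generates one and caches it in memory first. A failed write is then treated as a warning rather than as a reason to generate again.

```shell
#!/usr/bin/env bash
# Bash analogue of the BUG-017 fix; the real code is Rust (OnceLock<String>
# in device_id.rs). DEVICE_ID and cached_device_id are hypothetical names.
DEVICE_ID=""   # process-wide cache, playing the role of the OnceLock

cached_device_id() {
  local persist_path="$1" candidate=""
  # Already initialized in this process: never regenerate.
  [ -n "$DEVICE_ID" ] && return 0
  # Prefer an ID persisted by an earlier run.
  if [ -s "$persist_path" ]; then
    candidate=$(head -n 1 "$persist_path")
  fi
  if [ -z "$candidate" ]; then
    # Random UUID from the kernel; fall back to uuidgen if /proc lacks it.
    candidate=$(cat /proc/sys/kernel/random/uuid 2>/dev/null || uuidgen 2>/dev/null)
    [ -n "$candidate" ] || return 1
    DEVICE_ID="$candidate"   # cache first, so a failed write cannot cause a new ID
    if ! { printf '%s\n' "$DEVICE_ID" > "$persist_path"; } 2>/dev/null; then
      echo "[WARN] could not persist device id to $persist_path; keeping in-memory value" >&2
    fi
    return 0
  fi
  DEVICE_ID="$candidate"
}
```

Calling cached_device_id twice with an unwritable path yields the same DEVICE_ID both times. The absence of that property is what produced the ghost rows.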

Problems Encountered

  • The initial Graph PATCH for the Emma rename failed with "Property 'proxyAddresses' is read-only." Graph user writes don't cover proxyAddresses, even with Directory.ReadWrite.All. Split the rename into two tiers: identity via Graph, mail aliases via Exchange REST.
  • Exchange REST returned HTTP 403 even though the SP was consented. The Exchange Operator SP lacked the Exchange Administrator role in the rednourlaw tenant. Resolved by running the full onboarding flow.
  • Stale read-after-write on Exchange Set-Mailbox and Graph PATCH. Both writes returned success codes immediately, but verification reads showed old data for ~45s. Polled for UPN convergence, which succeeded on the first or second poll.
  • sync.sh dead-submodule-ref failure on routine pull. Manual workaround was git -c submodule.recurse=false pull --rebase etc.; fix made --no-recurse-submodules the default behavior.
  • Coding Agent ran sync.sh as a verification step during the submodule reconcile implementation, which auto-committed + pushed the dirty edit pre-Code-Review. Disclosed honestly by the agent. Code Review on the committed state came back CLEAN; accepted as-is.
  • MEMORY.md merge conflict during the memory dream collision with Mike's consolidation pass. Resolved by taking ours (Mike's intentional change) and discarding my now-stale orphan-fix adds.
  • Auto-update refreshed agent binary but NOT systemd unit file. Discovered when planning the override removal — the binary on disk was dated 20:24 today (auto-updated with the OnceLock fix) but the unit file was still dated 2026-05-24 (pre-fix template). Without manually refreshing the unit, the override removal would have re-broken BUG-016. Refreshed the unit explicitly before removing.
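
The read-after-write lag above was handled by polling. Below is a generic bash sketch of that loop. poll_until is hypothetical, and the actual check (a Graph or Exchange read comparing the UPN) is passed in as a command rather than implemented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun a check command until it succeeds or the attempt
# budget runs out, sleeping DELAY seconds between attempts (not after the last).
poll_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      echo "converged on attempt $i" >&2
      return 0
    fi
    (( i < attempts )) && sleep "$delay"
  done
  echo "no convergence after $attempts attempts" >&2
  return 1
}
```

Consider `poll_until 6 15 check_upn carla@rednourlaw.com`, where check_upn is a hypothetical caller-supplied function. It waits at most 75 s across five sleeps, which is comfortably above the ~45 s lag observed here.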

Configuration Changes

ClaudeTools repo (committed across session):

  • .claude/scripts/sync.sh — dead-submodule-ref tolerance, deleted dead Windows-path fallbacks, submodule identity reconcile in Phase 1a, coord_api read from identity.json with fallback. Multiple commits: c89f22c, 973e9db, 4c49b85.
  • .claude/scripts/migrate-identity.sh — populates coord_api for any machine missing the field (commit 973e9db).
  • .claude/scripts/check-messages.sh, check-ksteen-smartbadge.sh — read coord_api from identity.json with fallback (commit 973e9db).
  • .claude/skills/remediation-tool/references/tenants.md — rednourlaw.com row flipped NO → YES with role summary.
  • clients/rednour/reports/2026-05-31-onboard-and-rename-emma-to-carla.md — full M365 remediation audit report.
  • docs/specifications/SUBMODULE-IDENTITY-RECONCILE-SPEC.md — planning artifact.
  • .gitmodules — registered new submodule projects/youtube-sync-docker.
  • .claude/memory/_reports/ — two dream reports (2026-06-01-1525-dream.md, 2026-06-01-1526-dream.md).
  • Submodule pointers advanced: guru-rmm (BUG-016/017 fixes), guru-connect (multiple SPEC-004 tasks), youtube-sync-docker (settings fix + tests at fdff0a7).
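
The per-submodule tolerance from c89f22c can be sketched as a loop that records failures instead of aborting. This bash sketch assumes the parent fetch/pull already ran with --no-recurse-submodules. for_each_submodule_tolerant and SUBMODULE_FAILURES are hypothetical names, not taken from sync.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run CMD... <path> for every submodule path listed in
# .gitmodules, counting failures in SUBMODULE_FAILURES instead of aborting.
SUBMODULE_FAILURES=0

for_each_submodule_tolerant() {
  local key ppath
  SUBMODULE_FAILURES=0
  [ -f .gitmodules ] || return 0
  # Read the path list on fd 3 so commands that read stdin cannot consume it.
  while read -r -u 3 key ppath; do
    if ! "$@" "$ppath"; then
      echo "[WARN] '$*' failed for $ppath; continuing" >&2
      SUBMODULE_FAILURES=$((SUBMODULE_FAILURES + 1))
    fi
  done 3< <(git config -f .gitmodules --get-regexp '^submodule\..*\.path$')
  [ "$SUBMODULE_FAILURES" -eq 0 ] || echo "[WARN] $SUBMODULE_FAILURES submodule(s) failed" >&2
  return 0
}
```

Invoked as `for_each_submodule_tolerant git submodule update --init --`, each path is appended after `--`. A dead ref in one submodule (as happened with guru-connect) is then reported and skipped while the rest still update.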

ClaudeTools machine-local (not committed; gitignored):

  • .claude/identity.json — added coord_api: "http://172.16.3.30:8001" field, bumped last_updated.
  • .claude/current-mode — set to dev during youtube-sync-docker work.
  • All three submodules' local .git/config user.name/user.email reconciled to Mike Swanson / mike@azcomputerguru.com. guru-connect was previously unset (real drift case fixed by the new Phase 1a reconcile).
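
The coord_api lookup with fallback can be sketched as below. This sketch assumes jq is used to read identity.json, since the log does not say how the scripts parse it. read_coord_api and COORD_API_DEFAULT are hypothetical names. A missing file, a missing or empty field, or a missing jq all yield the existing LAN default.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of reading coord_api from identity.json with a fallback.
COORD_API_DEFAULT="http://172.16.3.30:8001"

read_coord_api() {
  local identity_file="${1:-.claude/identity.json}" value=""
  if [ -r "$identity_file" ] && command -v jq >/dev/null 2>&1; then
    # "// empty" turns a missing or null field into no output at all.
    value=$(jq -r '.coord_api // empty' "$identity_file" 2>/dev/null)
  fi
  printf '%s\n' "${value:-$COORD_API_DEFAULT}"
}
```

Typical use is `COORD_API=$(read_coord_api)`, after which request URLs are built from `$COORD_API` (for example `$COORD_API/api/coord`).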

gururmm repo (commits by Mike):

  • e3d6a46 — BUG-016 + BUG-017 entries in docs/FEATURE_ROADMAP.md (filed by me).
  • 30da053 — BUG-016 + BUG-017 fixes shipped (by Mike on another machine).
  • 2089e89 — bug roadmap status marked fixed.

youtube-sync-docker repo (commits by Mike on this machine via Gitea Agent):

  • ef903c8 — settings-not-applied fix + 3 tests (note: authored as ComputerGuru due to pre-reconcile drift).
  • fdff0a7 — apply_schedule tests + .gitignore python exclusions.

GURU-KALI system (not version controlled):

  • /etc/systemd/system/gururmm-agent.service — replaced with upstream-fixed template (gained StateDirectory=gururmm). Old version backed up as gururmm-agent.service.pre-bug016-fix.
  • /etc/systemd/system/gururmm-agent.service.d/ — directory + override.conf removed (no longer needed).

Credentials & Secrets

rednourlaw.com (4a4ca18a-f516-478b-99da-2e0722c5dc18):

  • Tenant Admin SP 671a2ace-be9e-440c-a7d6-5ff982e4500c — Conditional Access Administrator
  • Security Investigator SP 704da463-7f4e-484c-b1da-40e447615d52 — Exchange Administrator
  • Exchange Operator SP 59a68ba9-5e1e-4a56-92ae-507a9a669a79 — Exchange Administrator
  • User Manager SP dc3b79a2-638b-42fe-8ecb-51592db7d40f — User Administrator + Authentication Administrator
  • Defender Add-on SP 052da8aa-1ca5-4f60-b9c5-7aafcb74264b — no roles (no MDE in tenant)

Users renamed/touched:

  • 93074d1a-6db2-4794-8f7d-c84a619e4494: emma@ → carla@rednourlaw.com (Carla Skinner). Sessions revoked, password unchanged.
  • fe859088-bcbc-49dc-aaea-4c6e68f7d5bb: npafford@ (Nick Pafford); added smtp:nick@rednourlaw.com alias.

Syncro:

  • Ticket #32343 (id 111409967): comments 415513323 (internal) + 415514647 (customer-visible); line item 42654682 (0.5h remote, $75.00, attributed to Mike user_id 1735). Status: Resolved.

Infrastructure & Servers

  • GURU-KALI gururmm agent post-fix state: PID 686646, device_id ec975630-d297-4df9-bcb5-a445c65b648d, base unit /etc/systemd/system/gururmm-agent.service (refreshed today), no override drop-ins, mountinfo line 535 shows /var/lib/gururmm rw-bound via StateDirectory=gururmm.
  • Coord API still at http://172.16.3.30:8001/api/coord — now configurable per machine via identity.json coord_api field.
  • rednourlaw.com tenant: Global Admin is Carrie Rednour (also reachable via sysadmin@rednourlaw.com).
  • gururmm server-side ghost-row purge complete — 11 rows → 1 keeper (agent_id 9bca5090-2d0e-40ad-9078-c11af8a435c0).

Commands & Outputs

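# Diagnostic that revealed process-scoped ro (root mount: root and mount point both /)
AGENT_PID=$(systemctl show -p MainPID --value gururmm-agent)
grep ' / / ' "/proc/$AGENT_PID/mountinfo"
# 447 404 259:3 / / ro,nosuid,relatime ...  <- agent ns
# Host's /proc/mounts and findmnt showed rw the whole time.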

# Workaround applied early (tee does not create the drop-in directory)
sudo mkdir -p /etc/systemd/system/gururmm-agent.service.d
sudo tee /etc/systemd/system/gururmm-agent.service.d/override.conf > /dev/null <<'EOF'
[Service]
ReadWritePaths=/var/lib/gururmm
EOF
sudo systemctl daemon-reload && sudo systemctl restart gururmm-agent

# End-of-session: unit file refreshed to upstream-fixed template, override removed
sudo cp -a /etc/systemd/system/gururmm-agent.service{,.pre-bug016-fix}
# (wrote new unit with StateDirectory=gururmm)
sudo rm -f /etc/systemd/system/gururmm-agent.service.d/override.conf
sudo rmdir /etc/systemd/system/gururmm-agent.service.d
sudo systemctl daemon-reload && sudo systemctl restart gururmm-agent

# Sync.sh runs
bash .claude/scripts/sync.sh    # multiple times, each pulling Mike's parallel work
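
The "device-id unchanged after restart" verification can be wrapped in a small bash function. device_id_survives is hypothetical. The file path in the usage note is an assumption based on Thread 1, where get_persist_path() writes .device-id under /var/lib/gururmm.

```shell
#!/usr/bin/env bash
# Hypothetical check: read the persisted ID, run a restart command, read the
# ID again, and succeed only if it is non-empty and unchanged.
device_id_survives() {
  local id_file="$1" before after
  shift
  before=$(cat -- "$id_file") || return 1
  "$@" || return 1
  after=$(cat -- "$id_file") || return 1
  if [ -n "$before" ] && [ "$before" = "$after" ]; then
    echo "device-id stable: $after"
    return 0
  fi
  echo "device-id changed: '$before' -> '$after'" >&2
  return 1
}
```

On GURU-KALI this would be something like `device_id_survives /var/lib/gururmm/.device-id sudo systemctl restart gururmm-agent`, with the whole call run as root if the file is not world-readable. If the agent only rewrites the file some time after startup, append a short sleep to the restart command before the second read.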

Pending / Incomplete Tasks

  • Memory-dream cluster heuristic refinement — coord todo 5ad05d03-74ca-491d-9e72-3a699fcd1150, open. Either skip clusters containing _history.md files or honor frontmatter merge_locked: true.
  • Shared-drive access for Nick Pafford on Rednour ticket #32343 — deferred to a separate workflow per Mike's instruction.
  • Other workstations need to run migrate-identity.sh to pick up the new coord_api field. Broadcast sent; on-LAN machines work without it because the fallback default is the LAN address.
  • Other workstations' submodule git identities will auto-correct on next /sync (one-time warning per drifted submodule).
  • Two youtube-sync-docker commits authored as ComputerGuru — leaving history alone.
  • Changing the TZ via the Settings UI on youtube-sync-docker still requires a container restart, because tzdata is locked in at process start. Not in scope to fix.
  • Sync.sh's Phase 1a now skips submodule advance by default (per Mike's later change on another machine); pass --with-submodules to fetch+advance. Already worked into the new sync.sh by Mike — no action.
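
The first pending item offers two options for the cluster filter. The bash sketch below combines both. frontmatter_locked and cluster_is_mergeable are hypothetical names, and which rule the skill adopts is still open under todo 5ad05d03.

```shell
#!/usr/bin/env bash
# Hypothetical: succeed if FILE opens with a "---" frontmatter block that
# contains "merge_locked: true" before the closing "---".
frontmatter_locked() {
  awk 'NR == 1 && $0 != "---" { exit }
       NR > 1 && $0 == "---" { exit }
       NR > 1 && /^merge_locked:[ \t]*true[ \t]*$/ { found = 1; exit }
       END { exit !found }' "$1"
}

# Hypothetical: a cluster is proposed for merging only if no member is a
# *_history.md companion and no member opts out via merge_locked: true.
cluster_is_mergeable() {
  local f
  for f in "$@"; do
    case "$f" in
      *_history.md) return 1 ;;
    esac
    frontmatter_locked "$f" && return 1
  done
  return 0
}
```

Applied to the false positive described in Thread 4, this drops any cluster that pairs a rule file with its _history.md companion. Ordinary overlap clusters are still proposed.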

Reference Information

Commits on the main ClaudeTools branch from this session (Mike, GURU-KALI):

  • c89f22c — sync: dead-submodule-ref tolerance in sync.sh
  • 973e9db — coord_api lift + identity.json + migrate-identity update + Windows-path cleanup
  • 4c49b85 — submodule identity reconcile in sync.sh Phase 1a
  • 14341d1 (or c37fd11 post-rebase) — bundle: tenants.md flip + Rednour report + submodule reg + spec doc
  • 805b902 — youtube-sync-docker submodule pointer at fdff0a7
  • 633c3fc — session log + final state
  • 805b902 (post-rebase to current HEAD) — completed

Submodule HEADs at end of session:

  • gururmm: 2089e89 (BUG-016/017 marked fixed; latest)
  • guru-connect: at the SPEC-004 Task 9 TOFU provisioning spec point
  • youtube-sync-docker: fdff0a7 (settings fix + apply_schedule tests)

Coord messages I sent today (GURU-KALI/claude-main):

  • 1d93052f — broadcast: alert routing change (initiated by GURU-5070, I just re-echoed)
  • (deprecated) coord-message about migrate-identity.sh
  • 99162698 — to Howard-Home/claude-main: BUG-016 + BUG-017 filed
  • d91406ce — to GURU-5070/claude-main: ghost-fix complete with stable device-id
  • 6c559209 — broadcast: memory consolidation + re-dream + ignore _history.md merge proposals

Coord todos I created today:

  • a176100c-6de5-4e3b-8c1c-8291a2aa6ff0 — submodule identity reconcile in sync.sh (DONE)
  • 5ad05d03-74ca-491d-9e72-3a699fcd1150 — refine memory-dream cluster heuristic (open)

M365 stable identifiers:

  • rednourlaw tenant: 4a4ca18a-f516-478b-99da-2e0722c5dc18
  • Carla user object: 93074d1a-6db2-4794-8f7d-c84a619e4494
  • Nick user object: fe859088-bcbc-49dc-aaea-4c6e68f7d5bb

GuruRMM stable identifiers:

  • GURU-KALI agent (post-fix keeper): agent_id 9bca5090-2d0e-40ad-9078-c11af8a435c0, device_id ec975630-d297-4df9-bcb5-a445c65b648d

Files of interest left for future sessions:

  • clients/rednour/reports/2026-05-31-onboard-and-rename-emma-to-carla.md — full Rednour audit
  • docs/specifications/SUBMODULE-IDENTITY-RECONCILE-SPEC.md — written spec (now implemented)
  • .claude/memory/_reports/2026-06-01-1525-dream.md and 2026-06-01-1526-dream.md — dream reports
  • /etc/systemd/system/gururmm-agent.service.pre-bug016-fix — backup of pre-fix unit on this machine (not in repo)

Raw API artifacts (machine-local, not in repo):

  • /tmp/remediation-tool/4a4ca18a-f516-478b-99da-2e0722c5dc18/rednour-rename/ — pre/post Set-Mailbox + Get-Mailbox JSON for both Carla rename and Nick alias add