claudetools/docs/specifications/SUBMODULE-IDENTITY-RECONCILE-SPEC.md
Mike Swanson c37fd11ee9 sync: auto-sync from GURU-KALI at 2026-05-31 19:31:53
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-31 19:31:53
2026-05-31 19:31:56 -07:00

Spec: Submodule Identity Reconcile in sync.sh

Status: Proposed (not implemented)
Authored: 2026-05-31
Author: Mike Swanson (drafted with Claude, GURU-KALI)
Estimated effort: ~15 lines added to sync.sh, ~30 min including tests + verification

Problem

sync.sh already reconciles the parent repo's git config user.name / user.email against .claude/identity.json on every run (reconcile_git_identity, sync.sh:52-69). That guarantees the parent repo's commits are attributed to the human who actually owns this machine.
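The real reconcile_git_identity lives at sync.sh:52-69 and is not reproduced in this spec. For readers who haven't opened sync.sh, here is a minimal sketch of what such a function could look like. It assumes the function compares the repo-local user.name/user.email with the expected values, rewrites them with git config --local when they differ, and prints a [WARNING] line only when it changes something. The name matches the spec, but the body, message text, and color values are assumptions, not the actual code.

```shell
# HYPOTHETICAL sketch of reconcile_git_identity. The real function lives at
# sync.sh:52-69 and may differ in messages and details. It acts on the git
# repository in the current working directory.
YELLOW='\033[1;33m'   # assumed color codes (sync.sh defines its own at the top)
NC='\033[0m'

reconcile_git_identity() {
    want_name="$1"
    want_email="$2"
    # --local reads only this repository's own config, never ~/.gitconfig
    have_name="$(git config --local --get user.name 2>/dev/null)" || have_name=""
    have_email="$(git config --local --get user.email 2>/dev/null)" || have_email=""
    if [ "$have_name" = "$want_name" ] && [ "$have_email" = "$want_email" ]; then
        return 0    # already correct: stay silent (idempotent)
    fi
    printf '%b[WARNING]%b git identity "%s <%s>" -> "%s <%s>"\n' \
        "$YELLOW" "$NC" "$have_name" "$have_email" "$want_name" "$want_email"
    git config --local user.name "$want_name"
    git config --local user.email "$want_email"
}
```

Writing only with --local is deliberate in this sketch: it mirrors the rule later in this spec that the reconcile never touches --global or --system config.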

Submodules are not covered. A submodule has its own repository config and does not read the parent repo's local .git/config user settings. A freshly cloned submodule on a new machine therefore falls back to the global or system git config, and if neither sets an identity, to whatever git auto-detects: typically the login account's name and <unix-user>@<hostname> on Linux, or whatever the system git config supplies on Windows.
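A quick way to see this on a given machine is to compare what the parent repo and a submodule hold in their own (--local) config. A minimal sketch, assuming only that git is on PATH; local_identity is a hypothetical helper name, not something in sync.sh.

```shell
# HYPOTHETICAL helper: print the identity stored in a repository's *own*
# config, or "(none)" when user.name / user.email is not set locally there.
local_identity() {
    name="$(git -C "$1" config --local --get user.name 2>/dev/null)" || name=""
    email="$(git -C "$1" config --local --get user.email 2>/dev/null)" || email=""
    if [ -n "$name" ] && [ -n "$email" ]; then
        printf '%s <%s>\n' "$name" "$email"
    else
        echo "(none)"
    fi
}
```

Run from the parent repo root, local_identity . should print the reconciled parent identity, while local_identity projects/msp-tools/guru-rmm prints (none) on a box where nobody configured that submodule. Running git config --show-origin --get user.email inside the submodule then shows which global or system file, if any, the fallback identity comes from.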

This bit us in this session: the first two commits in azcomputerguru/youtube-sync-docker from GURU-KALI landed as ComputerGuru <guru@GURU-KALI.lan> instead of Mike Swanson <mike@azcomputerguru.com>. The drift would silently repeat for every new submodule cloned on every new machine until each one was manually configured.

The same hazard applies to the existing submodules projects/msp-tools/guru-rmm and projects/msp-tools/guru-connect on any workstation that hasn't been manually configured: Howard's home machine, the MacBook, GURU-5070, etc. Whether any of those are currently misattributed depends on whether whoever set them up happened to run git config user.name/email in each submodule (or had a correct global identity to fall back on). The git log of each submodule on each machine is the source of truth for the historical state, but the risk going forward is uniform: any new contributor or fresh clone gets default attribution.
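Checking that historical state can be as simple as counting distinct authors in each submodule's log. A sketch, assuming it is run from the parent repo root; author_summary is a hypothetical helper, while the %an / %ae placeholders are standard git log format codes.

```shell
# HYPOTHETICAL helper: count commits per "Name <email>" in one repository,
# most frequent author first. Run it once per submodule path.
author_summary() {
    git -C "$1" log --format='%an <%ae>' | sort | uniq -c | sort -rn
}
```

For example, author_summary projects/msp-tools/guru-rmm lists every author identity in that submodule's history; a hostname-style address such as guru@GURU-KALI.lan stands out immediately.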

Goal

When sync.sh runs Phase 1a (Submodule update), also call reconcile_git_identity inside each initialized submodule — the same way it does for the parent repo. After the change, every commit in every submodule from any team workstation is attributed to the human named in that machine's identity.json, regardless of when the submodule was first cloned.

Out of scope

  • Rewriting historical commits to fix prior misattribution. Too destructive (force-push to shared remote, breaks any in-flight branches). The two existing youtube-sync-docker commits with ComputerGuru author stay as-is unless explicitly requested separately.
  • Per-submodule user override (e.g. one submodule attributed to a different person on the same box). Not a real use case today — every machine has one human owner.
  • Reconciling other git config (signing keys, push default, etc.). Spec is scoped to author identity only.

Proposed change

In sync.sh Phase 1a, the existing git submodule foreach block (sync.sh:221-226) already iterates initialized submodules to advance them to their remote branch tip. Extend that same loop (or add a parallel one immediately after it) to invoke the reconcile inside each submodule directory.

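The catch: reconcile_git_identity is a function defined in the parent sync.sh and references the parent's color variables. git submodule foreach runs its command in a separate shell process (via sh -c), which inherits only exported environment variables plus the few that foreach sets itself ($name, $sm_path, $displaypath, $sha1, $toplevel). Bash functions and unexported variables from sync.sh are not visible there, so we can't call reconcile_git_identity directly inside the foreach body.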
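The scoping difference can be demonstrated without git at all, because foreach hands its command to sh -c. A minimal sketch; demo_fn, PLAIN_VAR, EXPORTED_VAR, and probe_child_shell are illustrative names only.

```shell
# What a child `sh -c` process (the way git submodule foreach runs its
# command) can see from the calling script.
demo_fn() { echo "called"; }      # shell function, never exported
PLAIN_VAR="set in parent"         # ordinary shell variable
EXPORTED_VAR="set in parent"      # exported: copied into child environments
export EXPORTED_VAR

probe_child_shell() {
    sh -c '
        if command -v demo_fn >/dev/null 2>&1; then echo "fn: visible"; else echo "fn: missing"; fi
        echo "plain: ${PLAIN_VAR:-missing}"
        echo "exported: ${EXPORTED_VAR:-missing}"
    '
}

probe_child_shell
# fn: missing
# plain: missing
# exported: set in parent
```

Bash's export -f is not a reliable workaround either: on Debian-family systems such as Kali, /bin/sh is dash, which does not import bash-exported functions.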

Three options for routing the call:

Option A (recommended): Skip foreach for the reconcile; iterate .gitmodules paths directly the same way the existing init loop already does at sync.sh:200-218. That loop runs in the parent shell, so it has full access to functions and variables. Add a (cd "$ppath" && reconcile_git_identity "$USER_DISPLAY" "$USER_EMAIL") after the existing git submodule update --init -- "$ppath" line. Keep set +e / set -e discipline matching the surrounding code.
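The shape of the Option A iteration, as a standalone sketch. It reads submodule paths from .gitmodules with git config --file, so it runs in the calling shell and can call any function defined there. list_submodule_paths is a hypothetical helper; the actual loop at sync.sh:200-218 is not reproduced here, and its init command, redirects, and error handling take precedence over this sketch.

```shell
# HYPOTHETICAL helper: print each submodule path recorded in a .gitmodules
# file, one per line (defaults to ./.gitmodules; prints nothing if absent).
list_submodule_paths() {
    gitmodules="${1:-.gitmodules}"
    [ -f "$gitmodules" ] || return 0
    git config --file "$gitmodules" --get-regexp '^submodule\..*\.path$' |
        while read -r _key ppath; do
            printf '%s\n' "$ppath"
        done
}

# Option A loop shape (run from the parent repo root). The pipe runs the loop
# body in a forked subshell, which still has every sync.sh function and
# variable; only assignments made inside the loop are lost afterward.
#   list_submodule_paths | while read -r ppath; do
#       git submodule update --init -- "$ppath" >/dev/null 2>&1
#       # ...per-submodule reconcile call goes here...
#   done
```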

Option B: Export USER_DISPLAY and USER_EMAIL as env vars, and inline the reconcile logic inside the foreach body as a few lines of bash. Works, but duplicates the function's logic in a second place and is harder to maintain.

Option C: Stuff reconcile_git_identity into a separate sourceable file (.claude/scripts/lib/identity.sh) and source it inside the foreach body. Cleanest separation but adds a new file and a sourcing pattern that doesn't exist anywhere else in the repo today. Probably overkill for one helper function.

Go with Option A. Concrete patch sketch (inside the existing while-loop at sync.sh:200, immediately after git submodule update --init -- "$ppath"):

# Reconcile this submodule's git identity to match identity.json, the same
# guarantee we make for the parent repo. Without this, commits in newly cloned
# submodules land under the fallback identity (e.g. "<unix-user> <unix-user@hostname>")
# instead of the actual human. Skip if identity.json was unreadable.
# "|| true" keeps a missing or broken submodule path from aborting under set -e.
if [ "$USER_DISPLAY" != "unknown" ]; then
    (cd "$ppath" && reconcile_git_identity "$USER_DISPLAY" "$USER_EMAIL") || true
fi

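USER_DISPLAY is already set by the time Phase 1a runs (the load happens earlier in the script), and reconcile_git_identity is defined in the parent shell scope. The ( ... ) subshell inherits both, and its cd "$ppath" means git reads and writes the submodule's own repository config (for submodules cloned by modern git, that file lives at .git/modules/<name>/config under the parent) rather than the parent's .git/config. Because the cd happens inside the subshell, the loop's working directory is unchanged afterward.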
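Both properties this relies on, that a ( ... ) subshell still sees unexported functions and variables and that its cd does not leak back into the loop, can be checked in isolation. A minimal sketch; fake_reconcile, demo_subshell, and the color values are stand-ins, not sync.sh code.

```shell
# A ( ... ) subshell is a fork of the current shell: it keeps every function
# and variable (exported or not), and its cd is discarded when it exits.
YELLOW='\033[1;33m'   # stand-in for the colors defined at the top of sync.sh
NC='\033[0m'

fake_reconcile() {
    # Reports where it ran and whether the color variables were in scope.
    printf 'ran in %s, colors %s\n' "$(basename "$PWD")" \
        "$([ -n "$YELLOW" ] && [ -n "$NC" ] && echo present || echo missing)"
}

# The whole demo runs in its own subshell so the caller's cwd is untouched.
demo_subshell() (
    top="$(mktemp -d)" || exit 1
    mkdir "$top/sub-a"
    cd "$top" || exit 1
    (cd sub-a && fake_reconcile)
    if [ "$PWD" = "$top" ]; then echo "loop cwd unchanged"; else echo "loop cwd changed"; fi
    cd / && rm -rf "$top"
)
```

Calling demo_subshell should print "ran in sub-a, colors present" followed by "loop cwd unchanged".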

Edge cases / risks

  1. Submodule not yet initialized (fresh clone). The init line right before this already populates the submodule, so by the time the reconcile line runs, the submodule's repository and its config exist. Safe.
  2. Submodule path missing entirely (corrupt state, manual rm -rf). If the init line fails, the existing >/dev/null 2>&1 redirect hides the error; the cd then fails and the subshell exits non-zero. Wrap the call in || true to be safe, matching the surrounding code's tolerance for submodule weirdness.
  3. identity.json unreadable (USER_DISPLAY == "unknown"). The same guard the parent-repo reconcile uses already covers this — skip silently rather than stamping the submodule with "unknown".
  4. Reconcile changes the submodule's git config and prints a [WARNING] line the first time it runs against each drifted submodule on an existing machine. That's correct and expected: it documents the drift correction. After the first run on a machine, it is idempotent.
  5. Submodule that points at a non-Gitea remote (no current example, but theoretically). The reconcile only writes git config locally; doesn't push or auth. No impact on remote interaction.
  6. reconcile_git_identity itself uses color variables (YELLOW, NC) that are defined at the top of sync.sh. The subshell created by (cd ...) inherits parent shell variables, so those color codes are still in scope. Verify.
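Edge cases 2 and 3 can be exercised together under set -e. A sketch, assuming a hypothetical wrapper reconcile_submodule that mirrors the proposed call site and a stub in place of the real reconcile_git_identity; it shows a missing submodule directory being skipped instead of aborting the run, and an "unknown" identity never being written.

```shell
reconcile_git_identity() {   # STUB: the real function lives in sync.sh
    echo "reconciled $(basename "$PWD") as $1 <$2>"
}

# HYPOTHETICAL wrapper mirroring the call site inside the init loop.
reconcile_submodule() {
    if [ "$USER_DISPLAY" != "unknown" ]; then
        (cd "$1" 2>/dev/null && reconcile_git_identity "$USER_DISPLAY" "$USER_EMAIL") || true
    fi
}

# Runs in its own subshell so `set -e` does not leak into the caller.
demo_edge_cases() (
    set -e
    top="$(mktemp -d)"
    mkdir "$top/present"
    USER_DISPLAY="Mike Swanson"
    USER_EMAIL="mike@azcomputerguru.com"
    reconcile_submodule "$top/present"
    reconcile_submodule "$top/gone"      # edge case 2: path missing, skipped
    USER_DISPLAY="unknown"
    reconcile_submodule "$top/present"   # edge case 3: silently skipped
    echo "still running"
    rm -rf "$top"
)
```

Without the || true, the failing cd in the second call would end the set -e run before "still running" is printed. The 2>/dev/null on cd is an extra assumption here, added only to keep the log quiet.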

Acceptance

  • bash -n .claude/scripts/sync.sh clean
  • Existing parent-repo reconcile behavior unchanged (sanity: run sync.sh on a box where parent git config already matches identity.json — no warning fires, no commit message about it)
  • On a workstation where any submodule's user.name / user.email doesn't match identity.json, sync.sh emits the [WARNING] line once per drifted submodule, then on the next run is silent (idempotent)
  • A test commit inside any submodule (from any machine) lands with the correct human author
  • No new shellcheck warnings (eyeball — no shellcheck mandated in current sync.sh tooling)
  • Cross-machine sanity: run on at least GURU-KALI, GURU-5070, HOWARD-HOME, Mikes-MacBook-Air, GURU-BEAST-ROG before treating as fully verified. Each machine's identity.json should already be authoritative.
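For the author-identity checks above, a read-only probe can confirm which identity git will actually use in each submodule without making a throwaway commit. A sketch, assuming it runs from the parent repo root with the name and email from that machine's identity.json passed in; check_author_ident is a hypothetical helper, while git var GIT_AUTHOR_IDENT is a standard git command that prints the resolved author identity followed by a timestamp and timezone.

```shell
# HYPOTHETICAL acceptance helper: compare the author identity git would use
# in one repository with the expected "Name <email>". Writes nothing.
check_author_ident() {
    dir="$1"
    want="$2 <$3>"
    # `git var GIT_AUTHOR_IDENT` prints "Name <email> <unix-time> <tz>";
    # strip the trailing timestamp and timezone before comparing.
    have="$(git -C "$dir" var GIT_AUTHOR_IDENT | sed 's/ [0-9]* [-+][0-9]*$//')"
    if [ "$have" = "$want" ]; then
        echo "OK    $dir"
    else
        echo "DRIFT $dir: $have"
        return 1
    fi
}
```

For example, check_author_ident projects/msp-tools/guru-rmm "Mike Swanson" mike@azcomputerguru.com should report OK after one sync.sh run on a correctly configured box.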

Don't (when implementing)

  • Don't reconcile the vault repo via this mechanism — Phase 6 already does it. The change is scoped to project submodules only.
  • Don't add a --no-submodules flag or any user-facing toggle. The reconcile is always-on, just like the parent-repo reconcile is.
  • Don't reach for git config --global or --system. Local repo config only, same as the existing reconcile.
  • Don't write to identity.json from this code path. Identity is read-only here.

Estimated impact

  • Cost to ship: ~10 minutes of implementation + a quick test on this box (already showing the drift)
  • Cost of not shipping: every new submodule clone on every new machine has a roughly 50/50 chance of producing misattributed commits (depending on whether that machine happens to have a correct global git identity) until somebody notices, the way it just happened here. The cost is low individually but accumulates as a noise source in git log and breaks contribution attribution for any reporting that aggregates by author.

References

  • The existing parent-repo reconcile: sync.sh:52-69 (reconcile_git_identity)
  • The submodule init/advance block this would hook into: sync.sh:175-228 (Phase 1a)
  • The drift incident that motivated this: youtube-sync-docker commits ef903c8 and fdff0a7 (2026-05-31) authored as ComputerGuru <guru@GURU-KALI.lan> instead of Mike. Not being rewritten; spec only prevents recurrence.