claudetools/docs/specifications/SUBMODULE-IDENTITY-RECONCILE-SPEC.md
Mike Swanson c37fd11ee9 sync: auto-sync from GURU-KALI at 2026-05-31 19:31:53
Author: Mike Swanson
Machine: GURU-KALI
Timestamp: 2026-05-31 19:31:53
2026-05-31 19:31:56 -07:00

# Spec: Submodule Identity Reconcile in sync.sh
**Status:** Proposed (not implemented)
**Authored:** 2026-05-31
**Author:** Mike Swanson (drafted with Claude, GURU-KALI)
**Estimated effort:** ~15 lines added to `sync.sh`, ~30 min including tests + verification
## Problem
`sync.sh` already reconciles the **parent** repo's `git config user.name` / `user.email` against `.claude/identity.json` on every run (`reconcile_git_identity`, sync.sh:52-69). That guarantees the parent repo's commits are attributed to the human who actually owns this machine.
Submodules are not covered. A freshly-cloned submodule on a new machine inherits whatever the system default sets — typically `<unix-user> <unix-user@hostname>` on Linux, or the system git config on Windows. Submodules also don't pick up the parent repo's `.git/config` user — they have their own.
This bit us in this session: the first two commits in `azcomputerguru/youtube-sync-docker` from GURU-KALI landed as `ComputerGuru <guru@GURU-KALI.lan>` instead of `Mike Swanson <mike@azcomputerguru.com>`. The drift would silently repeat for every new submodule cloned on every new machine until each one was manually configured.
Same hazard applies to existing submodules `projects/msp-tools/guru-rmm` and `projects/msp-tools/guru-connect` on any workstation that hasn't been manually configured — Howard's home machine, the MacBook, GURU-5070, etc. Whether any of those are currently misattributed depends on whether the people who set them up happened to run `git config user.name/email` in each submodule. The git log of each submodule on each machine is the source of truth for the historical state, but the **risk going forward** is uniform: any new contributor or fresh clone gets default attribution.
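To check the historical state on a given machine, the author breakdown of a submodule's log can be tallied. A minimal sketch, assuming a hypothetical helper name `author_counts` (it is not part of `sync.sh`) and that it is pointed at a submodule checkout:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of sync.sh: count commits per author identity
# in the repository at the given path (defaults to the current directory).
author_counts() {
  local repo="${1:-.}"
  # %an = author name, %ae = author email; most frequent identity first.
  git -C "$repo" log --format='%an <%ae>' | sort | uniq -c | sort -rn
}
```

For example, `author_counts projects/msp-tools/guru-rmm` would list each distinct author identity with its commit count, so a stray hostname-based identity stands out.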
## Goal
When `sync.sh` runs Phase 1a (Submodule update), also call `reconcile_git_identity` inside each initialized submodule — the same way it does for the parent repo. After the change, every commit in every submodule from any team workstation is attributed to the human named in that machine's `identity.json`, regardless of when the submodule was first cloned.
## Out of scope
- Rewriting historical commits to fix prior misattribution. Too destructive (force-push to shared remote, breaks any in-flight branches). The two existing youtube-sync-docker commits with `ComputerGuru` author stay as-is unless explicitly requested separately.
- Per-submodule user override (e.g. one submodule attributed to a different person on the same box). Not a real use case today — every machine has one human owner.
- Reconciling other git config (signing keys, push default, etc.). Spec is scoped to author identity only.
## Proposed change
In `sync.sh` Phase 1a, the existing `git submodule foreach` block (sync.sh:221-226) already iterates initialized submodules to advance them to their remote branch tip. Extend that same loop (or add a parallel one immediately after it) to invoke the reconcile inside each submodule directory.
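The catch: `reconcile_git_identity` is a function defined in the parent `sync.sh` and references the parent's color variables. `git submodule foreach` runs its command in a separate shell process for each submodule, and that process sees only exported environment variables plus the few variables `foreach` itself provides (such as `$name`, `$sm_path`, and `$toplevel`). Shell functions and unexported variables from `sync.sh` are not carried over, so we can't call the bash function directly inside the `foreach` body: it won't be defined there.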
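The difference between a `( ... )` subshell and a separate shell process can be seen without git at all. The sketch below is illustrative only: `demo_fn` and `DEMO_COLOR` are hypothetical stand-ins for `reconcile_git_identity` and the color variables, and the `sh -c` call approximates how a `foreach` command body is launched.

```bash
#!/usr/bin/env bash
# Illustrative only: demo_fn and DEMO_COLOR are hypothetical stand-ins for
# reconcile_git_identity and the color variables defined in sync.sh.
DEMO_COLOR="yellow"   # a plain, unexported shell variable
demo_fn() { echo "called with color=${DEMO_COLOR}"; }

# A ( ... ) subshell is a fork of the current shell, so it sees both the
# function and the unexported variable.
in_subshell() {
  ( demo_fn )
}

# A separate `sh -c` process starts from the exported environment only, so
# neither the function nor the unexported variable is visible there.
in_child_sh() {
  sh -c 'command -v demo_fn >/dev/null 2>&1 && echo "function visible" || echo "function missing"
         echo "color=${DEMO_COLOR:-unset}"'
}
```

Calling `in_subshell` prints the function's output, while `in_child_sh` reports the function as missing and the variable as unset, which is the situation a `foreach` body would be in.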
Three options for routing the call:
**Option A (recommended):** Skip `foreach` for the reconcile; iterate `.gitmodules` paths directly the same way the existing init loop already does at sync.sh:200-218. That loop runs in the parent shell, so it has full access to functions and variables. Add a `(cd "$ppath" && reconcile_git_identity "$USER_DISPLAY" "$USER_EMAIL")` after the existing `git submodule update --init -- "$ppath"` line. Keep `set +e` / `set -e` discipline matching the surrounding code.
**Option B:** Export `USER_DISPLAY` and `USER_EMAIL` as env vars, and inline the reconcile logic inside the `foreach` body as a few lines of bash. Works, but duplicates the function's logic in a second place and is harder to maintain.
**Option C:** Stuff `reconcile_git_identity` into a separate sourceable file (`.claude/scripts/lib/identity.sh`) and `source` it inside the `foreach` body. Cleanest separation but adds a new file and a sourcing pattern that doesn't exist anywhere else in the repo today. Probably overkill for one helper function.
Go with Option A. Concrete patch sketch (inside the existing while-loop at sync.sh:200, immediately after `git submodule update --init -- "$ppath"`):
```bash
# Reconcile this submodule's git identity to match identity.json, the same
# guarantee we make for the parent repo. Without this, commits in newly-cloned
# submodules land under the system default (e.g. "<unix-user> <unix-user@hostname>")
# instead of the actual human. Skip if identity.json was unreadable.
if [ "$USER_DISPLAY" != "unknown" ]; then
    # `|| true`: a missing or broken submodule path must not abort the sync.
    (cd "$ppath" && reconcile_git_identity "$USER_DISPLAY" "$USER_EMAIL") || true
fi
```
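`USER_DISPLAY` and `USER_EMAIL` are already set by the time Phase 1a runs (the load happens earlier in the script), and `reconcile_git_identity` is already defined in the parent shell, so the `( ... )` subshell inherits both. Because the subshell runs `cd "$ppath"` first, git reads and writes the submodule's own repository config rather than the parent's `.git/config`. For a submodule set up by `git submodule update --init`, that config usually lives at `.git/modules/<name>/config` under the parent and is reached through the submodule's `.git` file.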
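Putting Option A together end to end might look like the sketch below. The body of the real `reconcile_git_identity` is not shown in this spec, so the version here is a hypothetical stand-in with assumed behavior (compare the repo-local identity, rewrite on mismatch, print a `[WARNING]`). `reconcile_submodules` is likewise an illustrative name, and the color values are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real reconcile_git_identity in sync.sh, whose
# body is not shown in this spec. Assumed behavior: compare the repo-local
# user.name / user.email with the wanted values and rewrite them on mismatch.
YELLOW=$'\033[1;33m'; NC=$'\033[0m'   # assumed color variables

reconcile_git_identity() {
  local want_name="$1" want_email="$2" cur_name cur_email
  cur_name="$(git config --local user.name 2>/dev/null || true)"
  cur_email="$(git config --local user.email 2>/dev/null || true)"
  if [ "$cur_name" != "$want_name" ] || [ "$cur_email" != "$want_email" ]; then
    echo "${YELLOW}[WARNING]${NC} identity drift in $(pwd): '${cur_name} <${cur_email}>' -> '${want_name} <${want_email}>'"
    git config --local user.name "$want_name"
    git config --local user.email "$want_email"
  fi
}

# Illustrative loop: run the reconcile inside every path listed in .gitmodules.
# Must be called from the parent repo root; runs in the parent shell, so the
# function and color variables above are in scope inside each ( ... ) subshell.
reconcile_submodules() {
  local name="$1" email="$2" key ppath
  [ -f .gitmodules ] || return 0
  [ "$name" != "unknown" ] || return 0   # identity.json was unreadable
  while read -r key ppath; do
    ( cd "$ppath" && reconcile_git_identity "$name" "$email" ) || true
  done < <(git config --file .gitmodules --get-regexp '^submodule\..*\.path$')
}
```

In the actual patch the existing init loop already supplies `$ppath`, so only the guarded subshell line would be added; the standalone loop here just makes the sketch runnable on its own.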
## Edge cases / risks
1. **Submodule not yet initialized** (fresh clone). The init line right before this already populates the submodule; by the time the reconcile line runs, the submodule's git config exists. Safe.
2. **Submodule path missing entirely** (corrupt state, manual rm -rf). The init line would have errored, but the existing code swallows that error with a `>/dev/null 2>&1` redirect, so the `cd` then fails and the subshell exits non-zero. Wrap the subshell in `|| true` to be safe, matching the surrounding code's tolerance for submodule weirdness.
3. **`identity.json` unreadable** (`USER_DISPLAY == "unknown"`). The same guard the parent-repo reconcile uses already covers this — skip silently rather than stamping the submodule with "unknown".
4. **Reconcile changes the submodule's git config and prints a `[WARNING]` line** the first time it runs on each existing machine. That's correct and expected — it documents the drift correction. After the first run on a machine, idempotent.
5. **Submodule that points at a non-Gitea remote** (no current example, but theoretically). The reconcile only writes git config locally; doesn't push or auth. No impact on remote interaction.
6. **`reconcile_git_identity` itself uses color variables** (`YELLOW`, `NC`) that are defined at the top of `sync.sh`. The subshell created by `(cd ...)` inherits parent shell variables, so those color codes are still in scope. Verify.
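The effect of the `|| true` guard from edge case 2 can be sketched in isolation. The helper names below are hypothetical, and each one runs a tiny script in its own `bash` process so that `set -e` behaves as it would at the top level of a script:

```bash
#!/usr/bin/env bash
# Illustrative only: hypothetical helpers showing how a failed `cd` inside a
# ( ... ) subshell interacts with `set -e`, with and without `|| true`.

# Unguarded: the subshell exits non-zero when the path is missing, and
# `set -e` aborts the script before "still running" is printed.
run_unguarded() {
  bash -c 'set -e
    ( cd "$1" 2>/dev/null && echo "reconciled $1" )
    echo "still running"' _ "$1"
}

# Guarded: `|| true` absorbs the failure, so the script carries on.
run_guarded() {
  bash -c 'set -e
    ( cd "$1" 2>/dev/null && echo "reconciled $1" ) || true
    echo "still running"' _ "$1"
}
```

With a nonexistent path, `run_guarded` still prints "still running" while `run_unguarded` prints nothing and returns a non-zero status.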
## Acceptance
- `bash -n .claude/scripts/sync.sh` clean
- Existing parent-repo reconcile behavior unchanged (sanity: run sync.sh on a box where parent git config already matches identity.json — no warning fires, no commit message about it)
- On a workstation where any submodule's `user.name` / `user.email` doesn't match identity.json, sync.sh emits the `[WARNING]` line once per drifted submodule, then on the next run is silent (idempotent)
- A test commit inside any submodule (from any machine) lands with the correct human author
- No new shellcheck warnings (eyeball — no shellcheck mandated in current sync.sh tooling)
- Cross-machine sanity: run on at least GURU-KALI, GURU-5070, HOWARD-HOME, Mikes-MacBook-Air, GURU-BEAST-ROG before treating as fully verified. Each machine's identity.json should already be authoritative.
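The per-submodule identity check in the list above could be scripted rather than eyeballed. A minimal sketch, assuming a hypothetical helper `check_submodule_identities` (not part of `sync.sh`) run from the parent repo root with the expected name and email passed in by hand:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of sync.sh: report whether each submodule's
# effective git identity matches the expected one. Returns non-zero on drift.
check_submodule_identities() {
  local want="$1 <$2>" key ppath got drift=0
  while read -r key ppath; do
    # `git config` without --local reports the effective value, which is
    # what a commit made in that submodule would actually use.
    got="$(git -C "$ppath" config user.name 2>/dev/null) <$(git -C "$ppath" config user.email 2>/dev/null)>"
    if [ "$got" = "$want" ]; then
      echo "OK $ppath"
    else
      echo "DRIFT $ppath: $got"
      drift=1
    fi
  done < <(git config --file .gitmodules --get-regexp '^submodule\..*\.path$' 2>/dev/null)
  return "$drift"
}
```

Running it before and after a `sync.sh` run on each machine would show the drifted submodules flipping from `DRIFT` to `OK`.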
## Don't (when implementing)
- Don't reconcile the *vault* repo via this mechanism — Phase 6 already does it. The change is scoped to project submodules only.
- Don't add a `--no-submodules` flag or any user-facing toggle. The reconcile is always-on, just like the parent-repo reconcile is.
- Don't reach for `git config --global` or `--system`. Local repo config only, same as the existing reconcile.
- Don't write to `identity.json` from this code path. Identity is read-only here.
## Estimated impact
- Cost to ship: ~10 minutes of implementation + a quick test on this box (already showing the drift)
- Cost of not shipping: every new submodule clone on every new machine produces misattributed commits unless someone remembers to set the identity by hand (a rough guess is that this gets missed about half the time), and the drift persists until somebody notices, the way it just happened here. The cost is low individually but accumulates as a noise source in `git log` and breaks contribution attribution for any reporting that aggregates by author.
## Related
- The existing parent-repo reconcile: `sync.sh:52-69` (`reconcile_git_identity`)
- The submodule init/advance block this would hook into: `sync.sh:175-228` (Phase 1a)
- The drift incident that motivated this: youtube-sync-docker commits `ef903c8` and `fdff0a7` (2026-05-31) authored as `ComputerGuru <guru@GURU-KALI.lan>` instead of Mike. Not being rewritten; spec only prevents recurrence.