sync: auto-sync from GURU-5070 at 2026-06-11 08:29:58

Author: Mike Swanson
Machine: GURU-5070
Timestamp: 2026-06-11 08:29:58
2026-06-11 08:30:10 -07:00
parent 65ad20ae0f
commit d0f90d4023
2 changed files with 35 additions and 0 deletions


@@ -77,6 +77,21 @@ NEVER auto-applied.
These stay in the report's `## PROPOSED` section. The rationale isn't "never delete" any more (the fleet-wide additive safety net was dropped 2026-06-02; see `feedback_memory_sync_destructive_ok.md`); it's that merges and dedups require human judgment about which file is canonical and how to combine content. Profile-side deletion DOES happen automatically, but in `sync-memory.sh`, not here.
### The operator MUST execute consolidations — do not just propose and leave
The script is additive-only **by design** (auto-merging would corrupt the store — see the trap below), but the SESSION running `/memory-dream` is NOT additive-only. **Executing the `## PROPOSED` consolidations is the expected work of a dream run, not a someday-maybe.** Parking proposals indefinitely is exactly what causes the fleet drift this skill exists to prevent. Each run, after reading the report:
1. **Triage every `[MERGE?]` cluster with judgment** (the detector is a coarse net):
- **Intentional `X` + `X_history` splits (likewise `_archive`, `_detail`, `_log`, `_rationale`) are NOT duplicates**: they separate current state from an on-demand archive and are cross-linked in frontmatter. Leave them. (The detector skips these as of 2026-06-11; older reports may still list them.)
- **Topically-clustered but distinct facts** (e.g. several `reference_gitea_*`) — merge into one topic file ONLY if genuinely redundant or it reads better as one; otherwise leave.
- **True duplicates / superseded files** — merge: pick the canonical file, fold in the others' unique content, `git rm` the retired ones, fix `MEMORY.md`.
2. **Deletions are first-class.** `git rm` of a retired memory is correct and propagates to every machine's profile store via `sync-memory.sh` mirror mode (repo authoritative). This is the fleet-consistency mechanism — use it.
3. **Commit + sync** so the consolidated store reaches the fleet.
If a cluster genuinely needs a decision you can't make, leave it AND say so explicitly in your summary — don't silently skip the whole PROPOSED section.
**The auto-merge trap (why the script stays additive):** blindly merging the detector's clusters destroys deliberate structure — verified 2026-06-11 when the flagged `project_*` / `project_*_history` pairs turned out to be intentional splits and the `gitea` / `syncro` clusters were distinct facts, not copies. Consolidation needs the operator's judgment; the script must never do it unattended.
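The split-skipping rule described above can be sketched in a few lines. This is a minimal illustration, not the script itself: the suffix tuple and `strip_archive_suffix` mirror the ones added in this commit, while `is_intentional_split` and the `reference_gitea_*` slugs are hypothetical names used only for the example.

```python
# Suffix list copied from this commit's ARCHIVE_SUFFIXES.
ARCHIVE_SUFFIXES = ("_history", "_archive", "_detail", "_log", "_rationale")


def strip_archive_suffix(slug: str) -> str:
    """Return the slug with one trailing archive suffix removed, if present."""
    for suf in ARCHIVE_SUFFIXES:
        if slug.endswith(suf):
            return slug[: -len(suf)]
    return slug


def is_intentional_split(slug_a: str, slug_b: str) -> bool:
    """Hypothetical helper: True when two distinct slugs share the same base name."""
    return slug_a != slug_b and strip_archive_suffix(slug_a) == strip_archive_suffix(slug_b)


# A current/archive pair is treated as a deliberate split and left alone.
print(is_intentional_split("project_cascades", "project_cascades_history"))

# Two topically related files (hypothetical slugs) are not a split; they stay
# eligible for the operator's merge-or-leave judgment.
print(is_intentional_split("reference_gitea_api", "reference_gitea_tokens"))
```

The check compares base names only, so it says nothing about whether two non-split files are truly redundant; that decision remains with the operator.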
## Running it
This machine's Python launcher is `py` (per identity.json); the script also


@@ -432,6 +432,20 @@ def jaccard(a: set[str], b: set[str]) -> float:
    return inter / union if union else 0.0


# Suffixes that denote an INTENTIONAL current/archive split (e.g. project_cascades
# + project_cascades_history). These are deliberately separate files (current
# state vs on-demand detail, cross-linked in frontmatter), NOT duplicates. They
# must not be flagged as merge candidates.
ARCHIVE_SUFFIXES = ("_history", "_archive", "_detail", "_log", "_rationale")


def strip_archive_suffix(slug: str) -> str:
    for suf in ARCHIVE_SUFFIXES:
        if slug.endswith(suf):
            return slug[: -len(suf)]
    return slug


def cluster_overlaps(mems: list[Memory], threshold: float = 0.34):
    """
    Within each type, find pairs with token-overlap >= threshold, then union
@@ -466,6 +480,7 @@ def cluster_overlaps(mems: list[Memory], threshold: float = 0.34):
parent[rx] = ry
files = [m.filename for m in group]
slug_of = {m.filename: m.slug for m in group}
slug_prefix = {}
for m in group:
    parts = m.slug.split("_")
@@ -480,6 +495,11 @@ def cluster_overlaps(mems: list[Memory], threshold: float = 0.34):
and len(slug_prefix[fi].split("_")) >= 2
)
if sim >= threshold or same_prefix:
    # Don't flag intentional current/archive splits (X + X_history):
    # deliberately separate files, cross-linked in frontmatter, not dupes.
    si, sj = slug_of[fi], slug_of[fj]
    if si != sj and strip_archive_suffix(si) == strip_archive_suffix(sj):
        continue
    union(fi, fj)
groups: dict[str, list[str]] = {}
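To see how the skip interacts with union-find grouping, here is a simplified, self-contained sketch. It is not the script's `cluster_overlaps`: it takes a plain slug-to-token-set mapping, omits the `same_prefix` rule, and the `cluster` function, the token sets, and the `project_cascades_notes` slug are assumptions made up for illustration. The body of `jaccard` is also assumed from its return line.

```python
from itertools import combinations

ARCHIVE_SUFFIXES = ("_history", "_archive", "_detail", "_log", "_rationale")


def strip_archive_suffix(slug: str) -> str:
    for suf in ARCHIVE_SUFFIXES:
        if slug.endswith(suf):
            return slug[: -len(suf)]
    return slug


def jaccard(a: set[str], b: set[str]) -> float:
    inter = len(a & b)
    union = len(a | b)
    return inter / union if union else 0.0


def cluster(tokens: dict[str, set[str]], threshold: float = 0.34) -> list[list[str]]:
    """Simplified stand-in for cluster_overlaps: slug -> token set in, clusters out."""
    parent = {slug: slug for slug in tokens}

    def find(x: str) -> str:
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(sorted(tokens), 2):
        if jaccard(tokens[a], tokens[b]) < threshold:
            continue
        if strip_archive_suffix(a) == strip_archive_suffix(b):
            continue  # intentional current/archive split: no direct link
        parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for slug in tokens:
        groups.setdefault(find(slug), []).append(slug)
    # Only groups with more than one member count as merge candidates.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Illustrative token sets (assumed, not taken from a real memory store).
current = {"cascades", "deploy", "status", "current"}
history = {"cascades", "deploy", "status", "2025"}
notes = {"cascades", "deploy", "status", "notes"}

pair = cluster({"project_cascades": current, "project_cascades_history": history})
trio = cluster({
    "project_cascades": current,
    "project_cascades_history": history,
    "project_cascades_notes": notes,
})
print(pair)  # no cluster: the only overlapping pair is an intentional split
print(trio)  # one cluster of three: both halves of the split overlap the third file
```

In this sketch the skip only blocks the direct link between a file and its archive. If both overlap a third file, union-find still joins them transitively, so a split pair can appear together in a `[MERGE?]` cluster and should still be triaged as an intentional split.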