sync: auto-sync from HOWARD-HOME at 2026-06-01 21:11:22

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-01 21:11:22
2026-06-01 21:11:29 -07:00
parent 6dee6406c0
commit 918afc5c26
3 changed files with 188 additions and 0 deletions

@@ -127,3 +127,58 @@ Re-ran the corrected pipeline: 6118 raw -> 821 unique. Investigated high "Copies
- Scripts: .claude/tmp/treb-extract.ps1, treb-merge.ps1, treb-enhance.ps1
- Agent: ba173f0c-19e8-488d-834c-1b6f6dfd5699 (DESKTOP-QNP3ON5)
- Syncro #31953 (address book), customer 238740
---
## Update: 21:08 MST — contact data cleaning (names/emails/dedup) + local handoff
### Session Summary
Extended the contact recovery into a multi-pass data-cleaning effort after Howard verified the first output and found defects. Parsing notes into structured fields (treb-enhance.ps1, non-destructive) placed phones/addresses/emails, but verification surfaced a malformed source: the contacts' `E-mail Address` field held junk single letters in 666 of 695 rows, with the real email scattered across Display Name, the name fields, and Notes, and the merge had keyed dedup on that junk. Rewrote treb-merge.ps1 around a unified identity model: per-record EMAILS (including an email rebuilt when it was split across First/Last, e.g. "jammerdavis737@gmail."+"com") and a BEST NAME (First/Last -> FullName/FileAs/Subject -> Display Name), with union-find dedup over shared email/name/derived-name signals (links are transitive: records connected through any chain of shared signals count as the same person).
Re-extracted all 16 PSTs to capture FullName/FileAs/Subject (treb-extract.ps1 updated). Iterated the merge several times, with each verification pass revealing another layer: handle-style names (badgerbd) needed a Display Name fallback; email-as-name pairs with no `@` in the First part needed TLD/trailing-dot rejection; truly nameless contacts are now shown by their clean email; email local-parts like Emily_Schroeder were turned into real names (role/no-reply addresses excluded); the mangled pair "david."/"rystrom" was recovered as "David Rystrom"; and derived-name and raw-name signals were fed back into dedup to collapse the last duplicates.
Howard proposed moving the data to Howard-Home for local iteration. Zipped the 15 `_work` JSONs + current FINAL into treb-contacts-data.zip (0.57 MB) on the Owner machine; Howard copied it to `C:\claudetools\.claude\tmp\treb-data\`. From there the merge/enhance ran LOCALLY via Windows PowerShell (path-swapped *-local.ps1 copies) with instant iteration and no RMM round-trips.
Final converged result: 6118 raw -> 674 unique contacts, 0 invalid emails, 0 same-name duplicate groups, and a CSV that round-trips clean (674 rows / 48 cols), so it is import-ready. Name coverage: 532 first+last, 71 first-only, 71 email-only (no source name). Field coverage: 633 with email, 206 phone, 67 address, 272 notes (preserved verbatim throughout).
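The "CSV round-trips clean" check can be illustrated with a small sketch. This is Python for illustration only, not the PowerShell verify step used in the session; the function name `roundtrip_shape` and the sample record are hypothetical. It writes rows to CSV text, reads them back, and reports the row count, the column count, and whether every value survived unchanged (including multi-line Notes).

```python
import csv
import io


def roundtrip_shape(rows, fieldnames):
    """Write rows as CSV, read them back, return (row_count, col_count, identical)."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)

    buf.seek(0)
    reader = csv.DictReader(buf)
    back = list(reader)

    # Missing keys are written as empty strings, so compare against that form.
    expected = [{k: r.get(k, "") for k in fieldnames} for r in rows]
    return len(back), len(reader.fieldnames), back == expected


# Hypothetical sample record with an embedded newline and comma in Notes.
sample_fields = ["First Name", "Last Name", "Notes"]
sample_rows = [
    {"First Name": "David", "Last Name": "Rystrom", "Notes": "line one\nline two, with comma"}
]
shape = roundtrip_shape(sample_rows, sample_fields)
```

For the real deliverable the same kind of check would be expected to report 674 rows and 48 columns.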
### Key Decisions
- Non-destructive enrichment (copy out of Notes, never delete) held throughout — Notes preserved on all 272.
- Unified identity model: EMAILS + BEST NAME functions feed BOTH dedup signals and output fields, so derived names also drive merging.
- Union-find dedup over shared signals (email / clean First+Last / spaced Display Name / email-derived name / raw mangled name) to collapse cross-copy AND cross-field duplicates while limiting false merges (name signal needs both parts).
- Email reconstruction priority: any `@`-bearing value across the four email fields + one rebuilt from an email split into the name. Junk single-letter Address values dropped.
- Name resolution priority: clean First+Last -> FullName/Subject/FileAs -> Display Name (incl single token) -> First.Last mangle recovery -> email-local-part derivation (role addresses excluded) -> show-by-email. Nickname variants (Dave/David) left for shared-email linking, not auto-merged.
- Moved raw JSONs to Howard-Home for local PowerShell iteration — eliminated RMM latency for the many merge passes.
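The union-find dedup decision above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code in treb-merge.ps1: the record keys (`first`, `last`, `emails`) and the `signals` function are hypothetical, and only two signal kinds (email, full name) are shown instead of the five listed. It does reflect two points from the decisions: the name signal requires both parts, and merging is transitive.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        # Path halving keeps the trees shallow.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the lower index as root so results are deterministic.
            self.parent[max(ra, rb)] = min(ra, rb)


def signals(rec):
    """Hypothetical signal set: every email, plus a name key only if both parts exist."""
    out = {"email:" + e.lower() for e in rec.get("emails", [])}
    first = rec.get("first", "").strip()
    last = rec.get("last", "").strip()
    if first and last:
        out.add("name:" + (first + " " + last).lower())
    return out


def dedup_groups(records):
    """Return lists of record indexes that share a signal, directly or transitively."""
    uf = UnionFind(len(records))
    first_seen = {}
    for idx, rec in enumerate(records):
        for sig in signals(rec):
            if sig in first_seen:
                uf.union(idx, first_seen[sig])
            else:
                first_seen[sig] = idx
    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[uf.find(idx)].append(idx)
    return sorted(groups.values())


# Hypothetical records: 0 and 2 share no signal directly, but both link to 1.
records = [
    {"first": "David", "last": "Rystrom", "emails": ["dr@example.com"]},
    {"first": "", "last": "", "emails": ["dr@example.com", "david@example.org"]},
    {"first": "Dave", "last": "", "emails": ["david@example.org"]},
    {"first": "Emily", "last": "Schroeder", "emails": []},
]
groups = dedup_groups(records)
```

Here the "Dave" record joins the "David Rystrom" group only through the shared email, which mirrors the choice to leave nickname variants to shared-email linking.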
### Problems Encountered
- Malformed source email field (junk single letters); real emails elsewhere. Fixed via multi-field reconstruction + email-as-name rebuild; verified 0 invalid emails.
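- Over/under-merge churn: PowerShell collapsed single-element arrays to scalars, so `$re[0]` indexed into a string and returned a [char] instead of the first element; fixed by wrapping results in @() and casting to [string]. High copies-merged counts (up to 39) were confirmed legitimate (consistent identity), not over-merge.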
- Email-as-name with no `@` in First (e.g. "jeffrey."/"com") accepted as a name until IsRealNamePair rejected TLD-last / trailing-dot.
- Same-person-different-key duplicates (email-only copy vs named copy) persisted until derived-name + raw-name signals were added to dedup -> 0 dup groups.
- A path guard blocked Remove-Item in a local verify command; dropped the cleanup step (timestamped filenames make the newest file unambiguous).
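The email problems above (junk single-letter values, an email split across First/Last, and email fragments accepted as names) can be sketched in Python. This is an illustration under assumptions, not the logic of EmailsOf or IsRealNamePair in treb-merge.ps1: the function names, the regular expression, and the short `TLDS` set are all hypothetical, and the real rules may be stricter or looser.

```python
import re

# Hypothetical, deliberately short TLD list for illustration only.
TLDS = {"com", "net", "org", "edu", "gov", "us", "ca"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def valid_emails(values):
    """Keep only values shaped like an email; junk such as 'j' is dropped."""
    return [v.strip().lower() for v in values if EMAIL_RE.match(v.strip())]


def rebuild_split_email(first, last):
    """Rebuild an email split as First='user@host.' and Last='com', else None."""
    first, last = first.strip(), last.strip()
    if "@" in first and first.endswith(".") and last.lower() in TLDS:
        candidate = first + last
        if EMAIL_RE.match(candidate):
            return candidate.lower()
    return None


def is_real_name_pair(first, last):
    """Reject pairs that look like email fragments (TLD as Last, trailing dot on First)."""
    first, last = first.strip(), last.strip()
    if not first or not last:
        return False
    if "@" in first or "@" in last:
        return False
    if first.endswith(".") or last.lower() in TLDS:
        return False
    return True
```

With these rules "jeffrey."/"com" is no longer treated as a name, and "david."/"rystrom" is also rejected at this stage so that a later mangle-recovery step can handle it.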
### Configuration Changes (this update)
- treb-extract.ps1: capture FullName/FileAs/Subject (_full/_fileas/_subject).
- treb-merge.ps1: full rewrite to unified identity model (EmailsOf, BestName, IsRealNamePair, SplitName, NameFromEmail/TitleCase, union-find Signals/Find/Union, name sanitize, notes concat, audit cols).
- treb-enhance.ps1: unchanged logic (phones/addresses/emails from Notes, non-destructive).
- Local working copies: .claude/tmp/treb-merge-local.ps1, treb-enhance-local.ps1 (path-swapped to C:\claudetools\.claude\tmp\treb-data).
- Data moved local: C:\claudetools\.claude\tmp\treb-data\_work\*.json (15) + FINAL CSV.
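The BestName and NameFromEmail functions listed for treb-merge.ps1 follow a fallback chain. A minimal Python sketch of that idea is shown here; it is not the script's code. The record keys, the `ROLE_LOCALPARTS` set, and the rule that a local-part must split into exactly two alphabetic tokens are assumptions, and the First.Last mangle-recovery step is omitted for brevity.

```python
import re

# Hypothetical role/no-reply local-parts that should never become a person's name.
ROLE_LOCALPARTS = {"info", "sales", "support", "admin", "office", "billing", "noreply", "no-reply"}


def name_from_email(email):
    """Derive 'First Last' from a local-part such as Emily_Schroeder, else None."""
    local = email.split("@", 1)[0]
    if local.lower() in ROLE_LOCALPARTS:
        return None
    parts = [p for p in re.split(r"[._]+", local) if p]
    if len(parts) != 2 or not all(p.isalpha() for p in parts):
        return None
    return " ".join(p.capitalize() for p in parts)


def best_name(rec):
    """Walk the fallback chain; the last resort is showing the contact by email."""
    first = rec.get("first", "").strip()
    last = rec.get("last", "").strip()
    if first and last and "@" not in first + last:
        return first + " " + last
    # Assumed key names for FullName / Subject / FileAs / Display Name.
    for key in ("full", "subject", "fileas", "display"):
        value = rec.get(key, "").strip()
        if value and "@" not in value:
            return value
    emails = rec.get("emails", [])
    for email in emails:
        derived = name_from_email(email)
        if derived:
            return derived
    return emails[0] if emails else ""
```

For example, a record holding only `emily_schroeder@example.com` resolves to "Emily Schroeder", while one holding only `sales@example.com` is shown by its email.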
### Results / Deliverable (this update)
- FINAL (local): C:\claudetools\.claude\tmp\treb-data\AT-Trebesch-Contacts-FINAL-20260601-210118.csv — 674 contacts, import-ready, Outlook native headers + 3 audit cols (Source PSTs / Source folders / Copies merged; Outlook ignores on import).
- Source ZIP on Owner box: C:\Users\Owner\Desktop\Contacts\treb-contacts-data.zip.
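Because each FINAL file carries a `yyyyMMdd-HHmmss` stamp in its name, the newest one can be picked without deleting older outputs. The sketch below is a Python illustration of that selection, not part of the session's scripts; `newest_final` and the second filename in the example are hypothetical.

```python
import re
from datetime import datetime

# Matches the stamp in names like AT-Trebesch-Contacts-FINAL-20260601-210118.csv
STAMP_RE = re.compile(r"-FINAL-(\d{8}-\d{6})\.csv$")


def newest_final(filenames):
    """Return the filename with the latest embedded timestamp, or None if none match."""
    stamped = []
    for name in filenames:
        match = STAMP_RE.search(name)
        if match:
            when = datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
            stamped.append((when, name))
    return max(stamped)[1] if stamped else None


candidates = [
    "AT-Trebesch-Contacts-FINAL-20260601-204500.csv",  # hypothetical earlier pass
    "AT-Trebesch-Contacts-FINAL-20260601-210118.csv",
    "notes.txt",
]
latest = newest_final(candidates)
```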
### Pending / Incomplete (this update)
- DELIVERY: final CSV is on Howard-Home; the Outlook import target is the Owner box (DESKTOP-QNP3ON5). Either push it back via RMM or have Howard copy it over.
- OPTIONAL: clean copy without the 3 audit columns; Syncro #31953 time/resolution note.
- Residual (acceptable): 71 email-only contacts genuinely have no name in source (mostly businesses/role addresses), shown by email.
### Reference (this update)
- Local data + scripts: C:\claudetools\.claude\tmp\treb-data\, .claude/tmp/treb-merge.ps1 / treb-enhance.ps1 / treb-extract.ps1
- Pipeline: extract (Outlook COM, user_session) -> merge (union-find identity) -> enhance (notes->fields) -> verify