sync: auto-sync from GURU-5070 at 2026-06-02 07:25:49

Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-06-02 07:25:49
2026-06-02 07:25:55 -07:00
parent 13c7ad3c82
commit f8ed03c75a
54 changed files with 1349 additions and 2 deletions
--- a/.claude/memory/MEMORY.md
+++ b/.claude/memory/MEMORY.md
@@ -19,7 +19,7 @@
 - [Gitea API credential](reference_gitea_api_credential.md) — Gitea API (PRs/merges) as howard uses services/gitea-howard.sops.yaml password on internal http://172.16.3.20:3000; NOT the gururmm-server SSH password.
 - [Gitea Internal API Access](reference_gitea_internal.md) — git.azcomputerguru.com is NOT behind Cloudflare — it's the office Cox IP NAT'd to NPM (openresty) on Jupiter. Prefer internal 172.16.3.20:3000 for reliability (bypasses NPM SSL-renewal reload blips).
 - [Gitea git-op latency](reference_gitea_git_op_latency.md) — SSH (.20:2222) is SLOWEST (~1.5s); internal HTTP+token ~0.55s; SOPS lookup only ~0.33s. Don't switch to SSH for speed. Gitea SSH is .20:2222 (API ssh_url .21 is wrong).
-- [GuruRMM technical reference](reference_gururmm.md) — Server (172.16.3.30) layout + API + `context=user_session` (WTS impersonation) + build-pipeline vendoring at `deploy/build-pipeline/` (auto-syncs to /opt/gururmm) + Linux agent systemd sandbox trap (ProtectSystem=strict makes fs/mount observations sandbox-local).
+- [GuruRMM technical reference](reference_gururmm.md) — Server (172.16.3.30) layout + downloads dir `/var/www/gururmm/downloads` + `.channel` sidecar rollout control (stable/beta) + privileged server access via the server's OWN root RMM agent (hostname `gururmm`, no SSH needed; plink fallback) + API + `context=user_session` (WTS impersonation) + build-pipeline vendoring at `deploy/build-pipeline/` + Linux agent systemd sandbox trap.
 - [Trebesch DESKTOP-QNP3ON5 shell replacement](reference_trebesch_qnp3on5.md) — AT Trebesch box runs an Explorer shell replacement; explorer.exe owner check returns blank — use Win32_ComputerSystem.UserName. GuruRMM SWIFT-LION-2892.

 ## Users
@@ -62,6 +62,7 @@

 ### GuruRMM
 - [GuruRMM operational rules](feedback_gururmm.md) — Six rules: (1) RMM dev = Mike, never Howard (368/0 commits); GuruScan is Howard's. (2) Agent parity Win+Linux+macOS in same change. (3) Builds via Gitea webhook pipeline only, never SSH. (4) #bot-alerts only for client/ticket impact, skip internal infra/dev. (5) Identify agents by IP, not by reconning candidates. (6) UNC paths in user_session need [char]92 — literals get halved.
+- [Build channel default = beta](feedback_gururmm_build_channel_default.md) — New agent builds must be tagged BETA by default (stable = explicit promote re-tag); distinct from agents defaulting to the stable CHANNEL (correct). Fixed build-windows/linux.sh 2026-06-01; macOS already correct. Enables beta-first canary.

 ### Cascades
 - [Cascades operational rules](feedback_cascades.md) — Two active rules: (1) folder redirection (fdeploy) needs subfolders PRE-CREATED before first logon or it caches a failure forever; recovery via fix-shell-redirect.ps1. (2) ALWAYS ask which security group(s) a new user goes into — never auto-derive from OU.
--- a/.claude/memory/feedback-rmm-unc-path-encoding.md
+++ b/.claude/memory/feedback-rmm-unc-path-encoding.md
@@ -0,0 +1,19 @@
+---
+name: feedback-rmm-unc-path-encoding
+description: RMM PowerShell UNC paths via user_session context lose one backslash when using string literals — must build with [char]92
+metadata:
+  type: feedback
+---
+
+Never use `"\\CS-SERVER\..."` string literals in PowerShell scripts dispatched via GuruRMM `user_session` context. The backslash gets halved somewhere in the encoding pipeline, producing `\CS-SERVER\...` (a local path) instead of the UNC `\\CS-SERVER\...`.
+
+**Why:** The `user_session` execution wrapper appears to process escape sequences in the script text differently than `system` context, stripping one backslash from `\\`.
+
+**How to apply:** Always build UNC paths explicitly when using user_session:
+```powershell
+$bs = [char]92
+$base = "${bs}${bs}CS-SERVER${bs}homes${bs}Username"
+```
+This constructs `\\CS-SERVER\homes\Username` correctly regardless of context.
+
+The `system` context (offline hive reg query) showed correct `\\CS-SERVER` output, so the issue is specific to `user_session`.
--- a/.claude/memory/feedback_cascades_folder_redirect.md
+++ b/.claude/memory/feedback_cascades_folder_redirect.md
@@ -0,0 +1,26 @@
+---
+name: feedback_cascades_folder_redirect
+description: Cascades folder redirection — fdeploy failure/retry behavior, correct new-user procedure, recovery script location
+metadata:
+  type: feedback
+---
+
+Folder redirection (fdeploy) caches failures and never retries if subfolders don't exist at first logon. "No changes detected" = stuck forever without manual intervention.
+
+**Root cause:** fdeploy1.ini had Flags=1211 which includes Grant Exclusive Rights (bit 0x400). The Homes share grants Domain Users=Change which excludes WRITE_DAC. fdeploy fails to set NTFS on new subfolders → logs 502 → caches the failure. Changed to Flags=187 in `{512B43A4-F049-4CE5-BFAC-860AD13E92BE}\User\Documents & Settings\fdeploy1.ini` on CS-SERVER.
+
+**Prevention — mandatory order for every new user:**
+1. Create AD user
+2. Run `New-HomeFolder -Username "<sam>"` on CS-SERVER — now creates root + Desktop/Documents/Downloads/Music/Pictures subfolders with correct ACL
+3. Add user to SG-FolderRedirect
+4. THEN first domain logon
+
+**Recovery (fdeploy already cached a failure):**
+- Run `clients/cascades-tucson/scripts/fix-shell-redirect.ps1` via GuruRMM on the client while user is logged in
+- Sets both GUID-based and legacy-name registry keys (Personal, My Music, My Pictures) in HKU\<SID>
+- Folders must already exist on server — script doesn't create them
+- User logs off and on to pick up changes
+
+**Why both GUID and legacy keys matter:** Downloads has no legacy name key → only GUID needed. Documents/Music/Pictures have both `{GUID}` AND `Personal`/`My Music`/`My Pictures`. Windows reads the legacy key for the actual shell folder — GUID alone is insufficient.
+
+**How to apply:** Any time a new Cascades user gets folder redirection set up.
--- a/.claude/memory/feedback_cascades_user_security_group.md
+++ b/.claude/memory/feedback_cascades_user_security_group.md
@@ -0,0 +1,12 @@
+---
+name: cascades-user-security-group
+description: When creating or adding any Cascades user, always ask which security group(s) the account goes into — deliberate decision, never auto-derived from OU
+metadata:
+  type: feedback
+---
+
+When creating, or being asked to create, any Cascades user account (AD or M365), always ask the user **which security group(s)** the new account should be a member of. Include it explicitly in the creation preview/confirmation alongside name, UPN, and OU — do not assume it from the OU, department, or job title.
+
+**Why:** Howard explicitly declined an `OU=Caregivers` -> `SG-Caregivers` auto-mirror script (2026-05-14). Security-group membership controls what access and Conditional Access policies apply to a user; he wants that to stay a deliberate, reviewed decision per user, not automated away. OU placement is mechanical (it controls Entra Connect sync scope); group membership is an access-control decision and must be made consciously.
+
+**How to apply:** During any Cascades user-creation flow, ask "which security group(s)?" and confirm it in the preview. For caregivers specifically: the account goes in `OU=Caregivers` (for sync scope) AND must be deliberately added to `SG-Caregivers` (for CA policy coverage) — two separate, intentional steps, neither auto-derived from the other.
--- a/.claude/memory/feedback_gururmm_agent_parity.md
+++ b/.claude/memory/feedback_gururmm_agent_parity.md
@@ -0,0 +1,16 @@
+---
+name: feedback_gururmm_agent_parity
+description: "Add feature X to the agent" means all three platforms (Windows + Linux + macOS) in the same change — no exceptions
+metadata:
+  type: feedback
+---
+
+"Add feature X to the agent" means Windows + Linux + macOS. All three in the same change.
+
+**Why:** Mike stated this explicitly 2026-05-15. Delivering Windows-only and leaving Linux/macOS for later is not acceptable — it's the same as not finishing the task.
+
+**How to apply:** When implementing any agent feature:
+- If the implementation differs by platform, write all three variants.
+- If a real implementation is not feasible on a platform yet, add a working stub + `// TODO(platform): <os> — <reason>` in the same commit.
+- A silent no-op without a stub and TODO is treated as a bug.
+- See `.claude/CODING_GUIDELINES.md` "GuruRMM Agent — Platform Parity" for the full matrix and known gaps.
--- a/.claude/memory/feedback_gururmm_build_channel_default.md
+++ b/.claude/memory/feedback_gururmm_build_channel_default.md
@@ -0,0 +1,15 @@
+---
+name: feedback_gururmm_build_channel_default
+description: GuruRMM build pipeline must tag NEW builds beta by default; stable is an explicit promote step. Distinct from agents defaulting to the stable CHANNEL (which is correct).
+metadata:
+  type: feedback
+---
+
+GuruRMM has two separate "stable" concepts that were conflated in a misunderstanding (corrected by Mike 2026-06-01):
+
+1. **Agent channel default = stable** — CORRECT and intended. `server/src/db` `resolve_agent_channel` returns "stable" when an agent/site/client has no `update_channel` override. Agents with no override should inherit the stable channel. Leave this as-is.
+2. **Build classification default = beta** — the FIX. New agent binaries must be tagged `beta` by default (a `<binary>.channel` sidecar = "beta"); promotion to `stable` is a deliberate, explicit re-tag after a canary verifies. The bug: `deploy/build-pipeline/build-windows.sh` and `build-linux.sh` tagged every new build `stable` ("Mark all new builds as stable by default"), which collapses the beta soak — `scanner.rs::get_latest_version` gives beta agents the absolute-latest binary and stable agents the latest stable-tagged one, and with `auto_update` on-by-default the whole stable fleet self-updates on reconnect. macOS already does it right (`agent/build-macos-pkg.sh` writes "beta").
+
+**Why:** enables beta-first canary rollout (e.g. soak a release on GURU-5070 before the ~46-agent fleet). Mike originally asked for "all agents on the stable channel by default", NOT "all builds classified stable".
+
+**How to apply:** when releasing GuruRMM agents, expect new builds to land on `beta` only. Promote to stable explicitly (re-tag the `.channel` sidecar to "stable" on the server downloads dir) after verifying on a beta box. See [[feedback_gururmm_builds]] (builds go through the Gitea webhook pipeline, never run build scripts by hand).
--- a/.claude/memory/feedback_gururmm_builds.md
+++ b/.claude/memory/feedback_gururmm_builds.md
@@ -0,0 +1,14 @@
+---
+name: feedback-gururmm-builds
+description: "GuruRMM builds must go through the Gitea webhook pipeline, never run manually via SSH"
+metadata: 
+  node_type: memory
+  type: feedback
+  originSessionId: 541d4004-8c45-4290-89f5-0ba9ee4e64a9
+---
+
+Never run `build-agents.sh` directly via SSH. All builds go through the normal Gitea webhook pipeline (push to main triggers the build automatically).
+
+**Why:** Manual runs execute as the SSH user (`guru`) instead of root, breaking log writes, artifact cleanup, and service restarts. The pipeline exists precisely to handle this correctly.
+
+**How to apply:** To trigger a build, push a commit to the gururmm main branch on Gitea. If a test build is needed without a real change, use an empty commit: `git commit --allow-empty -m "chore: trigger build"`.
--- a/.claude/memory/feedback_howard_delegation.md
+++ b/.claude/memory/feedback_howard_delegation.md
@@ -0,0 +1,12 @@
+---
+name: feedback-howard-delegation
+description: Howard prefers to leave backend/server-side follow-up and risky implementation work to Mike unless explicitly asked — don't assign those items to Howard or prompt him to do them.
+metadata:
+  type: feedback
+---
+
+Howard defers backend follow-up tasks (server-side plumbing, DB schema changes, agent-side wiring, anything touching infrastructure) to Mike by default. He said "I don't like messing with things."
+
+**Why:** Howard is a tech/field role. He's comfortable with dashboard UI, specs, and client work but prefers not to touch server/agent code unless Mike specifically asks him to.
+
+**How to apply:** When wrapping up an implementation that has follow-up backend items (e.g. "push rules on agent connect", "policy tab plumbing"), note them as "deferred to Mike" rather than listing them as Howard's pending tasks. Don't proactively suggest Howard implement server or agent changes unless he asks or Mike assigns them. See [[feedback-testing]] for related conservative approach to changes.
--- a/.claude/memory/feedback_no_botalerts_internal_rmm.md
+++ b/.claude/memory/feedback_no_botalerts_internal_rmm.md
@@ -0,0 +1,21 @@
+---
+name: feedback_no_botalerts_internal_rmm
+description: Post #bot-alerts ONLY when an RMM command directly affects a client endpoint or a ticket; skip for internal infra/build/dev/recon (e.g. PLUTO build-runner setup)
+metadata:
+  type: feedback
+---
+
+The `/rmm` skill instructs "post a one-line #bot-alert after every dispatch." Mike does NOT want
+#bot-alerts for **internal infrastructure / dev-tooling** commands — e.g. installing a Gitea Actions
+runner on PLUTO, CI/build orchestration on build VMs, inventory/recon during setup.
+
+**The rule (Mike, 2026-05-29):** post a #bot-alert ONLY when the RMM command **directly affects a
+client endpoint or a ticket** (remediation, a client machine change, ticket-linked work). For
+everything else — internal infra, build/CI orchestration, dev-tooling, recon/inventory (e.g. the
+PLUTO build-runner setup) — SKIP the alert.
+
+**Why:** keeps #bot-alerts signal-high — it's a client/ticket activity feed, not a build log.
+
+**How to apply:** When dispatching via `/rmm` or the GuruRMM command API, ask "does this touch a
+client/ticket?" If no, do NOT call `post-bot-alert.sh`. Overrides the skill's blanket "alert after
+every dispatch" rule. Related: [[reference_pluto_build_server]].
--- a/.claude/memory/feedback_no_indented_code_blocks.md
+++ b/.claude/memory/feedback_no_indented_code_blocks.md
@@ -0,0 +1,12 @@
+---
+name: feedback_no_indented_code_blocks
+description: Never indent code inside code blocks — Howard copy-pastes directly and leading spaces break PowerShell commands
+metadata:
+  type: feedback
+---
+
+Never indent code inside markdown code blocks. Howard copy-pastes commands directly from the chat and leading spaces cause PowerShell parse errors. All code must start at column 0 inside the fences.
+
+**Why:** Howard reported that indented code blocks consistently fail when pasted into PowerShell and he has to manually strip the indentation every time.
+
+**How to apply:** Every PowerShell (and bash/other) code block — start all lines at column 0, no leading spaces or tabs inside the fences.
--- a/.claude/memory/feedback_rmm_dev_is_mike.md
+++ b/.claude/memory/feedback_rmm_dev_is_mike.md
@@ -0,0 +1,15 @@
+---
+name: GuruRMM development is Mike's, not Howard's
+description: GuruRMM code/bugs/dev are Mike's domain — never route RMM dev or bug coord notes to Howard. Howard only SUBMITS RMM feature requests; GuruScan is Howard's project, not RMM
+type: feedback
+---
+
+GuruRMM development — code, bugs, the roadmap, architecture — is **Mike's** domain. Do NOT route RMM dev/bug coord messages to Howard. Howard does **zero** RMM coding.
+
+**Why:** Mike, 2026-05-26. I escalated a stale GuruRMM roadmap bug (BUG-001) to Howard via a coord note; Mike corrected me — "Howard hasn't done ANY code work on RMM." Verified: `users.json` machine lists don't overlap (mike: GURU-5070/Mikes-MacBook-Air/GURU-BEAST-ROG/GURU-KALI; howard: ACG-TECH03L/Howard-Home), and the GuruRMM repo has 368 commits by Mike and **0 by Howard**. The `/feature-request` skill encodes the real model: Howard *submits* RMM feature requests → Mike does the dev. I had inverted it.
+
+**How to apply:**
+- RMM bug/dev/roadmap item → it's Mike's. Since Mike is usually the user, just surface it to him directly; don't send a coord note to anyone (a note to yourself is pointless, and Howard isn't the owner).
+- **GuruScan** (`projects/msp-tools/guru-scan/`) IS Howard's project — coord notes about GuruScan correctly go to Howard. Don't conflate GuruScan with GuruRMM just because the names rhyme or GuruScan may integrate with RMM.
+- **Leave GuruScan alone until Howard asks.** Do NOT proactively review, audit, or modify its code — even after a sync pulls in big GuruScan changes. Wait for Howard to explicitly request a review. (Mike, 2026-05-27, after I offered to review Howard's GuruScan.psm1 refactor unprompted.)
+- Before sending any coord note to a teammate, check whose domain the work actually sits in. See [[user_howard]].
--- a/.claude/memory/feedback_rmm_identify_by_ip.md
+++ b/.claude/memory/feedback_rmm_identify_by_ip.md
@@ -0,0 +1,12 @@
+---
+name: feedback_rmm_identify_by_ip
+description: When the offending/target machine is known by external IP, identify the RMM agent by matching the IP — don't recon every candidate.
+metadata:
+  type: feedback
+---
+
+When a task names a machine by its external IP (e.g. an auth-failure source from a server log), identify the RMM endpoint by **matching that IP**, not by dispatching recon to every candidate agent and inspecting them.
+
+**Why:** Mike pushed back twice (2026-05-30) for probing both Pavon machines (Curves + Raiders) to find which had a stray GuruConnect client, when the offending external IP was already known. Matching IP is one lookup; reconning all candidates is noisy and slow.
+
+**How to apply:** Get the source IP from the relevant server's logs first. To map IP -> agent: GuruRMM does NOT yet store agent IPs (no local_ip/external_ip fields — see GuruRMM todo 7459428e, 2026-05-30), so until that lands, have only the *candidate* endpoints report their external IP (`Invoke-RestMethod ipify`) and match — or narrow candidates by site/client first. Once the server stamps external_ip from X-Forwarded-For, query `/api/agents` directly. Related: [[reference_gitea_internal]].
--- a/.claude/memory/feedback_syncro_appointment_date_check.md
+++ b/.claude/memory/feedback_syncro_appointment_date_check.md
@@ -0,0 +1,31 @@
+---
+name: Syncro — verify appointment date day-of-week
+description: Before creating any Syncro appointment, verify the computed date falls on the intended weekday (py datetime) and show the day name in the preview. Wrong-day incident #32312 2026-05-21.
+type: feedback
+---
+
+# Syncro — Verify appointment date day-of-week before creating
+
+**Rule:** Before creating any Syncro appointment, always verify that the computed date
+actually falls on the intended day of the week.
+
+**Why:** Day-of-week math is easy to get wrong. In the incident that prompted this rule
+(2026-05-21, ticket #32312), "Saturday" was computed as May 24 — which is actually a Sunday.
+The appointment landed on the wrong day and didn't appear where Winter expected it on the calendar.
+
+**How to verify:**
+
+Use Python or Bash to print the weekday before including it in the preview:
+
+```bash
+py -c "import datetime; d = datetime.date(2026, 5, 24); print(d.strftime('%A %Y-%m-%d'))"
+# Output: Sunday 2026-05-24  ← would have caught the error
+```
+
+Or include the day name in the TICKET PREVIEW and require explicit user confirmation
+that the day-of-week matches their intent.
+
+**Catch:** Always show `Day YYYY-MM-DD` (e.g., "Saturday 2026-05-23") in the preview —
+never just the numeric date — so the user can verify at a glance.
+
+Reported by Winter, 2026-05-21.
--- a/.claude/memory/feedback_syncro_appointment_owner.md
+++ b/.claude/memory/feedback_syncro_appointment_owner.md
@@ -0,0 +1,40 @@
+---
+name: Syncro — confirm appointment owner explicitly when creating tickets with appointments
+description: When creating Syncro tickets that include an appointment, always ask "who is the appointment owner?" before posting. Don't auto-default to the ticket's assigned tech, and distinguish owner from additional attendees.
+type: feedback
+---
+
+**Rule:** When creating a Syncro ticket that includes an appointment (Onsite, Remote, Phone Call, etc.), explicitly **ask the user who the appointment owner is** in the preview phase. Do not assume the appointment owner equals the ticket's assigned tech, and do not silently add other techs as attendees.
+
+**Why:** The appointment owner is the person whose calendar the appointment lands on as the primary entry — they are the one accountable for being there. Additional `user_ids` in the appointment payload only add the entry to other techs' calendars as secondary/visible items, which clutters their schedule and creates ambiguity about who is actually on the hook for the visit. Howard caught this on 2026-05-08 after a ticket creation where I added the assigned tech to `user_ids` without confirming whether they should be the owner versus an attendee.
+
+**How to apply:**
+
+In the ticket creation preview (Step 3 of the ticket creation workflow), present the appointment block with the OWNER as a separate, explicit field — not buried as an inferred default. Example preview format:
+
+```
+APPOINTMENT
+-----------
+Type:               Onsite
+Owner:              <ASK USER — whose calendar should this be on?>
+Additional attendees: (optional, leave blank unless explicitly added)
+Start:              <start_at>
+End:                <end_at>
+Location:           <blank or override>
+```
+
+In the API payload, the appointment owner is the FIRST or PRIMARY entry in `user_ids`. Confirm:
+
+- The owner is the person actually attending the appointment (or the lead tech if multiple).
+- If the user wants ONLY the owner with no co-attendees, `user_ids` should contain ONE id only.
+- If the user wants additional attendees (e.g., "Mike will join remote, Howard onsite"), add them only after explicit confirmation in the preview.
+
+**What NOT to do:**
+
+- Do NOT auto-add the ticket's `user_id` (assigned tech) as the appointment owner without asking.
+- Do NOT add additional attendees to `user_ids` without explicit user direction.
+- Do NOT treat appointment owner as a passive inheritance from the ticket — surface it as an active confirmation field in the preview.
+
+**Trigger context:**
+
+Howard created the Kittle Design ticket (#32263) on 2026-05-08 for an 11:30 AM onsite to set up Joshua. I auto-added Howard's `user_id` to the appointment's `user_ids` array without confirming whether Howard was the owner or just an attendee. Howard flagged: "when setting up an appointment confirm the appointment owner — don't just add additional attendees." Saved as a rule for Syncro ticket creation.
--- a/.claude/memory/feedback_syncro_blank_contact.md
+++ b/.claude/memory/feedback_syncro_blank_contact.md
@@ -0,0 +1,19 @@
+---
+name: Syncro — leave contact blank by default on tickets and billing
+description: When creating Syncro tickets or billing them out, leave the contact field blank ("Not Assigned") in most cases. Blank contact lets Syncro use the company-level defaults for notifications and email routing. Setting a specific contact can route to a secondary email and bypass the customer's intended distribution.
+type: feedback
+---
+
+**Rule:** When creating or billing Syncro tickets, leave `contact_id` / `contact_name` / `contact_email` blank ("Not Assigned") by default for any customer. Only set a contact when there's an explicit, deliberate reason to (e.g., user explicitly says "set the contact to X").
+
+**Why:** Winter clarified on 2026-05-04: blank contact lets Syncro apply the **company-level email defaults** for the account — those defaults route notifications to the right people. Setting a specific contact overrides that and may push notifications to a secondary email address belonging to that contact, bypassing the customer's intended distribution. This was originally flagged for Cascades of Tucson (where Meredith was being incorrectly auto-selected), but Winter generalized it: the rule applies to most customers.
+
+**How to apply:**
+
+- **Creating a ticket** (POST `/tickets`): Omit `contact_id` from the body entirely. Do not pull contacts via `GET /customers/{id}` and pick one — let Syncro use the company defaults.
+- **Editing a ticket** (PUT `/tickets/{id}`): Send only the fields you're changing (`status`, `priority`, etc.). Never include `contact_id`, `contact_name`, or `contact_email` in the body, even if the value matches the existing one. PUT can re-apply the record; the safest approach is to never reference the contact in any write payload.
+- **Billing / invoices**: Same rule on the invoice creation side. If `contact_id` shows up in any payload, drop it.
+- **When to set a contact anyway:** Only if the user explicitly directs you to ("set Mike as the contact on this one") OR there's a documented per-customer instruction that overrides the default. Default is always blank.
+- **Verify after any write:** `GET /tickets/{id}` and confirm `.ticket.contact_id` is `null`. If you find it set, blank it explicitly: `PUT /tickets/{id}` with `{"contact_id": null}`.
+
+**Generalizes from:** the prior Cascades-specific guidance (originally `feedback_syncro_cascades_contact.md`). Winter's 2026-05-04 message broadened the scope from "Cascades only" to "most customers."
--- a/.claude/memory/feedback_syncro_cascades_contact.md
+++ b/.claude/memory/feedback_syncro_cascades_contact.md
@@ -0,0 +1,13 @@
+---
+name: Syncro — Cascades contact incident detail (Meredith Kuhn)
+description: Incident context for why the blank-contact rule matters at Cascades — Meredith Kuhn is the recurring wrong default that Syncro pre-selects. See feedback_syncro_blank_contact.md for the global rule.
+type: feedback
+---
+
+At Cascades of Tucson (customer_id 20149445), Syncro repeatedly pre-selects **Meredith Kuhn** (Assistant Manager, ASSISTMAN-PC) as the ticket contact default. She is the wrong contact — setting her overrides the customer's distribution emails and routes notifications only to her.
+
+**Why it keeps happening:** Syncro's contact picker defaults to the first-alphabetical or most-recently-used contact. Howard surfaced this pattern; Mike confirmed the global rule on 2026-05-24 (do not set contact on ANY ticket unless explicitly requested).
+
+**Global rule:** See [[feedback_syncro_blank_contact]] — blank contact is the default for all customers, not just Cascades.
+
+**Cascades-specific guard:** Even if you're tempted to assign a contact for routing purposes, Meredith Kuhn is specifically wrong. The correct routing happens automatically when contact is null.
--- a/.claude/memory/feedback_syncro_comment_dedup.md
+++ b/.claude/memory/feedback_syncro_comment_dedup.md
@@ -0,0 +1,20 @@
+---
+name: Syncro duplicate prevention — tickets AND comments
+description: Never retry ANY Syncro POST (ticket create or comment) without first GETting to confirm the action didn't already succeed — Syncro has no idempotency on any endpoint
+type: feedback
+originSessionId: 7034be43-1464-4085-b765-dc1226b1f8e0
+---
+Never retry a POST /comment to Syncro without first doing GET /tickets/{id} to confirm the comment did not already post. The server has no idempotency — one POST always creates one comment, regardless of whether the client saw an error.
+
+**ALSO: Always show the full comment draft to the user and wait for explicit confirmation before posting ANY comment — including internal/hidden notes.** This rule has been violated twice. There are no exceptions.
+
+**ALSO: This applies to ticket CREATION too — not just comments.** When a POST /tickets response looks wrong (null fields, jq error, etc.), do GET /customers/{id}/tickets BEFORE retrying. The response wrapper is `{"ticket": {...}}` — always use `.ticket.id` not `.id`. Duplicate tickets were created twice by retrying a succeeded POST. Violated 2026-04-22.
+
+**Why:** A comment was duplicated on ticket #32185 because the first POST succeeded but jq threw a parse error on the response (an em-dash in the subject caused a shell interpolation issue), making the request look like it had failed. A retry posted a second copy. Comments cannot be deleted via API — duplicates require manual GUI removal.
+
+**How to apply:**
+- Always write comment payloads to a temp file (`/tmp/syncro_comment.json`) before posting — avoids shell quoting/encoding failures that produce misleading errors
+- If any POST /comment tool call returns an error or ambiguous result, immediately GET /tickets/{id} and check `.ticket.comments` for the subject/timestamp before retrying
+- A jq parse error, curl error, or timeout on the response does NOT mean the POST failed — verify first
+- **CRITICAL — jq path:** POST /comment response is `{"comment": {...}}` — ALWAYS use `.comment.id`, `.comment.created_at` etc. Using `.id` returns null and looks like failure even when the comment landed. This caused a duplicate on 2026-04-23 (#32142). When GETting to verify, check ALL comments not just `[-3:]` — the new comment may not be the most recent if other activity occurred.
+- When GETting to verify after an ambiguous POST, search by subject: `.ticket.comments[] | select(.subject == "...")`
--- a/.claude/memory/feedback_syncro_content_type.md
+++ b/.claude/memory/feedback_syncro_content_type.md
@@ -0,0 +1,12 @@
+---
+name: feedback-syncro-content-type
+description: Syncro API POST calls require explicit Content-Type application/json header or they 400 with an HTML error page
+metadata:
+  type: feedback
+---
+
+Always include `-H "Content-Type: application/json"` on every Syncro API POST/PUT call (comments, tickets, line items, estimates).
+
+**Why:** Without it, curl sends the JSON body as `application/x-www-form-urlencoded`, which Syncro rejects with an HTML 400 page instead of a JSON error. The HTML response looks like a hard failure but it's just a missing header. Discovered 2026-05-28 when posting a comment to ticket #32333 — two 400 HTML responses before the fix.
+
+**How to apply:** Every `curl -X POST` or `curl -X PUT` to the Syncro API needs the header. The subject field is also required on ticket comments (`{"subject":"...","body":"...","hidden":true,"do_not_email":true}`).
--- a/.claude/memory/feedback_syncro_corrections_preserve_tech.md
+++ b/.claude/memory/feedback_syncro_corrections_preserve_tech.md
@@ -0,0 +1,18 @@
+---
+name: feedback-syncro-corrections-preserve-tech
+description: Preserve Syncro attribution — corrections keep the original tech's labor user_id (commission); and adding notes/labor never changes the ticket owner. Only reassign labor or ticket ownership when explicitly asked.
+metadata:
+  type: feedback
+---
+
+When fixing labor line items that were billed incorrectly (wrong product, quantity, name, or bad math — a **debug/correction action**), do NOT let the labor get reassigned to the correcting tech. **Preserve the ORIGINAL tech's attribution (`user_id`)** on each line so their commission isn't lost.
+
+- **Prefer `update_line_item` in place** — it preserves the line's existing `user_id`. (Verified on #32332: updating Howard's line kept `user_id=1750`; the dollar/product changed but the commission stayed with Howard.)
+- **If you must REMOVE + re-ADD a line**, the new line defaults to the **API-key owner's** `user_id` (e.g. Mike `1735`) — so explicitly set `user_id` to the original tech on `add_line_item`, or PUT `update_line_item` to fix the new line's `user_id` afterward.
+- Determine the original tech from the **ticket's `.ticket.user_id`** and the line's `.user_id` before correcting; verify it still matches after.
+
+**Tech user_ids:** Mike `1735`, Howard `1750`, Winter `1737`, Rob `1760`.
+
+**Ticket ownership (related rule, Mike 2026-05-27):** Simply adding notes or labor to a ticket does **NOT** change the ticket owner (`.ticket.user_id` / assigned tech). Multiple techs routinely work the same ticket. **Only change ticket ownership when explicitly asked** — never PUT a ticket's `user_id` as a side effect of commenting, billing, or status changes. (Status PUTs should send only `status`; line edits use `update_line_item`; neither should touch `user_id`.)
+
+**Why:** Mike — a billing correction is a debug action (e.g. Claude or someone billed it wrong); the **original tech still did the work and keeps the commission**. Don't take Howard's commission just because the math was fixed by Mike/Winter. Hit on #32332 (Cascades) 2026-05-27 — Howard's mis-billed labor was corrected via Mike's API key; update-in-place preserved `user_id=1750`, but a remove+add would have stolen the commission. Related: per-user-key attribution in [[365-remediation-tool-reference]] / `/syncro` Attribution rule.
--- a/.claude/memory/feedback_syncro_emergency_billing.md
+++ b/.claude/memory/feedback_syncro_emergency_billing.md
@@ -0,0 +1,22 @@
+---
+name: Syncro emergency/after-hours billing — check prepay_hours first
+description: Emergency labor is time-and-a-half (×1.5), applied once, never additive. Branch by customer.prepay_hours. Prepaid → emergency item 26184 at hours×1.5 (premium in quantity); non-prepaid → 26184 at actual hours (rate has 1.5×).
+metadata:
+  type: feedback
+---
+
+**Rule:** Before adding any Emergency/after-hours labor line on a Syncro ticket, `GET /customers/<id>` and read `prepay_hours`. Emergency = **time-and-a-half (×1.5), applied ONCE** — never bill a separate regular line + emergency line for the same hours.
+
+- **No prepaid block (`prepay_hours == 0`):** product `26184` (Labor - Emergency or After Hours) at quantity = **actual hours**, and set `price_retail` by the work's **delivery channel** (the 1.5× lives in the dollars — do NOT also ×1.5 the quantity): **Onsite emergency = $262.50** (175 × 1.5; this is 26184's default rate); **Remote / In-Shop emergency = $225** (150 × 1.5) → override `price_retail` to `225`. Fetch the base rate live and ×1.5 if unsure.
+- **Prepaid block (`prepay_hours > 0`):** product `26184` at quantity = **actual hours × 1.5** (hours + 50%). Prepaid blocks debit by QUANTITY not dollars, so the 1.5× premium goes in the **quantity**; the invoice nets to $0 and the block debits hours×1.5. e.g. 1.5 emergency hrs → `26184` @ **2.25**. (Delivery channel / dollar rate is **irrelevant** for prepaid — only the quantity hrs×1.5 matters.)
+
+**(Updated 2026-05-27 — Mike):** prepaid emergency now uses the **emergency item `26184`** at ×1.5 quantity — this REPLACES the old "prepaid → onsite `26118` at ×1.5." Using 26184 labels the line correctly as emergency and maps right in QuickBooks; the dollar double-1.5 worry doesn't apply to prepaid since the invoice is $0. Reaffirmed on #32332 (Cascades, prepaid 27h): total 1.5 emergency hrs → `26184` @ 2.25 (Howard had split it into made-up onsite/emergency lines).
+
+**Why ×1.5-not-additive:** Learned on #32203 (Desert Auto Tech) 2026-04-23: billing "1h onsite + 1h emergency" as two additive lines came out to $437.50 ($175 + $262.50), when 1 actual hour of emergency should bill once at time-and-a-half ($262.50). Emergency IS time-and-a-half; one line.
+
+**How to apply:**
+- Every emergency/after-hours bill: check `prepay_hours` BEFORE choosing the quantity. One emergency line on `26184`.
+- Always set `price_retail` explicitly (fetch live via `GET /products/26184`); the rate doesn't auto-populate and the line posts $0 if omitted.
+- Use the product's REAL name on the line (work detail goes in the description) — see [[feedback-syncro-no-madeup-labor-items]].
+- Verify after invoicing: `.invoice.total` (non-prepaid) or the prepay-block decrement (prepaid).
+- Full rules: `.claude/commands/syncro.md`.
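+
+A minimal sketch of the branch above, assuming the emergency rates quoted in this note ($262.50 onsite, $225 remote / in-shop) still hold; in practice the base rate should be fetched live. `emergency_line` is a hypothetical helper name, not part of the Syncro API or the `/syncro` skill.
+```bash
+# emergency_line PREPAY_HOURS ACTUAL_HOURS CHANNEL
+# Prints "<quantity> <price_retail>" for the single 26184 line.
+# Rates are the snapshot values quoted above (assumption); fetch live before billing.
+emergency_line() {
+  awk -v prepay="$1" -v hours="$2" -v channel="$3" 'BEGIN {
+    rate = (channel == "onsite") ? 262.50 : 225.00   # remote / in-shop otherwise
+    if (prepay > 0) { qty = hours * 1.5 }            # prepaid: premium goes in the quantity
+    else            { qty = hours }                  # non-prepaid: premium is already in the rate
+    printf "%.2f %.2f\n", qty, rate
+  }'
+}
+
+emergency_line 27 1.5 onsite   # prepaid block, 1.5 emergency hours
+emergency_line 0 1 remote      # no prepaid block, 1 emergency hour
+```
+For a prepaid customer only the first number matters; the rate is still printed so `price_retail` can be set explicitly on the line.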
--- a/.claude/memory/feedback_syncro_estimate_hardware.md
+++ b/.claude/memory/feedback_syncro_estimate_hardware.md
@@ -0,0 +1,12 @@
+---
+name: feedback_syncro_estimate_hardware
+description: Hardware line items on Syncro estimates always use product_id 32252 with varying name/price per item
+metadata:
+  type: feedback
+---
+
+All hardware on estimates uses a single generic product: `product_id: 32252` ("Hardware", `price_retail: 0.0`). The specific item name and cost are set per-line-item via the `name` and `price_retail` fields. Never search for individual hardware product IDs on estimates.
+
+**Why:** There is only one hardware product in Syncro. All hardware items are differentiated by description and price, not by product ID.
+
+**How to apply:** When building an estimate with hardware, always use `32252` as the product_id and set `name` to the specific item (e.g. "Dell OptiPlex 7010") and `price_retail` to the actual cost. Hardware is typically `taxable: true`.
--- a/.claude/memory/feedback_syncro_html.md
+++ b/.claude/memory/feedback_syncro_html.md
@@ -0,0 +1,17 @@
+---
+name: Syncro comment HTML formatting
+description: Use <br> for line breaks in Syncro comments, not <ul>/<li> — list tags don't render
+type: feedback
+originSessionId: b39e319c-ac3e-49f5-afb6-755e08f1fd82
+---
+Use `<br>` for line breaks in Syncro comment bodies. Do NOT use `<ul>`, `<li>`, or other block-level list tags — Syncro's renderer collapses them into a single line with no spacing.
+
+**Why:** Posted a comment with `<ul><li>` items and they all ran together on one line in the ticket view. Had to post a corrected duplicate.
+
+**How to apply:** For any bulleted list in a Syncro comment, use:
+```
+- Item one<br>
+- Item two<br>
+- Item three
+```
+wrapped in a `<p>` tag. Never use `<ul>/<li>`.
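+
+One way to build such a body from plain items, as a rough sketch. `syncro_list_html` is a hypothetical helper name, and it does no HTML escaping, so it assumes the items contain no `<` or `&`.
+```bash
+# Join the arguments into "- item" entries separated by <br>, wrapped in one <p>.
+syncro_list_html() {
+  out=""
+  for item in "$@"; do
+    if [ -n "$out" ]; then out="${out}<br>"; fi
+    out="${out}- ${item}"
+  done
+  printf '<p>%s</p>\n' "$out"
+}
+
+syncro_list_html "Item one" "Item two" "Item three"
+```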
--- a/.claude/memory/feedback_syncro_labor_tax.md
+++ b/.claude/memory/feedback_syncro_labor_tax.md
@@ -0,0 +1,14 @@
+---
+name: feedback-syncro-labor-tax
+description: Labor is never taxable in Arizona — always set taxable=false on labor line items in Syncro
+metadata: 
+  node_type: memory
+  type: feedback
+  originSessionId: d91f202e-ddd5-46d7-b674-f848eb78aa8e
+---
+
+Always pass `"taxable": false` explicitly on labor line items via `add_line_item`.
+
+**Why:** Labor products are configured with `taxable: false` in Syncro, but the `add_line_item` API endpoint does not inherit the product's taxable setting — it posts the line item as `taxable: true` regardless of the product config.
+
+**How to apply:** Include `"taxable": false` in every `add_line_item` payload for labor products (remote, onsite, in-shop, emergency, prepaid). The product itself is correct; the API just doesn't carry it through.
--- a/.claude/memory/feedback_syncro_labor_type.md
+++ b/.claude/memory/feedback_syncro_labor_type.md
@@ -0,0 +1,24 @@
+---
+name: Syncro — use a billable labor type (in-shop / onsite / remote / web), never "Prepaid project labor"
+description: When billing Syncro tickets, the labor product on the line item MUST be one of in-shop, onsite, remote, or web labor. "Prepaid project labor" is an exempt labor type and will NOT draw down a customer's prepay block — using it silently breaks block-hour accounting.
+type: feedback
+---
+
+**Rule:** Line items on Syncro tickets must use a billable labor product matching the work delivery channel: **in-shop**, **onsite**, **remote**, or **web labor**. Do NOT use **"Prepaid project labor"** as the labor type for normal work.
+
+**Why:** Winter caught me on 2026-05-04 using "Prepaid project labor" by default. That product is **exempt** — it does not consume hours from a customer's prepaid block. So even if the ticket is for a prepay customer and looks billed correctly on the invoice, the block balance never decrements. Block-hour accounting silently drifts. Only the four non-exempt labor types (in-shop / onsite / remote / web) burn block time as intended.
+
+**How to apply:**
+
+- **Picking labor type:** Match it to how the work was actually delivered:
+  - **Remote labor** — work done over remote tools (RDP, Splashtop, ScreenConnect, phone-only support, scripts). This will be the most common pick.
+  - **Onsite labor** — work done at the client's physical location.
+  - **In-shop labor** — hardware brought to ACG's office for repair/build.
+  - **Web labor** — purely cloud/portal work (Microsoft 365 admin center, Entra, Cloudflare, etc.) where there's no remote-into-a-machine component. (Confirm with Winter if this distinction matters in your situation — sometimes "remote" is the right pick even for cloud work.)
+- **Resolving the product_id:** Use `GET /products?search=remote+labor` (etc.) to pull the right product_id for the labor type, then pass that as `product_id` on the `add_line_item` POST.
+- **Never default to "Prepaid project labor"** unless explicitly directed. If you find an existing entry with that product on a normal billable ticket, flag it — Winter (or whoever) will need to retroactively switch the labor type so the block decrement actually posts.
+- **Verifying:** After billing, check that the customer's prepay block balance dropped by the expected number of hours. If it didn't, the labor type was wrong.
+
+**Real-world incident — 2026-05-04:** Tickets I created on this date used "Prepaid project labor" as the auto-selected labor type. Winter is fixing them retroactively. Going forward, default to `Remote labor` for the typical remote-support ticket, then adjust per delivery channel.
+
+**Where this lands in skill code:** `.claude/commands/syncro.md` and the `syncro` skill workflow examples need to make labor-type selection an explicit step in the add_line_item billing workflow, not a silent default.
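+
+The selection step can be made explicit with a small lookup, sketched here under the assumption that the product IDs recorded elsewhere in these notes (remote `1190473`, onsite `26118`, in-shop `573881`) are still current. `labor_product_id` is a hypothetical helper name; no ID for web labor is recorded in these notes, so that case fails instead of guessing.
+```bash
+# Map a delivery channel to its labor product_id (IDs are assumptions taken from these notes).
+labor_product_id() {
+  case "$1" in
+    remote)  echo 1190473 ;;
+    onsite)  echo 26118 ;;
+    in-shop) echo 573881 ;;
+    *) echo "unknown channel '$1': resolve the product_id via GET /products" >&2; return 1 ;;
+  esac
+}
+
+labor_product_id remote
+```
+Failing on an unrecognized channel keeps the choice explicit rather than falling back to a silent default.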
--- a/.claude/memory/feedback_syncro_line_items.md
+++ b/.claude/memory/feedback_syncro_line_items.md
@@ -0,0 +1,24 @@
+---
+name: feedback_syncro_line_items
+description: Correct Syncro API endpoint for adding labor/product line items to tickets
+metadata: 
+  node_type: memory
+  type: feedback
+  originSessionId: 282e0176-1bdb-49b7-8c15-faf152774d7e
+---
+
+Use `POST /api/v1/tickets/{internal_ticket_id}/add_line_item` to add line items to tickets. Both `name` and `description` fields are required (422 if either missing). Never use timers.
+
+**Why:** `/line_item`, `/line_items`, and PUT `line_items_attributes` all 404. The correct endpoint was found via Syncro Swagger spec at api-docs.syncromsp.com. Mike has explicitly said never use timers.
+
+**How to apply:**
+- Path uses internal ticket ID (e.g., 111387456), not ticket number (32339)
+- Required fields: `name`, `description`, `quantity`, `price_retail`, `taxable` (and `product_id` if catalog item)
+- Response is a flat object; parse `.id` directly (not `.line_item.id`)
+- For testing/practice, use internal ACG account only (customer ID 15353550)
+
+Example:
+```
+POST /api/v1/tickets/111387456/add_line_item
+{"product_id":1049360,"name":"Labor- Warranty work","description":"...","quantity":1,"price_retail":0.0,"taxable":false}
+```
--- a/.claude/memory/feedback_syncro_live_rates.md
+++ b/.claude/memory/feedback_syncro_live_rates.md
@@ -0,0 +1,18 @@
+---
+name: feedback-syncro-live-rates
+description: Always fetch Syncro labor rates live from the API — never use hardcoded rate table
+metadata: 
+  node_type: memory
+  type: feedback
+  originSessionId: d91f202e-ddd5-46d7-b674-f848eb78aa8e
+---
+
+Always fetch `price_retail` live from `GET /products/<id>` → `.product.price_retail` before billing any Syncro line item. Never use the rate table in the skill as a source of truth for dollar amounts.
+
+**Why:** The hardcoded rate table was proven wrong on 2026-05-20 (ticket #32304, Cascades) when Labor - Remote Business was listed at $150/hr but the correct rate was $175/hr. Rates vary by contract and change over time.
+
+**How to apply:** In any billing workflow, fetch the rate immediately after selecting the product_id:
+```bash
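+# Assumes BASE, PRODUCT_ID and API_KEY are already set; -f makes curl fail on HTTP errors instead of piping an error body to jq
+RATE=$(curl -fsS "${BASE}/products/${PRODUCT_ID}?api_key=${API_KEY}" | jq -r '.product.price_retail')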
+```
+Use this `$RATE` value for the Ollama draft prompt, the preview shown to the user, and the `price_retail` field in all payloads. The product ID table in the skill is still valid — just not the rate column.
--- a/.claude/memory/feedback_syncro_no_madeup_labor_items.md
+++ b/.claude/memory/feedback_syncro_no_madeup_labor_items.md
@@ -0,0 +1,12 @@
+---
+name: feedback-syncro-no-madeup-labor-items
+description: NEVER invent or rename Syncro labor line items — every labor line must use an existing product with its REAL name (from GET /products/<id>); work detail goes in the description field, not the name
+metadata:
+  type: feedback
+---
+
+Every labor line item on a Syncro ticket/invoice MUST be an **existing Syncro product, billed under its REAL name** (fetched from `GET /products/<id>` → `.product.name`) with the live `price_retail`. **NEVER make up a custom line-item name** — even when the `product_id` is a real product. The line's `name` field = the product's actual name, verbatim. Put any work-specific narrative in the `description` field, never by renaming the line.
+
+**Why:** Mike flagged ticket #32332 (Cascades — Chris Knight new-user setup), where product `26118` (real name **"Labor - Onsite Business"**) was billed on two lines as **"Emergency Call Setup"** and **"Onsite Computer Setup"** — fabricated names. Invented/renamed labor items break the **Syncro -> QuickBooks sync** — QB maps each labor line to an existing item, so a fabricated name has no QB match and messes up the accounting (Mike's stated reason). The **`description` field is free text and can be whatever the work needs** — only the `name`/product must be an existing Syncro item. Mike: "You CANNOT make up labor items. You MUST use existing items only for all labor items... the labor item must use the ones that already exist in syncro (otherwise it messes things up in Quickbooks)."
+
+**How to apply:** When adding ANY labor line: `GET /products/<id>`, copy `.product.name` verbatim into `name`, use `.product.price_retail` for `price_retail`, and set `taxable:false` for labor. Pick the correct EXISTING labor product (remote `1190473` "Labor - Remote Business" $150, onsite `26118` "Labor - Onsite Business" $175, emergency/after-hours `26184` "Labor - Emergency or After Hours Business" $262.50, in-shop `573881`, warranty `1049360`, etc.; full table in `/syncro`). Treat the dollar figures here as snapshots, not a source of truth: the live `price_retail` wins. Differentiate the work in `description`, not `name`. If no existing product fits the need, STOP and ask Mike; do not invent one. Related: [[feedback-syncro-live-rates]], [[feedback-syncro-warranty-product]].
--- a/.claude/memory/feedback_syncro_timer_first.md
+++ b/.claude/memory/feedback_syncro_timer_first.md
@@ -0,0 +1,18 @@
+---
+name: Syncro — use add_line_item for billing, not timers
+description: Syncro billing uses add_line_item directly. Timer workflow (timer_entry → charge_timer_entry) is not used. Overrides previous rule about timers being required.
+type: feedback
+---
+
+**Rule:** Bill Syncro tickets with `POST /tickets/{id}/add_line_item` directly. Do NOT use `timer_entry → charge_timer_entry`.
+
+**Why:** Mike confirmed 2026-05-21 that the timer workflow is not used. The previous rule requiring timers was wrong and caused repeated billing failures (wrong product on the timer, product_id silently ignored by charge_timer_entry, etc.).
+
+**How to apply:**
+
+- `add_line_item` is the billing path for all work: labor, warranty, internal, hardware.
+- Set `product_id`, `quantity` (decimal hours), `price_retail` (fetched live), `name`, `description`, `taxable: false`.
+- Do not create timer entries as part of billing.
+- Timer endpoints still exist in Syncro but are not part of the ACG billing workflow.
+
+**Previous rule (SUPERSEDED):** "All work-time billing MUST go through timer_entry → charge_timer_entry." That rule is no longer in effect as of 2026-05-21.
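+
+A rough sketch of assembling that payload in the shell. `json_escape` and `line_item_payload` are hypothetical helper names; the field names are the ones listed above, and the escaping only covers backslashes and double quotes, so it assumes single-line text.
+```bash
+# Escape backslashes and double quotes for use inside a JSON string.
+json_escape() {
+  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
+}
+
+# line_item_payload PRODUCT_ID NAME DESCRIPTION QUANTITY PRICE_RETAIL
+# Labor is always posted with taxable:false.
+line_item_payload() {
+  printf '{"product_id":%s,"name":"%s","description":"%s","quantity":%s,"price_retail":%s,"taxable":false}\n' \
+    "$1" "$(json_escape "$2")" "$(json_escape "$3")" "$4" "$5"
+}
+
+line_item_payload 1049360 "Labor- Warranty work" "Replaced \"bad\" cable" 0.5 0.0
+```
+The output is meant to be passed as the request body of the `add_line_item` POST; the name and price should still come from the live product record.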
--- a/.claude/memory/feedback_syncro_timer_response_shape.md
+++ b/.claude/memory/feedback_syncro_timer_response_shape.md
@@ -0,0 +1,52 @@
+---
+name: Syncro — timer_entry response is FLAT, not wrapped
+description: POST /tickets/{id}/timer_entry returns a flat object {"id": N, "ticket_id": ..., "product_id": ..., ...}, NOT wrapped in {"timer": {...}} or {"timer_entry": {...}}. Parse as `.id`, never `.timer.id` — using the wrapped pattern silently returns null and creates duplicate timers when the script "retries".
+type: feedback
+---
+
+> **SUPERSEDED / HISTORICAL — 2026-05-21.** Timers are no longer part of the ACG Syncro
+> billing workflow; billing uses `add_line_item` directly. See [[Syncro — use add_line_item for billing, not timers]] (`feedback_syncro_timer_first.md`). Keep this note ONLY as reference
+> for the rare case a timer is created manually — do not treat it as current workflow.
+
+**Rule:** When parsing the response from `POST /tickets/{id}/timer_entry`, use `.id` directly — the response is a FLAT object. Do NOT use `.timer.id // .timer_entry.id`.
+
+**Verified response shape (2026-05-05, ticket #32253):**
+```json
+{
+  "id": 39031258,
+  "ticket_id": 109895882,
+  "user_id": 1750,
+  "start_time": "2026-05-05T09:00:00.000-07:00",
+  "end_time": "2026-05-05T09:30:00.000-07:00",
+  "recorded": false,
+  "billable": true,
+  "notes": "...",
+  "product_id": 26118,
+  "comment_id": null,
+  "ticket_line_item_id": null,
+  "active_duration": 1800,
+  "billable_time": 1800
+  ...
+}
+```
+
+**Why:** The skill doc at `.claude/commands/syncro.md` shows
+```bash
+TIMER_ID=$(echo "$TIMER_RESP" | jq -r '.timer.id // .timer_entry.id')
+```
+That fallback resolves to `null` because neither key exists on the flat response. A `null` TIMER_ID then breaks `charge_timer_entry` ("Not found"). If the script retries the timer_entry POST after the perceived failure, it creates a duplicate — Syncro has no idempotency. Hit this on ticket #32253 (Cascades) on 2026-05-05; created two duplicate 0.5hr timers and had to delete one via `delete_timer_entry` before charging.
+
+**How to apply:**
+
+- **Parsing:** Always `jq -r '.id'` on the timer_entry response.
+- **After ANY ambiguous timer_entry response** (null `.id`, jq error, network blip): GET the ticket and inspect `.ticket.ticket_timers[]` BEFORE retrying. Filter for `recorded: false` entries with the start/end times you just sent.
+- **Cleanup if duplicates exist:** `POST /tickets/{id}/delete_timer_entry` with `{"timer_entry_id": N}` for the older duplicate(s). Returns `{"success": true}`.
+- **Verifying the timer is on the ticket:** `GET /tickets/{id}` → `.ticket.ticket_timers` is the authoritative list. On the standalone `/ticket_timers` endpoint, the `ticket_id=N` query parameter does NOT filter by ticket; the call returns the entire global timer history.
+
+**Charge timer response is also flat:**
+```json
+{"id": 39031258, "recorded": true, "ticket_line_item_id": 42313052, ...}
+```
+Parse as `.ticket_line_item_id` to get the auto-generated line. Do not look for a wrapper.
+
+**Where this lands in skill code:** `.claude/commands/syncro.md` example block needs `.id` not `.timer.id // .timer_entry.id`. Until the skill is patched, override the example pattern when running.
--- a/.claude/memory/feedback_syncro_warranty_product.md
+++ b/.claude/memory/feedback_syncro_warranty_product.md
@@ -0,0 +1,22 @@
+---
+name: Syncro — warranty work uses the "Labor- Warranty work" product, never patch a billable product to $0
+description: For warranty/no-charge labor on Syncro tickets, use product_id 1049360 (Labor- Warranty work, $0/hr). Do NOT use a regular labor product with billable=false or a patched price_retail=0. Prices are determined by the product selected; never override the dollar amount to make one product behave like another.
+type: feedback
+---
+
+**Rule (two parts):**
+
+1. **Warranty / no-charge labor uses product `1049360` "Labor- Warranty work" ($0/hr, non-taxable).** Don't pick a regular Remote/Onsite/etc. labor product and try to neutralize it.
+
+2. **Prices are set by selecting the correct product. Never change `price_retail` on a line item to make a different labor product behave like a warranty (or any other) product.** If you find yourself reaching for `update_line_item` to drop a price, that's the signal to back up and pick a different `product_id` instead.
+
+**Why:** On 2026-05-06 (ticket #32225 Sombra Residential), I chose product `1190473` (Labor - Remote Business, $150/hr) for a follow-up warranty cleanup, set `billable: false` on the timer, and assumed the timer flag would zero the line. Syncro silently overrode `billable: false` and the resulting line came in at $75. I patched `price_retail` to $0 to "fix" it. Howard caught it: warranty work has a dedicated product in the dropdown, and patching dollar amounts is never how this is solved. The earlier guidance in `.claude/commands/syncro.md` (the "Warranty / no-charge → use closest labor product with billable=false" rule) was wrong; warranty has its own product just like Onsite, Remote, Emergency, etc., and that product is what should be used.
+
+**How to apply:**
+
+- **For any warranty / no-charge work:** `product_id = 1049360`, qty = actual hours. The warranty product is $0 by design: its `price_retail` is $0, so the line generates at $0 from `price_retail` × `quantity` with no patching needed.
+- **Don't fake a free line with flags.** Do NOT take a regular labor product and try to neutralize it with `billable: false`; that was the original mistake (see Why), and Syncro silently overrode the flag in the timer era anyway. Pick `1049360`.
+- **Never reach for `update_line_item` to drop a price as a workaround.** If the dollar amount on a line is wrong, the wrong product was selected — undo, pick the correct product, redo. The only legitimate use of `update_line_item price_retail` is the Syncro auto-gen-zero recovery case (when the auto-line came in at $0 instead of the product's actual rate), and even that is a Syncro bug we're patching around, not a price-management tool.
+- **For the dropdown of available labor products,** see the rate table in `.claude/commands/syncro.md`. If the situation doesn't match any of those, ask before improvising.
+
+**Where this lands in skill code:** `.claude/commands/syncro.md` — added `1049360` to the labor product table, fixed the warranty branch in the billing workflow, and added an explicit "never patch price_retail to convert products" rule.
--- a/.claude/memory/gururmm-development-principles.md
+++ b/.claude/memory/gururmm-development-principles.md
@@ -0,0 +1,108 @@
+---
+name: GuruRMM Development Principles
+description: Every GuruRMM feature is full-stack (backend+API+UI+docs+scalability); product works without AI; the FEATURE_ROADMAP entry update is part of definition-of-done. Mirrors guru-rmm/docs/DESIGN.md.
+type: project
+---
+
+# GuruRMM Development Principles
+
+**Created:** 2026-04-29
+**Authority:** Mike Swanson (owner)
+**Location:** Documented in `projects/msp-tools/guru-rmm/docs/DESIGN.md`
+
+---
+
+## Holistic Feature Development (MANDATORY)
+
+When planning or implementing ANY GuruRMM feature, the complete stack must be considered and built:
+
+### Required Components for Every Feature:
+1. **Backend/Agent Logic** — core capability implementation
+2. **API Endpoints** — control and monitoring interfaces
+3. **UI/UX** — dashboard configuration, status display, management interface
+4. **Documentation** — user guides and operational docs
+5. **Scalability Design** — architected for future expansion
+
+### Example: Network Discovery Node
+A complete implementation includes:
+- Agent-side scanning capability (ICMP, ARP, SNMP)
+- Server-side data storage and API endpoints
+- Dashboard UI for:
+  - Designating which agent is the discovery node
+  - Viewing discovered devices
+  - Configuring scan schedules
+  - Setting IP ranges and exclusions
+- Status indicators (discovery progress, last scan time)
+- Future-proof data model supporting multiple discovery methods
+
+### Why This Matters:
+- **Completeness:** Features without UI are unusable by non-API-expert admins
+- **User Experience:** Configuration should be intuitive, not require documentation diving
+- **Consistency:** Every feature should feel native to the product
+- **No Dead Ends:** Design decisions shouldn't block obvious next steps
+
+**Features shipped without their UI/configuration interfaces are incomplete and will be rejected.**
+
+---
+
+## Living Roadmap (MANDATORY)
+
+`projects/msp-tools/guru-rmm/docs/FEATURE_ROADMAP.md` is the single living record of intent — where the product is going AND where it has been. It is a status-and-plan tracker, NOT a write-once backlog. Convention: `[ ]` = planned, `[x]` = shipped (annotate with date).
+
+**Consult it going in, update it coming out — the roadmap update is part of definition-of-done:**
+- **Before building:** read the feature's roadmap entry for intent/scope. New work that isn't on the roadmap gets an entry first.
+- **When shipping or modifying a feature:** update its roadmap entry in the SAME change — flip `[ ]`→`[x]` with a date, or revise/add the item. A code change that ships or alters a roadmap feature WITHOUT touching FEATURE_ROADMAP.md is incomplete (same standard as shipping without UI).
+- **Don't over-claim:** an entry's text must match what's actually built. If only part is done, keep `[ ]` and annotate the scope (e.g. "TCP probing shipped; ICMP/ARP/SNMP pending") rather than flipping.
+
+`/rmm-audit`'s roadmap pass is the **backstop** that reconciles drift — it is not the primary maintainer. Dev work keeps the roadmap honest; the audit catches what slipped. See [[feedback_rmm_dev_is_mike]] (RMM dev is Mike's).
+
+---
+
+## AI-Optional Operation
+
+GuruRMM must be fully functional without requiring AI agents (Claude, autonomous analysis tools) to operate.
+
+### Core Requirements:
+- All functionality accessible via traditional dashboard/API
+- Configuration and management through standard interfaces
+- Usable by MSP techs with zero AI/ML knowledge
+- Deterministic, reliable operation for production environments
+
+### AI Features Are Enhancements:
+- **Agentic analysis** (AI-powered log analysis, anomaly detection, troubleshooting) — planned enhancement
+- **Agentic command routing** (intelligent decision-making about command execution) — planned enhancement
+- Users choose whether to enable AI features
+- Product does not mandate AI usage
+
+### Why This Matters:
+- Real MSPs need deterministic, reliable systems
+- AI features can break, hallucinate, or be unavailable
+- Core operations cannot depend on AI availability
+- Production stability over experimental features
+
+---
+
+## Application to Development
+
+### When Adding Features:
+1. ✅ Design the complete stack before starting implementation
+2. ✅ Include UI mockups in feature planning
+3. ✅ Consider future expansion in data model design
+4. ✅ Ensure feature works via dashboard without API knowledge
+5. ✅ Never assume AI availability for core functionality
+
+### When Reviewing Features:
+1. ❌ Reject backend-only implementations without UI
+2. ❌ Reject features that require API expertise to configure
+3. ❌ Reject designs that paint into architectural corners
+4. ❌ Reject features that require AI to function
+
+### Planning Questions:
+- "How does an admin configure this in the dashboard?"
+- "What does the status display look like?"
+- "How do we expand this in v2/v3?"
+- "Does this work if AI services are unavailable?"
+
+---
+
+**These principles apply to ALL features — past, present, and future.**
--- a/.claude/memory/project-cascades-migration-plan.md
+++ b/.claude/memory/project-cascades-migration-plan.md
@@ -0,0 +1,20 @@
+---
+name: project-cascades-migration-plan
+description: Cascades of Tucson department migration plan — Syncro ticket, plan file location, resume command
+metadata:
+  type: project
+---
+
+Active multi-day migration project for Cascades of Tucson. Department-by-department domain join, folder redirection, and Entra sync rollout.
+
+**Why:** Full migration from workgroup/cloud-only to a domain-integrated environment with a clean end state (everything works automatically when a fresh machine joins the domain).
+
+**Syncro ticket:** https://computerguru.syncromsp.com/tickets/110680053 — update with notes after each session.
+
+**Plan file:** `C:\Users\Howard\.claude\plans\wise-discovering-panda.md`
+
+[VERIFY 2026-05-26 — plan-file path C:\Users\Howard\... is machine-specific (Howard's box); confirm it resolves on ACG-TECH03L/Howard-Home or relocate the plan into the synced repo. Cascades m365-rollout still active/blocked.]
+
+**Resume command:** Howard says "resume the Cascades migration plan" → read plan file, check CURRENT SAVE POINT section, pick up at next unchecked item.
+
+**How to apply:** At every Cascades session start, read the plan file CURRENT SAVE POINT before doing any work. Update the save point and run /save at end of session.
--- a/.claude/memory/project_cascades_admin_accounts.md
+++ b/.claude/memory/project_cascades_admin_accounts.md
@@ -0,0 +1,16 @@
+---
+name: Cascades admin account ownership
+description: Howard uses sysadmin@cascadestucson.com, Mike uses admin@cascadestucson.com — used for daily admin work, not break-glass.
+type: project
+---
+
+At Cascades Tucson tenant (`207fa277-e9d8-4eb7-ada1-1064d2221498`):
+
+- **`sysadmin@cascadestucson.com`** — Howard's working admin account (the account he used for the PIM portal click on 2026-04-28 for the CA Admin role assignment).
+- **`admin@cascadestucson.com`** — Mike's working admin account.
+
+As of 2026-04-29, neither is confirmed as cloud-only / FIDO2 / CA-excluded — Howard "doesn't think they are cloud-only." A break-glass admin still needs to be designed before the CA bypass policies go live.
+
+**Why:** Avoid asking who owns which admin login again, and keep clear that these are *daily-driver* admin accounts, not the eventual break-glass.
+
+**How to apply:** When discussing Cascades admin work or break-glass design, attribute correctly. Don't assume sysadmin@ or admin@ already meet break-glass criteria — verify against Graph (onPremisesSyncEnabled, authentication methods, CA exclusions) before relying on either.
--- a/.claude/memory/project_cascades_billing.md
+++ b/.claude/memory/project_cascades_billing.md
@@ -0,0 +1,14 @@
+---
+name: project-cascades-billing
+description: "Cascades of Tucson Syncro billing — prepaid block customer, rate TBD"
+metadata: 
+  node_type: memory
+  type: project
+  originSessionId: d91f202e-ddd5-46d7-b674-f848eb78aa8e
+---
+
+Cascades of Tucson (Syncro customer_id: 20149445) is a prepaid block customer. As of 2026-05-20 the block had ~37.5 hrs remaining (38.5 minus 1hr for ticket #32304).
+
+**Block rate:** Not yet confirmed — $175/hr is the standard non-block remote rate, NOT the Cascades block rate. Ask Mike before billing future Cascades tickets.
+
+**How to apply:** Always check prepay_hours before billing. Invoices post at $0.00 with hours deducted by quantity. Confirm block rate with Mike before setting price_retail.
--- a/.claude/memory/project_cascades_ca_phased_rollout.md
+++ b/.claude/memory/project_cascades_ca_phased_rollout.md
@@ -0,0 +1,26 @@
+---
+name: Cascades CA bypass — phased per-group rollout, NOT tenant-wide
+description: Caregiver bypass CA policies are scoped to SG-Caregivers-Pilot only at start, then expanded one department at a time. Legacy all-users-MFA stays in place; we PATCH excludeGroups, never delete it during rollout.
+type: project
+---
+
+The Cascades caregiver bypass CA work is a **phased rollout**, not a tenant-wide policy swap. This corrects the original §5 design in `clients/cascades-tucson/docs/cloud/user-account-rollout-plan.md` and the resume-point in `2026-04-29-howard-cascades-bypass-pilot-phase-b-buildout.md`, which both implied a tenant-wide cutover.
+
+**What this means concretely:**
+
+- New CA policies target `SG-Caregivers-Pilot` only (then `SG-Caregivers` after Entra Connect exits staging). They do NOT use `includeUsers: All`.
+- The legacy `Require multifactor authentication for all users` policy **stays in place**. We PATCH its `excludeGroups` to add the pilot group, so existing office-staff behavior is unchanged.
+- Expansion to additional populations (front desk, clinical, admin staff) happens one group at a time post-pilot. Each population gets its own scoped policy set, applied by adding its group to `excludeGroups` on the legacy policy and to `includeGroups` on the relevant new policies.
+- The legacy all-users-MFA policy is ONLY deleted at the very end, when every population is governed by a phased policy.
+
+**Why:** Howard pulled the brakes on 2026-04-29 after spotting that policies #1, #2, #3 in the original design hit all users — would have blocked any office user signing in off-site who wasn't in `SG-External-Signin-Allowed`. The btw replay he pasted contained the correct rescoping: "Re-scope the new policies so they only target the pilot group initially, and roll out to other groups one at a time later." Phased preserves today's behavior for everyone except the pilot group while we validate the bypass mechanics.
+
+**How to apply:** When building or modifying Cascades CA policies, default to group-scoped (`includeGroups`), never `includeUsers: All`. When expanding to a new department, the steps are: (1) create the department's group, (2) PATCH legacy all-users-MFA to add it to `excludeGroups`, (3) add it to `includeGroups` on the relevant new policies. Treat any "let's just push it tenant-wide now that the pilot worked" suggestion as a regression of this decision and flag it.
+
+**Caregiver set (the only set in scope today):**
+- PATCH `Require multifactor authentication for all users`: add `SG-Caregivers-Pilot` to excludeGroups.
+- CREATE `CSC - Block caregivers off Cascades network` (includeGroups: pilot, locations: not Cascades, grant: BLOCK).
+- CREATE `CSC - Block caregivers on non-compliant device` (includeGroups: pilot, device filter isCompliant -eq False, grant: BLOCK).
+- CREATE `CSC - Caregiver sign-in frequency 8h` (includeGroups: pilot, session control: 8h re-auth).
+
+Note: for caregivers we use **Block** directly on non-compliant + off-network, not "Require MFA" — caregivers can't satisfy MFA (no personal device), so block is the cleaner UX. For non-caregiver populations later, MFA grants will likely be appropriate since office staff have MFA capability.
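A minimal sketch of the `excludeGroups` edit described above, assuming the legacy policy JSON has already been fetched from Microsoft Graph. The function name and the group IDs are hypothetical. Whether Graph accepts a partial `users` object on PATCH is not confirmed here, so the sketch sends the whole existing `users` condition back with only `excludeGroups` changed.

```python
import json


def build_exclude_groups_patch(policy: dict, group_id: str) -> dict:
    """Return a PATCH body that adds group_id to the policy's excludeGroups.

    Existing exclusions and the rest of the users condition are preserved,
    and the group is not added twice if it is already excluded.
    """
    users = dict(policy.get("conditions", {}).get("users", {}))
    exclude = list(users.get("excludeGroups", []))
    if group_id not in exclude:
        exclude.append(group_id)
    users["excludeGroups"] = exclude
    return {"conditions": {"users": users}}


# Hypothetical IDs, for illustration only.
legacy_policy = {
    "displayName": "Require multifactor authentication for all users",
    "conditions": {
        "users": {"includeUsers": ["All"], "excludeGroups": ["existing-group-id"]}
    },
}
patch_body = build_exclude_groups_patch(legacy_policy, "pilot-group-id")
print(json.dumps(patch_body, indent=2))
```

Because the helper is idempotent, re-running it for a department that is already excluded produces the same body, which suits the one-group-at-a-time expansion.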
--- a/.claude/memory/project_cascades_pilot_cleanup.md
+++ b/.claude/memory/project_cascades_pilot_cleanup.md
@@ -0,0 +1,15 @@
+---
+name: Cascades caregiver pilot — cleanup obligations
+description: Pilot accounts (pilot.test@, howard.enos@ once synced) at Cascades must be removed at end of caregiver bypass pilot.
+type: project
+---
+
+The Cascades caregiver shared-phone bypass pilot (Path B, cloud-only) is using a temporary pilot identity. Howard explicitly flagged on 2026-04-29 that **all pilot artifacts must be cleaned up** when the pilot wraps:
+
+- **`pilot.test@cascadestucson.com`** — cloud-only test user created for the pilot. Delete (or disable + remove license) post-pilot.
+- **`howard.enos@cascadestucson.com`** — Howard's eventual synced identity (won't exist as a cloud user until Entra Connect exits staging). If used during pilot validation, also clean up after.
+- `SG-Caregivers-Pilot` cloud Entra group — superseded by synced `SG-Caregivers` group post-staging-exit. Remove pilot group from CA policy targets at that point; group itself can be deleted after.
+
+**Why:** Pilot accounts must not linger after the pilot: removing them keeps the tenant clean and recovers the license (the Business Premium seat returns to the 34-spare pool).
+
+**How to apply:** When the pilot validates and we transition to production rollout (synced `SG-Caregivers`), the cleanup of pilot.test, howard.enos pilot usage, and SG-Caregivers-Pilot is part of the cutover, not a separate task to forget. Surface this checklist when we get to the "flip pilot CA policies to production" step.
--- a/.claude/memory/project_dataforth_email.md
+++ b/.claude/memory/project_dataforth_email.md
@@ -0,0 +1,13 @@
+---
+name: Dataforth email infrastructure
+description: Dataforth uses M365 for email; the Exchange server on 172.16.x.x / neptune.acghosting.com is NOT Dataforth's — it belongs to ACG's own infrastructure
+type: project
+originSessionId: 7034be43-1464-4085-b765-dc1226b1f8e0
+---
+Dataforth's email runs on Microsoft 365 (sysadmin@dataforth.com, tenant in vault at `clients/dataforth/m365.sops.yaml`).
+
+The Exchange server at `neptune.acghosting.com` / `67.206.163.124` listed in the vault under `clients/dataforth/neptune-exchange.sops.yaml` is **not** part of Dataforth's infrastructure — do not use it for Dataforth email workflows.
+
+**Why:** Mike corrected this during pipeline notification work (2026-04-22). The Exchange entry is an ACG-side server, not Dataforth's.
+
+**How to apply:** For any Dataforth email sending, SMTP basic auth is disabled on the tenant. Must use OAuth2 — either XOAUTH2 over SMTP or (preferred) Microsoft Graph API `POST /v1.0/users/sysadmin@dataforth.com/sendMail` with a client_credentials token. Entra app is in vault at `clients/dataforth/m365.sops.yaml` under `credentials.entra-app`. Verify `Mail.Send` application permission is granted before use.
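A minimal sketch of the JSON body for the Graph `sendMail` call mentioned above. The helper name and the recipient address are hypothetical, and acquiring the client_credentials token is out of scope; only the request body is built here.

```python
import json


def build_send_mail_payload(subject: str, body_text: str, recipients: list) -> dict:
    """Build the request body for POST /v1.0/users/{sender}/sendMail."""
    return {
        "message": {
            "subject": subject,
            "body": {"contentType": "Text", "content": body_text},
            "toRecipients": [
                {"emailAddress": {"address": address}} for address in recipients
            ],
        }
    }


# Hypothetical recipient, for illustration only.
payload = build_send_mail_payload(
    "Pipeline notification", "Build finished.", ["ops@example.com"]
)
print(json.dumps(payload, indent=2))
```

The body would be posted with an `Authorization: Bearer <token>` header and `Content-Type: application/json`.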
--- a/.claude/memory/project_dataforth_incident_2026-03-27.md
+++ b/.claude/memory/project_dataforth_incident_2026-03-27.md
@@ -0,0 +1,39 @@
+---
+name: Dataforth Security Incident 2026-03-27
+description: DF-JOEL2 compromised via ScreenConnect social engineering. MFA deployed. IC3 filed. C2 IPs blocked. Full remediation completed.
+type: project
+---
+
+[RESOLVED] CA policies enforced 2026-04-04; incident closed.
+
+## Incident
+Joel Lohr's workstation (DF-JOEL2, 192.168.0.143) compromised via phishing email to personal Yahoo account. Attacker "Angel Raya" deployed ScreenConnect C2 backdoors. M365 account also compromised from Turkey/UK/Germany.
+
+## Attacker
+- C2: 80.76.49.18 and 45.88.91.99 (AS399486, Virtuo, Montreal QC) - SUSPENDED by host
+- Cloud relay: instance-wlb9ga-relay.screenconnect.com
+- ConnectWise case: 03464184
+- IC3 complaint: 1c32ade367084be9acd548f23705736f
+
+## Remediation
+- C2 IPs blocked at UDM firewall (iptables - need permanent rules in UniFi UI)
+- 3 rogue ScreenConnect clients uninstalled
+- jlohr AD password reset, M365 sessions revoked
+- 32 machines scanned clean, 28 unreachable (offline)
+- No lateral movement detected
+
+## MFA Rollout
+- 3 CA policies deployed (report-only until April 4, 2026):
+  - Require MFA (skip from office IP 67.206.163.122)
+  - Block foreign sign-ins (US only, MFA-Travel-Bypass group for exceptions)
+  - Block legacy auth
+- 19/38 users MFA-ready, 19 need to register
+- MFA notice sent to all users, deadline April 4
+
+## Joel Lohr
+- Retiring March 31, 2026
+- Auto-reply directs contacts to Dan Center (dcenter@dataforth.com)
+- Account should be disabled after retirement
+
+**Why:** Active security incident requiring immediate response.
+**How to apply:** Monitor CA policies in report-only mode, enforce April 4. Check 28 offline machines when available. Add C2 IPs to permanent UDM block list.
--- a/.claude/memory/project_guruconnect_deploy.md
+++ b/.claude/memory/project_guruconnect_deploy.md
@@ -0,0 +1,54 @@
+---
+name: project_guruconnect_deploy
+description: How to deploy GuruConnect (v2+) to production — the server (172.16.3.30) builds its own Linux binary; gotchas with the systemd watchdog, trusted-proxy env, and auto-run migrations
+metadata:
+  type: project
+---
+
+GuruConnect v2 went live in production on 2026-05-30 (server + dashboard at v0.2.0,
+public at connect.azcomputerguru.com via NPM -> localhost:3002). The deploy is **manual**
+(the `.gitea/workflows/deploy.yml` "deploy to server" step is a stub that only builds a
+package artifact). Repo on the box: `/home/guru/guru-connect` (separate repo
+`azcomputerguru/guru-connect`, NOT a submodule there).
+
+**Build host = the server itself.** 172.16.3.30 has rust (rustup, cargo 1.94, the
+`x86_64-unknown-linux-gnu` target), node 20 + npm 10, and protoc (~/.local/bin, libprotoc 28.3)
+— all on PATH only in a **login shell** (`ssh guru@172.16.3.30 'bash -lc "..."'`; a
+non-login shell does NOT source ~/.profile so cargo/protoc look "missing"). GURU-5070
+builds the *Windows* agent + a windows-target server, NOT the Linux release — so build the
+Linux server ON the box. See [[reference_guru5070_rust_toolchain]].
+
+Deploy sequence (build while v1 runs, then a quick cutover restart):
+1. **Backup first:** `pg_dump "$DATABASE_URL" | gzip > ~/backups/guruconnect/pre-deploy-$(date +%Y%m%d-%H%M%S).sql.gz`;
+   save the current commit + copy the running binary to `~/guruconnect-server.vN.bak`.
+2. Get the code: the server's local `main` may have **diverged** from origin (the v2 greenfield
+   respec rewrote history — `git pull --ff-only` will refuse). Tree is clean, so
+   `git fetch origin && git reset --hard origin/main` (rollback SHA is saved). `.env` is
+   gitignored, untouched.
+3. SPA: `cd dashboard && npm ci && npm run build` -> emits to `../server/static/app/` (gitignored).
+4. Binary (from repo ROOT, login shell, PROTOC set): `cargo build --release -p guruconnect-server
+   --target x86_64-unknown-linux-gnu`. `-p` scopes to the server so the Windows-only agent crate
+   isn't compiled; explicit `--target` overrides `.cargo/config.toml`'s windows-msvc default.
+   Output lands at `target/x86_64-unknown-linux-gnu/release/guruconnect-server` = the unit's ExecStart.
+   ~3 min. sqlx uses RUNTIME queries (no `query!` macros, no `.sqlx` cache) so the build needs no DB.
+5. **Cutover:** `sudo systemctl restart guruconnect`. Migrations are sqlx-embedded in the binary and
+   **auto-run on startup** (`db.migrate()`), so no manual `psql`. Watch
+   `journalctl -u guruconnect` for "Migrations complete" + "Server listening".
+
+GOTCHAS (all hit on the 2026-05-30 deploy):
+- **systemd unit:** the INSTALLED `/etc/systemd/system/guruconnect.service` has **no `WatchdogSec`**
+  (correct for v2, which sends no `sd_notify`). The repo's `server/guruconnect.service` DOES set
+  `WatchdogSec=30s` — so do NOT run `setup-systemd.sh` / copy the repo unit, or v2 restart-loops
+  every 30s. Unit: User=guru, EnvironmentFile=server/.env, WorkingDirectory=server/, ProtectSystem=strict.
+- **`CONNECT_TRUSTED_PROXIES`** is a v2 env var (comma-separated IPs; defaults to loopback fail-closed).
+  Public `connect.azcomputerguru.com` ingresses through **NPM on Jupiter (172.16.3.20)**, which forwards to
+  the relay on 172.16.3.30:3002. So set `CONNECT_TRUSTED_PROXIES=127.0.0.1,::1,172.16.3.20` in `server/.env`
+  (the Jupiter NPM hop, NOT the relay host .30 — that was a wrong first guess). Without trusting 172.16.3.20
+  the relay logs every public agent as 172.16.3.20 instead of reading X-Forwarded-For; with it, the real client
+  IP shows (verified: a Pavon agent logged its true public IP 98.172.64.243). Only `JWT_SECRET` is hard-required.
+- **NULL tags bug:** `connect_machines.tags` is `text[]` nullable with no default; v2 decodes it as
+  non-`Option`, so rows with NULL tags throw "unexpected null" at reconcile (and likely the Machines
+  list). Mitigated with `UPDATE connect_machines SET tags='{}' WHERE tags IS NULL`. Real fix is a
+  todo (decode Option + migration default).
+- DB is Postgres 14 `guruconnect` on localhost; existing users (admin, howard, both role admin)
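+  survive migration. Rollback: `git reset --hard <saved-sha>`, rebuild, restart, then restore with `gunzip -c <backup>.sql.gz | psql "$DATABASE_URL"` (the dump is gzipped, so it cannot be fed to `psql` directly).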
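A minimal Python sketch of the trusted-proxy behavior described in the `CONNECT_TRUSTED_PROXIES` gotcha. It illustrates the general technique only and is not the relay's Rust implementation; both function names are hypothetical.

```python
import ipaddress
from typing import Optional


def parse_trusted_proxies(value: str) -> set:
    """Parse a comma-separated IP list such as CONNECT_TRUSTED_PROXIES."""
    return {ipaddress.ip_address(part.strip()) for part in value.split(",") if part.strip()}


def resolve_client_ip(peer: str, forwarded_for: Optional[str], trusted: set) -> str:
    """Return the address to log for a connection.

    X-Forwarded-For is honored only when the direct peer is a trusted proxy;
    otherwise the header is ignored (fail closed) and the peer is returned.
    """
    if not forwarded_for or ipaddress.ip_address(peer) not in trusted:
        return peer
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    # Walk from the nearest hop outward; the first untrusted address is the client.
    for hop in reversed(hops):
        try:
            if ipaddress.ip_address(hop) not in trusted:
                return hop
        except ValueError:
            return peer  # malformed header: fall back to the socket peer
    return peer


loopback_only = parse_trusted_proxies("127.0.0.1,::1")
with_npm = parse_trusted_proxies("127.0.0.1,::1,172.16.3.20")
print(resolve_client_ip("172.16.3.20", "98.172.64.243", loopback_only))  # 172.16.3.20
print(resolve_client_ip("172.16.3.20", "98.172.64.243", with_npm))       # 98.172.64.243
```

The two calls mirror the symptom in the notes: with only loopback trusted, every public agent appears as the NPM hop; once 172.16.3.20 is trusted, the forwarded client address is used.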
--- a/.claude/memory/project_guruconnect_v2_direction.md
+++ b/.claude/memory/project_guruconnect_v2_direction.md
@@ -0,0 +1,32 @@
+---
+name: project_guruconnect_v2_direction
+description: GuruConnect v2 modernization direction (Mike, 2026-05-29) — native-first full key fidelity + bidirectional file cut/paste/drag are the headline must-haves; WebRTC is fallback only
+metadata:
+  type: project
+---
+
+GuruConnect is being re-architected (v2) after the 2026-05-29 audit found 3 CRITICAL relay-plane
+auth holes. Direction set by Mike (product owner), captured in
+`projects/msp-tools/guru-connect/docs/specs/SPEC-002-v2-modernization-architecture.md`:
+
+- **Greenfield, salvage cores:** keep the proven Rust (DXGI/GDI capture, input injection, SAS
+  helper, prost codec, proto, Gitea-Actions CI) — rebuild relay/auth, session, viewer, dashboard,
+  deploy. Clean reset in-place (keep repo/history/issues), not a new repo.
+- **Native-first, NOT WebRTC.** Mike's favorite ScreenConnect features and explicit priorities:
+  (1) **keyboard hooks / full key fidelity** — Win+R, Ctrl+C/V, **Ctrl+Alt+Del** must work (browsers
+  structurally can't do these, which is why WebRTC is fallback/secondary only); (2) **bidirectional
+  file transfer via clipboard cut/paste AND drag-and-drop** from/to either guest or host. Both are
+  core differentiators, not deferred. Transport stays custom protobuf-over-WSS.
+- **Standalone-first + versioned `/api/integration/v1/` contract** with GuruRMM (ADR-001; the
+  `specs/native-remote-control/` work is the integration prior art).
+- **Hardened single-tenant now, multi-tenancy-READY schema** (nullable `tenant_id` everywhere) so
+  the RMM partner/client model switches on later with no migration rewrite.
+- Adopt GuruRMM principles: per-agent keys (Option 3, kill shared AGENT_API_KEY), no-TOML-for-
+  endpoints, living-roadmap = definition-of-done, full-stack features, true-integration/anti-Datto.
+
+**Why:** initial GC was built with a much older model; lots of debt. Mike is willing to scrap v1
+entirely for a considerably better product. **How to apply:** when building GC features, default to
+native full-fidelity behavior and ship each capability full-stack (proto+agent+server+viewer+
+dashboard+docs). File transfer needs delayed-render clipboard + a chunked engine; drag-out
+(remote→local) is the hard case, ships after drag-in. Re-spec keystone: [[project_versionable_products]].
+Open questions still pending Mike's answer: repo reset, H.264-vs-HEVC default, web transport, support-code format, v1 cutover.
--- a/.claude/memory/project_mac_gururmm_setup_pending.md
+++ b/.claude/memory/project_mac_gururmm_setup_pending.md
@@ -0,0 +1,28 @@
+---
+name: Mac gururmm hook setup pending
+description: Mikes-MacBook-Air needs install-hooks.sh run in gururmm repo — one-time setup to prevent sqlx migration drift
+type: project
+---
+
+[CONFIRMED PENDING 2026-05-27 — Mac session verified install-hooks.sh has NOT been run on Mikes-MacBook-Air: the gururmm submodule is initialized but only default .sample hooks are present. Action genuinely still open — keep this file; run install-hooks.sh on the Mac before gururmm dev there.]
+
+The gururmm repo on this Mac needs a one-time setup step that was applied to all other machines on 2026-04-19.
+
+**Action required (do this now, before any gururmm work):**
+
+```bash
+cd /Users/azcomputerguru/ClaudeTools/projects/msp-tools/guru-rmm
+git pull
+bash scripts/install-hooks.sh
+```
+
+**What this does:**
+- Sets `core.hooksPath = scripts/hooks/` so the pre-commit CRLF check is active
+- Sets `core.autocrlf=false` and `core.eol=lf` locally and globally
+- Prevents sqlx migration checksum drift (root cause: CRLF vs LF sha384 mismatch)
+
+**Why:** The gururmm build server refused to start after a rebuild because migration file hashes differed between what was stored in `_sqlx_migrations` and the current files. Root cause was CRLF line endings from Windows commits. Fixed with `.gitattributes` + per-machine git config. This command applies the git config side.
+
+macOS defaults to LF, so this is low-risk — mainly sets the hooksPath so the pre-commit guard is active.
+
+**After running:** Delete this memory file or mark it resolved.
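A small illustration of the root cause named above: the same migration text hashes differently under SHA-384 once its line endings change. The SQL content is a made-up placeholder.

```python
import hashlib

# Placeholder migration content; only the line endings differ between the two.
migration_lf = b"CREATE TABLE example (id INT);\nCREATE INDEX example_idx ON example (id);\n"
migration_crlf = migration_lf.replace(b"\n", b"\r\n")

lf_digest = hashlib.sha384(migration_lf).hexdigest()
crlf_digest = hashlib.sha384(migration_crlf).hexdigest()

print(lf_digest == crlf_digest)  # False: a CRLF checkout no longer matches the stored checksum

# Normalizing back to LF restores the original digest.
normalized = migration_crlf.replace(b"\r\n", b"\n")
print(hashlib.sha384(normalized).hexdigest() == lf_digest)  # True
```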
--- a/.claude/memory/project_pluto_build_server.md
+++ b/.claude/memory/project_pluto_build_server.md
@@ -0,0 +1,18 @@
+---
+name: project-pluto-build-server
+description: "Pluto Windows build server — location, role, and access details"
+metadata:
+  node_type: memory
+  type: project
+  originSessionId: 541d4004-8c45-4290-89f5-0ba9ee4e64a9
+---
+
+Pluto (`PLUTO`, 172.16.3.36) is a Windows Server 2019 VM hosted on Jupiter (Unraid primary).
+
+**Why:** It is the primary Windows build server for GuruRMM — builds all Windows agent variants (amd64, x86, legacy, debug), runs WiX 4 MSI builds, and signs binaries via Azure Trusted Signing.
+
+**Credentials:** Administrator / `Paper123!@#` (set 2026-05-15). SSH key: `guru@gururmm-build` (ed25519, `Q+ivqd/...`) must be in `C:\ProgramData\ssh\administrators_authorized_keys` with icacls `/inheritance:r` and ASCII encoding (not UTF-16).
+
+**How to apply:** When Pluto is unreachable or SSH auth fails, check Jupiter's VM console first (not physical machine). SSH key file must be ASCII-encoded — PowerShell `>` writes UTF-16 and breaks auth silently. Use `[System.IO.File]::WriteAllText(..., [System.Text.Encoding]::ASCII)` to write the key.
+
+**GuruRMM agent:** Installed but historically runs old versions (was on 0.6.3 as of 2026-05-15). Update it after any Pluto maintenance.
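A minimal sketch of why the encoding matters for the key file, assuming the failure comes from the UTF-16 byte-order mark and interleaved NUL bytes. The helper name and the key text are placeholders, not the real key.

```python
def looks_like_utf16(data: bytes) -> bool:
    """Heuristic check: a UTF-16 file starts with a BOM or contains NUL bytes."""
    return data[:2] in (b"\xff\xfe", b"\xfe\xff") or b"\x00" in data


# Placeholder key line, not a real key.
key_line = "ssh-ed25519 PLACEHOLDERKEY guru@gururmm-build\n"

ascii_bytes = key_line.encode("ascii")
utf16_bytes = key_line.encode("utf-16")  # BOM plus two bytes per character

print(looks_like_utf16(ascii_bytes), looks_like_utf16(utf16_bytes))  # False True
print(len(ascii_bytes), len(utf16_bytes))
```

Running the same check over the bytes of an existing `administrators_authorized_keys` file is a quick way to spot a key that was written with the wrong encoding.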
--- a/.claude/memory/project_rmm_webhook_docs_guard.md
+++ b/.claude/memory/project_rmm_webhook_docs_guard.md
@@ -0,0 +1,22 @@
+---
+name: project_rmm_webhook_docs_guard
+description: RMM build webhook now skips docs-only pushes (host guard in /opt/gururmm/webhook-handler.py). The repo copy is stale — don't redeploy it.
+metadata:
+  type: project
+---
+
+The GuruRMM build webhook (`gururmm-webhook.service` → `/opt/gururmm/webhook-handler.py`
+on 172.16.3.30) has a **docs-only build guard** as of 2026-05-30: a push whose every
+changed file matches `docs/`, `*.md`, `.claude/`, `session-logs/`, `LICENSE`, or
+`.gitignore` returns `Docs-only change -- build skipped` and triggers no build.
+Fail-safe toward building — no file list or any buildable file → build runs. Detection
+uses the Gitea push payload's per-commit `added`/`removed`/`modified` lists
+(`is_docs_only` / `NON_BUILDABLE`). Verified live (docs push skipped, no build locks,
+`last-built-commit` unchanged). Backup: `/opt/gururmm/webhook-handler.py.bak-20260530-guard`.
+
+This is **SPEC-020 Phase 0** (interim). The full fix migrates RMM CI to Gitea Actions
+with native `paths-ignore`, matching GuruConnect (ADR-002) — see [[reference_gitea_internal]].
+
+**Caveat:** the repo copy `scripts/webhook-handler.py` is STALE (109 lines vs the deployed
+206 — predates the split-build refactor) and does NOT contain the guard. Do not redeploy
+it over the host copy; the host is the source of truth until SPEC-020 lands.
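A minimal reconstruction of the docs-only check described above, written from the prose rather than from the deployed `webhook-handler.py`. The constant names and the exact matching rules are assumptions: directory patterns are treated as path prefixes, and `LICENSE` and `.gitignore` are matched by file name in any directory.

```python
NON_BUILDABLE_PREFIXES = ("docs/", ".claude/", "session-logs/")
NON_BUILDABLE_NAMES = ("LICENSE", ".gitignore")


def is_non_buildable(path: str) -> bool:
    """True when a changed path cannot affect a build (assumed matching rules)."""
    name = path.rsplit("/", 1)[-1]
    return (
        path.startswith(NON_BUILDABLE_PREFIXES)
        or path.endswith(".md")
        or name in NON_BUILDABLE_NAMES
    )


def is_docs_only(payload: dict) -> bool:
    """True only when every changed file in a Gitea push payload is non-buildable."""
    changed = []
    for commit in payload.get("commits") or []:
        for key in ("added", "removed", "modified"):
            changed.extend(commit.get(key) or [])
    if not changed:
        return False  # fail-safe: no file list means the build runs
    return all(is_non_buildable(path) for path in changed)


docs_push = {"commits": [{"added": ["docs/spec.md"], "modified": ["README.md"], "removed": []}]}
mixed_push = {"commits": [{"added": [], "modified": ["README.md", "agent/src/main.rs"], "removed": []}]}
print(is_docs_only(docs_push), is_docs_only(mixed_push), is_docs_only({}))  # True False False
```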
--- a/.claude/memory/reference_dataforth_contact.md
+++ b/.claude/memory/reference_dataforth_contact.md
@@ -0,0 +1,7 @@
+---
+name: Dataforth Contact - AJ
+description: AJ at Dataforth - email forwarding setup needed for dataforthgit@ address
+type: reference
+---
+
+AJ at Dataforth needs messages sent to the dataforthgit@ email address to forward to him.
--- a/.claude/memory/reference_gururmm.md
+++ b/.claude/memory/reference_gururmm.md
@@ -1,6 +1,6 @@
 ---
 name: GuruRMM technical reference — server, API, user_session, pipeline, agent sandbox
-description: Operational reference for GuruRMM — server layout (SSH user, paths on 172.16.3.30), API auth + command execution + polling, user_session context (WTS impersonation, when SYSTEM fails), build-pipeline vendoring at deploy/build-pipeline/ (auto-sync to /opt/gururmm), Linux agent systemd sandbox trap (ProtectSystem=strict makes fs/mount observations sandbox-local).
+description: Operational reference for GuruRMM — server layout (SSH user, paths on 172.16.3.30), agent downloads dir + channel-tag rollout control, privileged server access via the server's OWN root RMM agent (no SSH needed) + plink fallback, API auth + command execution + polling, user_session context (WTS impersonation, when SYSTEM fails), build-pipeline vendoring at deploy/build-pipeline/ (auto-sync to /opt/gururmm), Linux agent systemd sandbox trap (ProtectSystem=strict makes fs/mount observations sandbox-local).
 type: reference
 ---

@@ -19,6 +19,18 @@ SSH user is **`guru`**, not `mike`. Home is `/home/guru/`. Other users with home

 ---

+## Privileged server access — downloads dir, channel tags, root agent (no SSH needed)
+
+**Agent downloads dir: `/var/www/gururmm/downloads`** (NOT the code default `/var/www/downloads`; set via `DOWNLOADS_DIR` env on the running `gururmm-server` process; read it live with `cat /proc/$(pgrep -of gururmm-server)/environ | tr '\0' '\n' | grep DOWNLOADS_DIR`, where `-o` picks the oldest match so a wrapping shell whose command line contains the same string is not returned as well). Holds the per-os/arch agent binaries (`gururmm-agent-{os}-{arch}-{version}[.exe]`), the base enrollment MSI, `latest` symlinks, `.sha256`, and **`.channel` sidecars**.
+
+**Channel-tag rollout control (this is how beta/stable is gated):** each binary has a `<binary>.channel` file containing `stable` or `beta`. `scanner.rs::get_latest_version`: **beta** agents get the absolute-latest binary regardless of tag; **stable** agents get only the latest `stable`-tagged binary (no sidecar = stable). So to soak a release beta-first: `echo beta > <binary>.channel` for the new version's binaries; to promote: `echo stable > ...`. The build pipeline's cleanup keeps only the current version, so once a new version is beta-tagged stable agents find NO newer stable binary and simply stay put. (Done 2026-06-01 to hold agent 0.6.51 / the Windows BSOD feature on beta — re-tagged the 4 `gururmm-agent-windows-*-0.6.51.exe.channel` files to beta. See [[feedback_gururmm_build_channel_default]].)
+
+**The server (172.16.3.30) runs its OWN GuruRMM Linux agent, AS ROOT** — hostname `gururmm` (resolve the UUID live via `GET /api/agents`; it was `5e5a7ebc-95ea-40c8-b965-6ec15d63e157` on 2026-06-01, but UUIDs change on re-enroll — never hardcode). This means **privileged commands on the server (read AND write the downloads dir, re-tag channels, inspect process environ, etc.) run through `/rmm` shell on that agent — no SSH required.** Contrary to the sandbox section below, real-path read/write to `/var/www/gururmm/downloads` works fine via this agent (verified by re-tagging channels 2026-06-01) — the `ProtectSystem` sandbox bites on mount *observations* and writes to paths missing from `ReadWritePaths`, not this dir. When unsure if a path is writable via the agent, just `touch` a tempfile and check.
+
+**SSH fallback from GURU-5070 (Windows):** `sshpass` is NOT installed here (the ix-server memory's sshpass note does not apply to GURU-5070). Use **`plink` / `pscp`** at `C:\Program Files\PuTTY\` with `-pw` and the vault creds (`guru@172.16.3.30`, password in `infrastructure/gururmm-server.sops.yaml` → `credentials.password`; sudo password = SSH password). Prefer the root-agent path above for one-off server ops.
+
+---
+
 ## API — execute a script on any agent

 **Base:** `http://172.16.3.30:3001` (reachable from HOWARD-HOME and similar dev machines via Tailscale).
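A minimal Python sketch of the channel-tag selection rule described in this file's new "Privileged server access" section. It models the stated behavior only and is not the Rust code in `scanner.rs`; the function names and the dictionary shape are hypothetical.

```python
from typing import Optional


def version_key(version: str) -> tuple:
    """Turn '0.6.51' into (0, 6, 51) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_for(agent_channel: str, binaries: dict) -> Optional[str]:
    """Pick the version an agent should be offered.

    `binaries` maps version -> contents of its .channel sidecar, or None when
    the sidecar is missing (treated as stable).
    """
    if agent_channel == "beta":
        candidates = list(binaries)  # beta agents take the absolute latest
    else:
        candidates = [
            version
            for version, tag in binaries.items()
            if (tag or "stable").strip() == "stable"
        ]
    return max(candidates, key=version_key, default=None)


during_soak = {"0.6.50": None, "0.6.51": "beta\n"}
after_cleanup = {"0.6.51": "beta\n"}
print(latest_for("beta", during_soak))      # 0.6.51
print(latest_for("stable", during_soak))    # 0.6.50
print(latest_for("stable", after_cleanup))  # None
```

The last call corresponds to the case in the notes where cleanup has removed older versions: a stable agent finds no stable-tagged binary and stays on its current version.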
--- a/.claude/memory/reference_gururmm_api.md
+++ b/.claude/memory/reference_gururmm_api.md
@@ -0,0 +1,92 @@
+---
+name: GuruRMM API — run PowerShell on any agent
+description: API endpoints, auth flow, and curl recipe to execute a script on any GuruRMM agent and retrieve output. Use this instead of asking user to paste script into ScreenConnect.
+type: reference
+---
+
+# GuruRMM API — Execute Script on an Agent
+
+**API base:** `http://172.16.3.30:3001` (reachable from HOWARD-HOME and similar dev machines via Tailscale — not reachable from cascades internal-network-only boxes, but that doesn't matter since the API talks to the agent, not the target machine).
+
+**Auth creds:** `infrastructure/gururmm-server.sops.yaml` → `credentials.gururmm-api.admin-email` + `admin-password`. Login returns a JWT valid for ~24h (expires 86400s from iat).
+
+## Flow
+
+```bash
+VAULT="$PWD/.claude/scripts/vault.sh"
+EMAIL=$(bash "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-email)
+PASS=$(bash  "$VAULT" get-field infrastructure/gururmm-server.sops.yaml credentials.gururmm-api.admin-password)
+
+JWT=$(curl -s -X POST http://172.16.3.30:3001/api/auth/login \
+  -H "Content-Type: application/json" \
+  -d "{\"email\":\"$EMAIL\",\"password\":\"$PASS\"}" \
+  | python -c "import json,sys; print(json.load(sys.stdin)['token'])")
+
+# List agents (find the agent_id for the host you want)
+curl -s http://172.16.3.30:3001/api/agents -H "Authorization: Bearer $JWT"
+
+# Submit a PowerShell command — works with any file, json-encode to preserve quotes/newlines
+AGENT="<agent-uuid>"
+PAYLOAD=$(python -c "
+import json
+with open('path/to/script.ps1','r',encoding='utf-8') as f: s=f.read()
+print(json.dumps({'command_type':'powershell','command':s}))
+")
+RESP=$(curl -s -X POST http://172.16.3.30:3001/api/agents/$AGENT/command \
+  -H "Authorization: Bearer $JWT" -H "Content-Type: application/json" -d "$PAYLOAD")
+CMD_ID=$(echo "$RESP" | python -c "import json,sys; print(json.load(sys.stdin)['command_id'])")
+
+# Poll until completed (status values: running, completed, failed, timeout)
+while true; do
+  STATUS=$(curl -s http://172.16.3.30:3001/api/commands/$CMD_ID -H "Authorization: Bearer $JWT" \
+    | python -c "import json,sys; print(json.load(sys.stdin)['status'])")
+  [ "$STATUS" != "running" ] && break
+  sleep 5
+done
+
+# Fetch result (stdout / stderr / exit_code)
+curl -s http://172.16.3.30:3001/api/commands/$CMD_ID -H "Authorization: Bearer $JWT"
+```
+
+## Required request fields
+
+`POST /api/agents/:id/command` requires:
+- `command_type` — the interpreter. Valid values include `powershell`, `shell`, `script`, `exec` — any string is accepted by the API but the Windows agent only runs powershell-compatible content. Use `powershell` for Windows agents.
+- `command` — the script text. JSON-encode to preserve newlines, quotes, and dollar-sign escapes.
+
+## Response shape (from `/api/commands/:cmd_id`; `status` is one of `running`, `completed`, `failed`, `timeout`)
+
+```json
+{
+  "id": "uuid",
+  "agent_id": "uuid",
+  "command_type": "powershell",
+  "command_text": "...",
+  "status": "completed",
+  "exit_code": 0,
+  "stdout": "...",
+  "stderr": "...",
+  "created_at": "ISO-8601",
+  "started_at": "ISO-8601",
+  "completed_at": "ISO-8601"
+}
+```
+
+## When to use this
+
+- Readiness / diagnostic checks on any client server where GuruRMM is installed
+- One-off remediation without needing ScreenConnect copy-paste
+- Anywhere you'd otherwise ask the user to paste a script manually
+
+## When NOT to use this
+
+- When the agent isn't enrolled in GuruRMM (check `GET /api/agents` first)
+- For interactive sessions (no stdin; single-shot execution)
+- For >1 MB of script (untested — keep scripts modular)
+
+## Notes
+- Script output is limited; if you need large output, have the script write to a file on the agent and fetch via a separate command
+- `command_type: "powershell"` runs in the SYSTEM context on Windows (agent runs as LocalSystem)
+- Idempotent commands only — there is no transactional rollback
+- The tunnel API (`/api/v1/tunnel/...`) is a planned interactive feature per `.claude/gururmm-tunnel-plan.md`, not yet deployed as of 2026-04-22. Stick to `/api/agents/:id/command` for now.
+- Agents enrolled as of 2026-04-22 include CS-SERVER (`6766e973-e703-47c1-be56-76950290f87c`) for Cascades, DESKTOP-DLTAGOI for Cascades LE, AD2 for AZ Computer Guru. Use `GET /api/agents` for the live list.
--- a/.claude/memory/reference_gururmm_pipeline_vendored.md
+++ b/.claude/memory/reference_gururmm_pipeline_vendored.md
@@ -0,0 +1,29 @@
+---
+name: reference_gururmm_pipeline_vendored
+description: GuruRMM build-pipeline scripts are now version-controlled at deploy/build-pipeline/ in the gururmm repo (2026-06-01); build-shared.sh auto-syncs them to /opt/gururmm each build, so edit-in-repo + push = live — EXCEPT build-shared.sh + webhook-handler.py, which need a manual cp.
+metadata:
+  type: reference
+---
+
+The GuruRMM build/CI pipeline runs at **`/opt/gururmm/`** on the gururmm server (172.16.3.30,
+root-owned, hand-maintained). Those scripts had silently diverged from the repo's older `scripts/`
+generation (that drift caused the BUG-015 Windows build-gate gap). Reconciled 2026-06-01:
+
+- **Source of truth:** the live scripts are vendored into the gururmm repo at
+  **`deploy/build-pipeline/`** (build-{windows,linux,mac,agents,server,shared}.sh, sign-windows.sh,
+  webhook-handler.py + README). Commit `2bf539e`.
+- **Drift-stop (commit `24b5daf`):** `build-shared.sh` (runs first every build, after
+  `git reset --hard origin/main`) now `install -m 0755`-syncs the 6 build scripts from
+  `deploy/build-pipeline/` → `/opt/gururmm/` each build. So to change a GuruRMM build script:
+  **edit it in `deploy/build-pipeline/`, push to gururmm main — the next build runs it.** No manual
+  copy, no restart.
+- **Two exceptions — need a manual `sudo cp` on change** (they can't self-overwrite mid-run):
+  `build-shared.sh` (the running puller) and `webhook-handler.py` (the persistent HTTP server;
+  also needs `sudo systemctl restart gururmm-webhook` to reload). They change rarely. See
+  `deploy/build-pipeline/README.md`.
+
+Webhook still INVOKES the `/opt/gururmm` copies (not the repo copies directly) — the sync keeps
+them current. The repo's older `scripts/webhook-handler.py` + `scripts/build-agents.sh` are a prior
+generation, superseded. The `build-windows.sh` change-gate watches `agent/ installer/` (BUG-015 fix:
+installer-only `.wxs`/`.ico` changes rebuild the MSI). Supersedes the "repo copy is stale, don't
+redeploy" caveat in [[project_rmm_webhook_docs_guard]] for the build scripts (not webhook-handler.py).
--- a/.claude/memory/reference_gururmm_server.md
+++ b/.claude/memory/reference_gururmm_server.md
@@ -0,0 +1,14 @@
+---
+name: GuruRMM Server Layout
+description: SSH user, home directory, and deploy paths on 172.16.3.30
+type: reference
+---
+
+SSH user is `guru`, NOT `mike`. Home directory is `/home/guru/`.
+
+- Repo: `/home/guru/gururmm`
+- Dashboard build: `cd /home/guru/gururmm/dashboard && npm run build`
+- Deploy: `sudo cp -r dist/* /var/www/gururmm/dashboard/`
+- Other dirs under `/home/guru/`: `guru-connect`, `guruconnect-server`, `backups`
+
+**Why:** First SSH session assumed `/home/mike/` — does not exist. Only users with home dirs are `guru` and `gitea-runner`.
--- a/.claude/memory/reference_gururmm_user_session_context.md
+++ b/.claude/memory/reference_gururmm_user_session_context.md
@@ -0,0 +1,19 @@
+---
+name: gururmm-user-session-context
+description: GuruRMM commands accept context=user_session (migration 041) to run as the active logged-on user via WTS impersonation — executes previously-interactive-only commands that fail as SYSTEM with "NonInteractive mode"
+metadata:
+  type: reference
+---
+
+GuruRMM's command API (`POST /api/agents/:id/command`, see [[reference_gururmm_api]]) accepts an optional **`context`** field:
+
+- `"system"` (default) — Session 0 / SYSTEM, the original behavior of every existing command.
+- `"user_session"` — runs in the **active logged-on user's** desktop session via WTS token impersonation (`WTSQueryUserToken` + `DuplicateTokenEx` + `CreateProcessAsUserW`, in `agent/src/watchdog/wts.rs`). **Requires an active logged-on user** on the endpoint — no user logged in = no session to run in.
+
+Added by migration `041_add_command_context.sql`; server enum `CommandContext` serializes `snake_case`.
+
+**Why it matters:** some Windows cmdlets fail as SYSTEM with a "NonInteractive mode" / interactive-session error and historically had to be done by hand on-site. `user_session` runs them remotely instead. Verified 2026-05-27 on the Peaceful Spirit **BridgetteHome** L2TP VPN deploy: `Set-VpnConnection -L2tpPsk -AllUserConnection` — previously documented as "cannot be done remotely" — was set successfully via `user_session`, completing a VPN rollout entirely through RMM with no on-site visit.
+
+**Elevation:** the WTS-impersonated token of a logged-on **admin** user comes back effectively elevated (`WindowsPrincipal.IsInRole(Administrator)=True`) — enough to write the all-user phonebook / HKLM. A **standard** logged-on user would NOT be elevated, so admin-requiring commands would still fail. The agent still launches `powershell.exe -NonInteractive`, so don't rely on real interactive prompts.
+
+**Invoke:** body `{"command_type":"powershell","command":"...","context":"user_session"}`. To dodge shell-quoting on multi-line scripts, base64-encode the script as UTF-16LE and send `powershell -NoProfile -NonInteractive -EncodedCommand <b64>` (`iconv` is absent in this Git Bash — encode with `py`).
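A minimal sketch of the `-EncodedCommand` approach from the **Invoke** note: the script is encoded as UTF-16LE, then base64, and wrapped in the request body. The helper names are hypothetical; the body fields are the ones listed above.

```python
import base64
import json


def encode_powershell(script: str) -> str:
    """Base64 of the UTF-16LE bytes, the form powershell -EncodedCommand expects."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def build_user_session_body(script: str) -> dict:
    """Request body for POST /api/agents/:id/command in the user_session context."""
    encoded = encode_powershell(script)
    return {
        "command_type": "powershell",
        "command": f"powershell -NoProfile -NonInteractive -EncodedCommand {encoded}",
        "context": "user_session",
    }


script = 'Write-Output "hello"\nGet-Date'
body = build_user_session_body(script)
print(json.dumps(body))
```

Because the encoded form contains only base64 characters, multi-line scripts with quotes and dollar signs survive both the JSON layer and the shell layer unchanged.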
--- a/.claude/memory/reference_ix_access_tailscale.md
+++ b/.claude/memory/reference_ix_access_tailscale.md
@@ -0,0 +1,7 @@
+---
+name: IX Server Access via Tailscale
+description: IX server (ix.azcomputerguru.com) is accessible with Tailscale on, no VPN needed
+type: reference
+---
+
+IX server (ix.azcomputerguru.com / 172.16.3.10) can be accessed directly when Tailscale is on. No separate VPN connection required.
--- a/.claude/memory/reference_ix_server_ssh.md
+++ b/.claude/memory/reference_ix_server_ssh.md
@@ -0,0 +1,20 @@
+---
+name: IX Server SSH Access
+description: SSH access notes for IX server - key auth not set up on GURU-5070 (was CachyOS), must use sshpass with password
+type: reference
+---
+
+[VERIFY 2026-05-26 — written under the old CachyOS install; GURU-5070 is now Windows 11. Re-confirm whether key auth is set up before relying on the no-key-auth/sshpass note.]
+
+## IX Server SSH from GURU-5070
+
+- **Host:** 172.16.3.10 (ix.azcomputerguru.com)
+- **User:** root
+- **Password:** See credentials.md
+- **SSH Key Auth:** NOT configured on GURU-5070 (formerly acg-guru-5070; now Windows 11)
+- **Must use:** `sshpass -p 'PASSWORD' ssh -o StrictHostKeyChecking=no -o PubkeyAuthentication=no root@172.16.3.10`
+- **Suppress warnings:** Pipe through `grep -v WARNING | grep -v 'not using'` or `tail`
+
+**Why:** This machine's SSH key hasn't been added to the IX server's `authorized_keys` yet. The old WSL key (guru@wsl) was authorized, but it did not carry over to the new install (originally CachyOS; GURU-5070 has since been reinstalled to Windows 11).
+
+**How to apply:** When running commands on the IX server, use the sshpass approach above. Consider setting up SSH key auth to simplify future access.
--- a/.claude/memory/reference_rmm_agent_runs_in_systemd_sandbox.md
+++ b/.claude/memory/reference_rmm_agent_runs_in_systemd_sandbox.md
@@ -0,0 +1,36 @@
+---
+name: reference_rmm_agent_runs_in_systemd_sandbox
+description: Commands dispatched via the GuruRMM agent execute INSIDE the agent's systemd sandbox (ProtectSystem=strict) — fs/mount observations reflect the agent's private namespace, NOT the host. For host truth, SSH directly or read /proc/<host-pid>/mountinfo.
+metadata:
+  type: reference
+---
+
+The GuruRMM Linux agent runs as a systemd service (`gururmm-agent.service`) hardened with
+**`ProtectSystem=strict`**, which gives the agent process a **private mount namespace where `/`
+is mounted read-only**, with only `ReadWritePaths=` entries writable. **Any command you dispatch
+through the RMM agent (`/rmm shell`, probes) runs inside that namespace** — so `findmnt /`,
+`touch`, `/proc/mounts` etc. report the **agent's sandboxed view, not the host's actual state**.
+
+**Trap (hit 2026-06-01, GURU-KALI):** I misdiagnosed "host root filesystem is read-only" because
+RMM-dispatched `touch /var/lib/gururmm` returned EROFS (os error 30) and `findmnt /` showed `ro`.
+The host root was **rw the entire time** (SMART PASSED, ext4 clean, no kernel remount-ro — all
+consistent with the host being fine). The real cause: the unit's
+`ReadWritePaths=/var/log /usr/local/bin /etc/gururmm` **omitted `/var/lib/gururmm`**, so the agent
+couldn't persist `/var/lib/gururmm/.device-id` → it re-minted a device_id on each daily
+identity refresh → the server (no machine_uid dedup) filed a new agent row each time (~11 ghosts).
+
+**How to get host truth instead of the sandbox view:**
+- SSH to the host directly (commands there run in the host namespace), OR
+- Inspect the agent's mount namespace explicitly: `cat /proc/<agent_pid>/mountinfo`. A `ro` on `/` that
+  appears there but not in the host's own `findmnt /` is the tell that it's the sandbox, not the host.
+- `errors=remount-ro` in a mount line is just the stock default mount option — NOT evidence an
+  error fired. Confirm an actual remount-ro with kernel `EXT4-fs error` logs + `dumpe2fs -h` error
+  count, not the mount option alone.
+
+**The fix pattern** (durable, additive): add a drop-in at
+`/etc/systemd/system/gururmm-agent.service.d/override.conf` containing `[Service]\nReadWritePaths=/var/lib/gururmm`
+(systemd merges ReadWritePaths additively across drop-ins), then run `systemctl daemon-reload` and `systemctl restart gururmm-agent`.
+Better upstream fix: `StateDirectory=gururmm` (handles dir creation + perms + RW bind in one
+directive). **Fleet implication:** every systemd-installed GuruRMM Linux agent with this unit shape
+has the same latent bug until the installer is fixed. See filed todos (agent ReadWritePaths/
+StateDirectory + server machine_uid dedup).
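The `/proc/<pid>/mountinfo` comparison described above can be sketched as follows. The function names `root_mount_options` and `root_is_read_only` are hypothetical. The sketch assumes the standard mountinfo layout, where field 5 is the mount point and field 6 holds the per-mount options, and it treats the last entry for `/` as the effective one.

```python
from pathlib import Path


def root_mount_options(mountinfo_text: str) -> list[str]:
    """Return the per-mount options of the last entry whose mount point is "/"."""
    options: list[str] = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Field 5 is the mount point, field 6 the per-mount options.
        if len(fields) >= 6 and fields[4] == "/":
            options = fields[5].split(",")
    return options


def root_is_read_only(pid: str = "self") -> bool:
    """True if "/" is mounted ro in the mount namespace of the given PID."""
    text = Path(f"/proc/{pid}/mountinfo").read_text()
    # Exact match on "ro": the superblock option errors=remount-ro sits in a
    # later field and is never inspected here.
    return "ro" in root_mount_options(text)
```

Run through the agent, `root_is_read_only("self")` describes the sandbox. Calling it with a host process's PID, or running it over SSH, describes the host. Reading another process's mountinfo generally needs root.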