sync: auto-sync from HOWARD-HOME at 2026-06-26 11:40:19

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-26 11:40:19
2026-06-26 11:40:52 -07:00
parent 0cf843e13d
commit 5bace24371
8 changed files with 473 additions and 0 deletions


@@ -31,6 +31,17 @@ production, data-loss. Detail: EXTENDED + `.claude/OLLAMA.md`.
## Key rules (always)
- **NO EMOJIS.** Use ASCII markers: `[OK]` `[ERROR]` `[WARNING]` `[INFO]` `[CRITICAL]`.
- **Skill-first — if a skill/command covers the task, USE IT; never hand-roll the API.**
When a request maps to an installed skill or slash-command, INVOKE THAT SKILL instead of
improvising raw `curl`/API calls from memory. The skill encodes the correct payload shape,
validation, attribution, and preview gates; free-handing the API is exactly how malformed
records (e.g. Syncro tickets/invoices) reach a human for cleanup. **Syncro billing/invoicing
ALWAYS runs through `/syncro` (or `/syncro-emergency-billing`) — no exceptions.** Same for
other covered domains: credentials → `vault`, RMM actions → `/rmm` (+ `rmm-search` to find a
host), M365 → `remediation-tool`, etc. Knowing the API is NOT a reason to bypass the skill —
the memory rules (e.g. [[feedback_syncro_billing]]) describe what the SKILL does, not a license
to free-hand it. Reach for raw API ONLY when no skill fits or the skill genuinely cannot do it
— and say so explicitly when you do. Mistakes here go to `errorlog.md` (`--correction`).
- **Credentials — capture, vault, document (ALWAYS).** ANY credential that surfaces in a
session — one the user pastes, one you create/rotate, one you discover in a log/config — you
MUST immediately store it in the SOPS vault **via the `vault` skill** (the canonical path —


@@ -110,6 +110,7 @@
- [Don't present inferred topology as fact](feedback_no_inferred_topology_as_fact.md) — Private-IP overlap (172.16.x on both sides) is NOT proof of a site-to-site link; I fabricated a VWP<->office VPN. State observations vs inferences; a failed reachability test disproves a link, don't explain it away; test "can reach RMM" against the EXTERNAL endpoint, not internal 172.16.3.30.
### Syncro
- [Skill-first routing — use the skill, never hand-roll the API](feedback_skill_first_routing.md) — If an installed skill covers the task, INVOKE IT. Syncro billing/invoicing ALWAYS runs through `/syncro` (or `/syncro-emergency-billing`), never ad-hoc curl — free-handing payloads is what makes Winter fix malformed tickets. Now a CORE rule. Generalizes to vault/rmm/remediation-tool/etc.
- [Syncro API plumbing](feedback_syncro_api.md) — Content-Type required on all POST/PUT; NO idempotency anywhere — always GET before retrying; response wrappers (`.ticket.id`, `.comment.id`); add_line_item shape (internal ID, flat response, required fields); HTML uses `<br>` not `<ul>/<li>`; timer_entry response is FLAT but SUPERSEDED (use add_line_item).
- [Syncro billing rules](feedback_syncro_billing.md) — Bill with `add_line_item` directly (not timers); fetch rates LIVE; never invent labor names (real product names only); match labor type to delivery channel (never "Prepaid project labor"); labor `taxable:false` (AZ); warranty `1049360` (never patch price); emergency `26184` ×1.5 once, branch by `prepay_hours`; corrections preserve original tech's user_id; estimate hardware `32252`.
- [Syncro workflow rules](feedback_syncro_workflow.md) — ALWAYS preview comments before posting (no exceptions); verify appointment day-of-week ("Saturday 2026-05-23") before creating; ASK who the appointment owner is; leave `contact_id` BLANK by default for ALL customers (ignore Syncro's contact-picker auto-default).


@@ -0,0 +1,18 @@
---
name: feedback_skill_first_routing
description: If an installed skill/command covers a request, INVOKE THE SKILL — never hand-roll the API from memory. Syncro billing/invoicing ALWAYS goes through /syncro (or /syncro-emergency-billing). Knowing the API is not a license to bypass the skill.
metadata:
type: feedback
---
When a request maps to an installed skill or slash-command, **invoke that skill** rather than improvising raw `curl`/API calls. This is a hard rule, now in CORE `CLAUDE.md` ("Skill-first"). The canonical offender is **Syncro billing**: every invoice/line-item/ticket-billing request goes through the `/syncro` skill (or `/syncro-emergency-billing` for after-hours) — NOT ad-hoc API calls.
**Why:** I default to "act directly" and, because I "know" the Syncro REST API, I reach for a hand-rolled `add_line_item` curl from memory. Free-handing the payload gets the structure wrong (attribution/`?api_key=` owner, `taxable:false`, line-item shape, priority/type format, blank contact, the preview gate), producing malformed tickets — and **Winter has to fix them** (already flagged on #32193/#32194 and others). The skill encodes all of that correctly and enforces the preview/confirm gate. The detailed billing rules in [[feedback_syncro_billing]] describe what the SKILL does when it bills; they are NOT a license to bypass the skill and do it by hand.
**How to apply:**
- Billing/invoicing/ticketing/scheduling in Syncro -> `/syncro` (after-hours/emergency -> `/syncro-emergency-billing`). No exceptions, even for a "quick" one-line charge.
- More generally: before reaching for raw API, ask "is there a skill for this?" Credentials -> `vault`; RMM actions -> `/rmm` (find the host with `rmm-search`); M365 investigation/remediation -> `remediation-tool`; the per-vendor skills (bitdefender, datto-edr, packetdial, b2, mailprotector, screenconnect...) own their APIs.
- Use raw API ONLY when no skill fits, or the skill genuinely cannot do the thing — and SAY SO explicitly when you do, so the user can sanity-check.
- When the user corrects a bypass, log it: `bash .claude/scripts/log-skill-error.sh "<skill>" "hand-rolled API instead of using the skill" --correction`.
Related: [[feedback_psa_default_syncro]] (Syncro is the default PSA), [[feedback_syncro_preview_mandatory]] (preview gate the skill enforces), [[feedback_syncro_priority_type_format]] (a malformed-ticket case Winter flagged), [[feedback_syncro_billing]], [[feedback_syncro_workflow]].


@@ -7,6 +7,8 @@ metadata:
Rules only. Incident detail, verbatim Mike quotes, ticket numbers, dates, the tech user_id table, and the labor-product table all live in [[feedback_syncro_history]] — read on-demand when judging an edge case. API mechanics: [[feedback_syncro_api]]. Workflow: [[feedback_syncro_workflow]].
**Run billing THROUGH the `/syncro` skill — do not hand-roll these calls.** The rules below describe what the skill does when it bills; they are the spec, not a license to free-hand `curl` from memory (that produces malformed tickets Winter has to fix). Invoke `/syncro` (after-hours -> `/syncro-emergency-billing`). See [[feedback_skill_first_routing]]. "`add_line_item` directly" (§1) means "not the timer workflow" — NOT "bypass the skill."
`.claude/commands/syncro.md` is the authoritative live product table.
---


@@ -0,0 +1,136 @@
## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech
## Session Summary
Resumed the Cascades of Tucson Bitdefender-removal / Datto EDR-install straggler workstream
the morning after the 2026-06-25 EDR rollout. Goal (per the scheduled 9am one-shot cron
`9288b586`): make sure Bitdefender is OFF and Datto EDR is INSTALLED on the 7 machines that
were offline at the end of the prior session. The session opened just after a context clear,
so context was reloaded from the wiki article, the cascades CONTEXT.md, and the prior EDR
rollout session log, and a check confirmed no unread coord messages.
A live RMM status pull at 08:54 MST showed 5 of the 7 targets had reconnected in the 9am
arrival wave (last_seen within seconds): DESKTOP-MD6UQI3, DESKTOP-TRCIEJA, Laptop4, NurseAssist,
SALES4-PC. Two stayed offline: DESKTOP-F94M8UT (last seen 06-23) and DESKTOP-KQSL232 (05-29).
Howard directed a measured approach to avoid colliding with a second concurrent session doing
Windows Home->Pro key upgrades on some of the same machines: run the read-only BD-checks on the
3 non-overlapping boxes immediately, and hold NurseAssist + SALES4-PC.
Ran the standard BD-check (services `^EP(Security|Protected|Update|Redline|Integration)Service$`
+ uninstall-registry DisplayName ~ Bitdefender/GravityZone + `Test-Path 'C:\Program Files\Bitdefender'`)
via RMM PowerShell on the 3 safe boxes. Results: DESKTOP-MD6UQI3 = NO_BITDEFENDER (clean),
Laptop4 = NO_BITDEFENDER (clean, and it responded this time after being unresponsive twice prior),
DESKTOP-TRCIEJA = BD_ACTIVE (EPSecurity/EPUpdate/EPProtected/EPIntegration services + "Bitdefender
Endpoint Security Tools" + folder all present). TRCIEJA needs the GravityZone console "Uninstall
client" task — the Public API has no uninstall method in this version (verified DEAD in the skill)
and local uninstall is blocked by anti-tampering with no uninstall password on policy "GPS Default".
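The three BD-check signals above can be combined into a verdict roughly as follows. This is a minimal sketch, not the actual RMM PowerShell script: the function name `bd_verdict` is hypothetical, and treating any single indicator as `BD_ACTIVE` is an assumption (the log only shows the two verdict strings, not how the real script weighs partial hits).

```python
import re

# Service-name pattern quoted from the BD-check description in this log.
BD_SERVICE_RE = re.compile(r"^EP(Security|Protected|Update|Redline|Integration)Service$")
# "DisplayName ~ Bitdefender/GravityZone" from the log; case-insensitivity is an assumption.
BD_APP_RE = re.compile(r"Bitdefender|GravityZone", re.IGNORECASE)


def bd_verdict(services, display_names, folder_exists):
    """Return 'BD_ACTIVE' if any Bitdefender indicator is present, else 'NO_BITDEFENDER'.

    services       -- iterable of Windows service names found on the host
    display_names  -- iterable of uninstall-registry DisplayName values
    folder_exists  -- result of Test-Path on the Bitdefender program folder
    """
    hit_services = [s for s in services if BD_SERVICE_RE.match(s)]
    hit_apps = [n for n in display_names if BD_APP_RE.search(n)]
    # ASSUMPTION: one indicator of any kind is enough to call the host active.
    if hit_services or hit_apps or folder_exists:
        return "BD_ACTIVE"
    return "NO_BITDEFENDER"
```

With the TRCIEJA findings (four `EP*Service` services, "Bitdefender Endpoint Security Tools", folder present) this returns `BD_ACTIVE`; a host with none of the three indicators returns `NO_BITDEFENDER`.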
As Howard cleared each held item it was processed: SALES4-PC (being repurposed) BD-checked clean
(NO_BITDEFENDER, already has EDR); NurseAssist (confirmed a distinct machine from ASSISTNURSE-PC,
not a duplicate) had its Pro upgrade completed by the other session, after which the queued
BD-aware EDR install (`d1806aa3`) was confirmed already fired ("Installed RTS agent", exit 0) and
enrolled in the Cascades Datto EDR org (agent `23c3c36e`, AV on, v3.17.1.5552). The two offline
machines were resolved by Howard's knowledge: DESKTOP-F94M8UT is Alma Montt's machine (offboarded
6/25), just powered off — Howard will power it on; DESKTOP-KQSL232 is Lois Lane's old machine and
was removed from the list (decommissioned).
## Key Decisions
- Honored the live "safe 3 now, hold 2" instruction over the cron's blanket "process all online"
— the cron prompt is generic; the operator's real-time call accounts for the parallel
key-upgrade session and reboot-collision risk on NurseAssist/SALES4-PC.
- Did not attempt to run the TRCIEJA uninstall via API or local RMM. The GravityZone Public API
uninstall method is DEAD in this version, and BEST anti-tampering with no uninstall password
blocks an endpoint-side uninstall — so it is genuinely console-only (same wall as RECEPTIONIST-PC
last session). Handed Howard exact console click-path instead of forcing a broken path.
- For NurseAssist, verified the already-queued install actually fired AND enrolled in the EDR org
rather than trusting the queue or blindly re-dispatching (the Pro upgrade rebooted it; RMM agent
id happened to stay the same, but verified live).
- Removed DESKTOP-KQSL232 from the straggler list on Howard's identification (Lois Lane's old,
decommissioned machine) rather than continuing to chase it.
## Problems Encountered
- cwd drift recurred: after an earlier `cd` into `.claude/skills/bitdefender/scripts`, a later
relative `.claude/scripts/rmm-auth.sh` failed ("No such file or directory") because the Bash
tool's working directory persists across calls. Fixed by `cd /c/claudetools` first. Logged to
errorlog as `--friction` (ref the 2026-06-25 edr-rollout cwd-drift note) — this is a documented
recurring friction.
- `edr.py agents --json` produced no output (the JSON path is broken or empty on this subcommand);
the plain table output worked. Worked around it by grepping the table for the hostname instead of
piping `--json` through jq.
- `gz.py endpoints --company <root>` returned total=0 because endpoints are nested in child groups,
not the company root node. Did not pursue recursive group enumeration — the endpoint ID is not
needed for the console task (operator selects the machine by hostname).
- Laptop4's BD-check hung in `running` past the quick polls (it had been unresponsive twice before);
used a background `until` poll (no foreground sleep) to wait for the terminal state — it returned
completed/clean.
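The wait-for-terminal-state pattern used for Laptop4 can be sketched in Python as below. This is an illustration only: `fetch_status` is a hypothetical callable standing in for `GET $RMM/api/commands/<cid>`, and the set of terminal states is an assumption (the log only shows `running`, `pending`, and `completed`).

```python
import time

# ASSUMPTION: "failed" is a terminal state too; the log only confirms "completed".
TERMINAL_STATES = {"completed", "failed"}


def wait_for_terminal(fetch_status, interval_s=5.0, timeout_s=300.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal state or the timeout passes.

    sleep and clock are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"command still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

A bounded timeout matters here because a host that has just come online can leave a command in `running` for a long time; the caller decides whether to defer the host or keep waiting.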
## Configuration Changes
- No repo source files changed by the work itself (operational against GuruRMM + Datto EDR +
GravityZone).
- Appended `errorlog.md` — one `--friction` entry (bash/env cwd-drift).
- Created this session log.
- Endpoint changes (Cascades fleet): NurseAssist Datto EDR agent install confirmed (queued cmd
fired). No BD removals performed this session (TRCIEJA left for Howard's console task).
## Credentials & Secrets
- No new secrets. Datto EDR Cascades registration key `6qw68y2rwl` (target group
`1dbd2b02-f7df-45d0-a7f2-18667f48447f`) referenced for any re-dispatch. GravityZone uses
vault `msp-tools/gravityzone.sops.yaml` (API key only; no console login stored — console
uninstall needs a human-held GravityZone login). RMM creds vault
`infrastructure/gururmm-server.sops.yaml`.
## Infrastructure & Servers
- GuruRMM API: http://172.16.3.30:3001 (auth via `.claude/scripts/rmm-auth.sh`).
- Datto EDR (Infocyte): https://azcomp4587.infocyte.com ; Cascades org
`2d5ea96e-3228-461b-9c60-13ae464b61d8` ; target group `1dbd2b02-f7df-45d0-a7f2-18667f48447f`.
- Bitdefender GravityZone: cloud.gravityzone.bitdefender.com ; Cascades company
`66b0448e1e0441d02508bad8` ; policy "GPS Default" (antiTampering on, NO uninstall password).
- Live RMM agent IDs (resolve live; UUIDs change on re-enroll): DESKTOP-MD6UQI3
`99d7c8a7-8efb-4416-b9df-da22e0797aa0`, DESKTOP-TRCIEJA `c9bf1a2d-bfdc-401e-9cc8-f9e90bb19587`,
Laptop4 `7a23fa6c-559e-4015-be36-235676f2e025`, SALES4-PC `975f70d8-cd6d-45d7-9da1-6ce2f1ae59ab`,
NurseAssist `fc88f14b-06eb-47ac-b9e6-971c44d700ba` (unchanged after Pro upgrade),
DESKTOP-F94M8UT `675311a1-ad0b-4fdb-8711-4b6cc7da7350`.
- NurseAssist Datto EDR agent id: `23c3c36e-219a-4c4a-a30c-5dc121e3151e` (online, AV on, v3.17.1.5552).
## Commands & Outputs
- Live target status: `rmm-search.sh -c cascades --json` filtered to the 7 hostnames; trust
`last_seen` (the `status` field came back blank) — last_seen within ~1 min == online.
- BD-check dispatch: `POST $RMM/api/agents/<id>/command` (powershell, 60s) with the verdict script;
poll `GET $RMM/api/commands/<cid>`. Verdicts: MD6UQI3/Laptop4/SALES4-PC = `NO_BITDEFENDER`;
TRCIEJA = `BD_ACTIVE` (SERVICES=EPIntegrationService,EPProtectedService,EPSecurityService,EPUpdateService;
APPS="Bitdefender Endpoint Security Tools"; FOLDER=True).
- NurseAssist queued install verify: `GET $RMM/api/commands/d1806aa3-9711-4f54-9a41-fd334efd8fd3`
-> status=completed exit=0, "Installed RTS agent to C:\Program Files\infocyte\agent\agent.exe".
- EDR enrollment verify: `edr.py agents --org 2d5ea96e-...` | grep nurseassist -> online yes, AV on,
v3.17.1.5552 (use the table output; `--json` is broken on this subcommand).
- `[RMM]` alerts posted to #dev-alerts after each dispatch (3-box BD-check, SALES4-PC, NurseAssist close-out).
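The "trust `last_seen`, not `status`" rule from the first bullet above amounts to a freshness check. A minimal sketch, assuming `last_seen` arrives as an ISO 8601 string with a UTC offset (the actual field format is not shown in this log) and using a hypothetical helper name `is_online`:

```python
from datetime import datetime, timedelta, timezone


def is_online(last_seen_iso, now, window=timedelta(minutes=1)):
    """Treat an agent as online if it was last seen within `window` of `now`.

    last_seen_iso -- ISO 8601 timestamp with offset (ASSUMED format)
    now           -- timezone-aware datetime to compare against
    """
    last_seen = datetime.fromisoformat(last_seen_iso)
    return now - last_seen <= window


# 08:54 MST (UTC-7) expressed in UTC, matching the status pull in this session.
now = datetime(2026, 6, 26, 15, 54, 0, tzinfo=timezone.utc)
recent = is_online("2026-06-26T15:53:30+00:00", now)   # seen 30 seconds ago
stale = is_online("2026-06-23T10:00:00+00:00", now)    # seen days ago
```

Here `recent` is `True` and `stale` is `False`. The one-minute window mirrors the "~1 min" rule of thumb in the log and may need widening if the agent check-in interval is longer.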
## Pending / Incomplete Tasks
- DESKTOP-TRCIEJA: Howard runs GravityZone console "Uninstall client" (Network -> Cascades company ->
DESKTOP-TRCIEJA -> Tasks -> Uninstall client). API/local paths are blocked. Optional re-verify BD-check
after. TRCIEJA is slated for replacement anyway.
- DESKTOP-F94M8UT: Alma Montt's machine, powered off; Howard will power it on. BD-aware EDR install is
queued (cmd `a4623704`) and should auto-fire on reconnect — verify enrollment in EDR org when up.
Offer stands to arm a background watcher for its power-on.
- Broader EDR-migration cleanup from the 6/25 session still open: remove Cascades from Syncro's
Bitdefender deployment (GUI-only) so BD does not redeploy; GravityZone portal cleanup of stale
RECEPTIONIST-PC endpoint records; inverse gap `laptop3` (EDR agent, no RMM agent); stale EDR agents
laptop1 / cascades-laptop; CS-SERVER prior-MSP CentraStage leftover.
## Reference Information
- Datto EDR skill: `.claude/skills/datto-edr/scripts/edr.py`. GravityZone skill:
`.claude/skills/bitdefender/scripts/gz.py` (createUninstallTask + variants DEAD this API version).
- Prior session: `clients/cascades-tucson/session-logs/2026-06/2026-06-25-howard-edr-rollout-bitdefender-removal.md`.
- RMM run reference: `.claude/commands/rmm.md`. Bot alerts: `.claude/scripts/post-bot-alert.sh` (#dev-alerts).
- 9am one-shot cron `9288b586` (session-only, auto-deletes after firing); keep-awake guard
`bl9idsqip` completed (held machine awake to 9:05).


@@ -0,0 +1,116 @@
## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech
## Session Summary
Continued the Cascades of Tucson Windows Home -> Pro upgrade workstream from 2026-06-25. Started
with a fleet-aware audit: pulled the GuruRMM agent inventory and identified non-Pro (Windows Home)
machines. Confirmed RMM's `os_name` field is trustworthy for edition (yesterday's three upgrades now
report Pro/Pro for Workstations correctly, so it reads true edition, not the stale "Home" ProductName
string). The remaining Cascades Home boxes were NurseAssist, SALES4-PC, DESKTOP-MD6UQI3, and
CascadesProxess; NURSESTATION-PC (Business), ANN-PC (Enterprise), and CS-SERVER (Server 2019) are
domain-join-capable and were not flagged.
Per Howard's direction, upgraded NurseAssist and DESKTOP-MD6UQI3 (excluding CascadesProxess as an
appliance and SALES4-PC pending a who-pays decision). New requirement this session: send the
logged-on user a reboot warning 2 minutes before the restart. Implemented as `msg * /time:120 "..."`
plus `shutdown /r /t 120 /f /c "..."` (the shutdown countdown also shows a save-your-work message).
Both machines followed the proven flow: silent `changepk.exe /productkey <generic Pro key>` edition
flip (Core -> Professional, no auto-reboot as SYSTEM) -> warned reboot to reconcile registry+licensing
-> `slmgr /ipk <MAK>` + `/ato` -> `/dli` verify. Both activated as Pro for Workstations (VOLUME_MAK,
Licensed). NurseAssist had no active user; DESKTOP-MD6UQI3 had user "Dining Manager" active, so the
2-minute warning mattered there. NurseAssist needed an `/ato` retry; DESKTOP-MD6UQI3 activated first try
(the 0xC004E028 on first /ato is benign "activation already in progress").
Confirmed NurseAssist is a distinct physical machine from Assistnurse-pc (one is Home, the other Pro for
Workstations -- a single box cannot be both), resolving the earlier duplicate concern.
Ran a fleet-wide Home-edition audit at Howard's request: 26 Home installs across many clients, most
likely intentional (non-domain small clients). Flagged AMT-PC (Arizona Medical Transit) on Windows 7
Home Premium as an EOL security risk separate from any Pro/Home upgrade. Howard scoped the rest of the
work back to Cascades only.
Howard reported SALES4-PC was upgraded by the machine's supplier (an outside party that supplied it to
Cascades), so it needs no action and no charge from us. He then asked to upgrade CascadesProxess; the
pre-check confirmed it is the Proxess access-control server (ProxessIQ Server service running, console
user "Unwired", EditionID=Core). Before dispatching changepk, Howard interrupted: CascadesProxess must
be done late at night because it is still actively in use for the access-control hardware install. No
disruptive action was taken on it -- only the read-only pre-check. CascadesProxess upgrade deferred.
## Key Decisions
- Send a 2-minute reboot warning (msg + shutdown /t 120) to the logged-on user before each reboot
(Howard's new requirement; mid-morning upgrades mean users are active).
- changepk runs silently as SYSTEM and does not disrupt the user, so it is dispatched first; the
user-facing warning + reboot is the second step.
- Excluded CascadesProxess (appliance) and SALES4-PC (supplier-owned, who-pays pending) per Howard.
- Deferred CascadesProxess to a late-night window because it is in active use for the access-control
hardware install; will verify ProxessIQ Server restarts after the reboot.
- Treated NurseAssist as a real distinct machine (different edition than Assistnurse-pc proves they are
separate boxes) -> safe to upgrade.
- For non-Cascades Home machines: informational only; do not upgrade without a reason (domain join).
## Problems Encountered
- Repeated the doubled-single-quote PowerShell registry-path bug AGAIN (day after logging it; cited
feedback_windows_quote_stripping). Inside a single-quoted bash $SCRIPT, ''HKLM:\...\Windows NT\...''
collapses and the space in "Windows NT" breaks the read. Logged a second --friction entry noting the
rule is not sticking; the durable fix is to always use double-quotes for paths inside the bash script
(or build PS via heredoc), never doubled single-quotes.
- DESKTOP-MD6UQI3 was unresponsive to even trivial commands at session start (just came online, likely
thrashing) -- commands sat "running" for 80+ seconds. Deferred it, proceeded with NurseAssist, then it
recovered on its own a few minutes later and upgraded cleanly.
- Both NurseAssist and DESKTOP-MD6UQI3 briefly dropped offline right after changepk completed, so the
warning+reboot command queued as "pending" and fired automatically on reconnect -- still correct, just
required waiting for the reconnect before verifying.
- NurseAssist /ato first attempt left License Status: Notification; retry activated successfully (same
transient pattern as MEMRECEPT yesterday).
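The quoting bug in the first item can be reproduced without touching a remote machine. The sketch below uses Python's `shlex` in POSIX mode as a stand-in for bash word-splitting (an approximation, though it agrees with bash for plain single- and double-quoted text). The registry path is a hypothetical placeholder, since the log elides the real one.

```python
import shlex

# HYPOTHETICAL path for illustration only; the real registry path is elided in the log.
PATH = r"HKLM:\Example\Windows NT\Key"

# What was written: doubled single quotes inside a single-quoted bash string.
broken = "'Get-ItemProperty ''" + PATH + "'''"
# The durable fix named in the log: double quotes around the path instead.
fixed = "'Get-ItemProperty \"" + PATH + "\"'"

# Each string is one shell word; element [0] is what PowerShell would receive.
broken_script = shlex.split(broken)[0]
fixed_script = shlex.split(fixed)[0]

print(broken_script)  # Get-ItemProperty HKLM:\Example\Windows NT\Key
print(fixed_script)   # Get-ItemProperty "HKLM:\Example\Windows NT\Key"
```

In the broken form the shell treats `''` as "close quote, open quote", so the quotes vanish and PowerShell sees an unquoted path that splits at the space in `Windows NT`. In the fixed form the double quotes survive as literal characters inside the single-quoted string.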
## Configuration Changes
- Appended: `errorlog.md` -- second bash/PowerShell quoting --friction entry (repeat).
- No repo doc edits this session (REMAINING-WORK-PLAN.md / billing memory updates from 2026-06-25 still
current; SALES4-PC supplier-upgrade and CascadesProxess deferral noted here -- update the plan/memory
next session when CascadesProxess is done).
## Credentials & Secrets
- ACG Windows Pro (for Workstations) MAK: vault `infrastructure/windows-pro-mak.sops.yaml`, field
`credentials.product_key`. Consumed 2 more counts this session (NurseAssist, DESKTOP-MD6UQI3). No new
secrets. Generic public Pro key (not secret): VK7JG-NPHTM-C97JM-9MPGT-3V66T.
## Infrastructure & Servers
- GuruRMM: http://172.16.3.30:3001. Cascades agent IDs used -- NurseAssist
fc88f14b-06eb-47ac-b9e6-971c44d700ba, DESKTOP-MD6UQI3 99d7c8a7-8efb-4416-b9df-da22e0797aa0,
CascadesProxess f41f02a8-6e80-4f8c-a427-0436decc3147 (deferred).
- CascadesProxess: Proxess access-control server; service "ProxessIQ Server"; console user "Unwired";
reboot briefly interrupts door-access management -- do late at night.
## Commands & Outputs
- Edition flip (as SYSTEM, no auto-reboot): `changepk.exe /productkey VK7JG-NPHTM-C97JM-9MPGT-3V66T`
-> Core -> Professional, exit 0.
- 2-min warning + reboot: `msg * /time:120 "..."` then `shutdown /r /t 120 /f /c "..."`.
- Activate: `slmgr /ipk <MAK>` -> `/ato` (retry if License Status stays Notification / 0x8004FE92) -> `/dli`.
- Result both boxes: "Windows(R), ProfessionalWorkstation edition ... VOLUME_MAK ... License Status: Licensed".
- Trust `EditionID` (and RMM os_name) over `ProductName` for edition.
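The two-step warning + reboot from the list above can be generated from a single message and delay so the two values never drift apart. A minimal sketch with a hypothetical helper name; the message text is a placeholder because the log elides the actual wording.

```python
def warned_reboot_commands(message, delay_s=120):
    """Build the `msg` warning and the delayed `shutdown` command with one shared delay.

    Returns the commands in dispatch order. Double quotes in the message are
    rejected because both commands wrap it in double quotes.
    """
    if '"' in message:
        raise ValueError("message must not contain double quotes")
    return [
        f'msg * /time:{delay_s} "{message}"',
        f'shutdown /r /t {delay_s} /f /c "{message}"',
    ]


# PLACEHOLDER wording; the real message is not recorded in this log.
commands = warned_reboot_commands("This PC will restart in 2 minutes. Please save your work.")
```

`commands[0]` is the pop-up that stays on screen for the full delay and `commands[1]` is the forced restart with the same countdown.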
## Pending / Incomplete Tasks
- CascadesProxess Home -> Pro upgrade: deferred to a late-night window (in use for access-control install).
Flow ready; verify ProxessIQ Server restarts post-reboot. Will be +1 MAK count / $99 unless it
self-activates via digital entitlement.
- Billing: NurseAssist + DESKTOP-MD6UQI3 = 2 x $99 = $198 to Cascades (not yet invoiced this session).
Reuse Syncro product "Windows Pro Upgrade" (id 23571919); machine-named lines; optional labor.
- SALES4-PC: upgraded by supplier -- no ACG action, no charge.
- Then domain-join the now-Pro Cascades boxes per REMAINING-WORK-PLAN.md.
## Reference Information
- Prior session log: clients/cascades-tucson/session-logs/2026-06/2026-06-25-howard-home-to-pro-upgrades.md
- Yesterday's billing: Syncro ticket #32466, invoice #1650806091; product "Windows Pro Upgrade" id 23571919.
- Generic Pro key: VK7JG-NPHTM-C97JM-9MPGT-3V66T (edition flip only, does not activate).
- Cascades non-Pro remaining: CascadesProxess (deferred). SALES4-PC done by supplier.


@@ -0,0 +1,187 @@
## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech
## Session Summary
Re-opened the deferred GuruRMM macOS-agent enrollment failure on Nick Pafford's Apple
Silicon Mac at Rednour Law Offices (open item from the 2026-06-25 session). Howard is
heading onsite and wanted the problem diagnosed and a fix pre-staged before arrival. The
device was offline (present in ScreenConnect but not connected), so no live commands could
be run against it — all diagnosis was done from the repo and the RMM server endpoints.
Disproved the prior working hypothesis. The 2026-06-25 log, the wiki, and coord todo
6f2d22be all recorded the cause as "served aarch64 binary is unsigned, so Apple Silicon
SIGKILLs it (killed: 9)." Pulled the actual served binary from
`https://rmm.azcomputerguru.com/install/GREEN-FALCON-7214/download/macos` (3,960,397 bytes,
single-arch arm64) and parsed its Mach-O load commands directly: it carries an
`LC_CODE_SIGNATURE` and a valid CodeDirectory with the adhoc flag set (linker-inserted
ad-hoc signature, identifier `gururmm_agent-51a9f25b57c13649`). An ad-hoc-signed arm64
binary satisfies Apple Silicon's AMFI and runs — so the unsigned/SIGKILL theory is wrong.
All six linked dylibs are stock system frameworks (Security, CoreFoundation, libobjc,
IOKit, libiconv, libSystem), ruling out a dyld "library not loaded" failure too. Howard
recalled the actual error was a "file not found," which fits a binary that launches and then
fails on a missing file, not one that is killed at exec.
Found the real root cause by reading the agent and server source. The server's enrollment
endpoint (`server/src/api/enroll.rs`) types `EnrollRequest.site_id` as `uuid::Uuid` — it
requires a UUID. The server-served macOS install script
(`https://rmm.azcomputerguru.com/install/GREEN-FALCON-7214/macos`) writes the site **code**
string `GREEN-FALCON-7214` into `/usr/local/etc/gururmm/site.plist` as `site_id`. The agent
reads that and POSTs `site_id: "GREEN-FALCON-7214"` to `/api/enroll`, which fails UUID
deserialization (HTTP 422) — enrollment retries forever and the agent never connects. The
`.pkg` postinstall writes a real UUID into the same plist, so the two macOS installers
disagree and the curl-script path is the broken one. The "file not found" Howard saw is the
secondary symptom: `config.rs::default_config_path()` has no macOS branch, so a manual
`gururmm-agent run` with no readable plist falls back to the Linux path
`/etc/gururmm/config.toml`, which does not exist on macOS -> "Failed to read config file".
Resolved the correct site UUID for Rednour's "Main" site via the RMM API
(`c7f5787c-8e71-45b3-841f-fa52436f7d26`; client "Rednour Law Offices"
`85f7cff4-d4db-48a8-b477-b8788122a361`, active). Note this differs from the `d008c7d4-...`
UUID hardcoded in the repo's `.pkg` postinstall, which belongs to a different/test site and
must not be used for Rednour. Built a self-contained onsite Terminal paste-block that
installs the agent, writes `site.plist` with the UUID (not the code), writes the
LaunchDaemon, reloads, and verifies. Delivered it to Howard's Discord DMs (split to fit the
2000-char limit). Per Howard's instruction, did NOT update the wiki or coord todo, and did NOT
message Mike with the proposed fix; all of that waits until the fix is verified onsite.
## Key Decisions
- **Verified the binary signature from the repo/server instead of trusting the recorded
hypothesis.** Parsing the Mach-O directly (load commands + CodeDirectory flags) overturned
the "unsigned" theory without needing the offline Mac.
- **Use the site UUID `c7f5787c-...` from the live RMM API, not the `.pkg`'s hardcoded
`d008c7d4-...`.** The pkg UUID is for a different site; using it would enroll Nick's Mac
into the wrong site (or fail).
- **Onsite workaround = run the official installer, then overwrite the one bad value
(code -> UUID) in `site.plist` and reload.** Uses the server's known-good plumbing for
everything except the buggy field. The Discord block writes the plist with the UUID from
the start so it is idempotent even on a clean machine.
- **Hold all wiki/coord/Mike updates until verified onsite** (Howard's explicit call). The
records still say "unsigned binary / codesign fix"; correcting them before proof would
risk publishing a second unverified theory.
- **Did not test enrollment from the workstation.** POSTing to `/api/enroll` would create a
junk `enrolled_agents` row for a fake hostname on production; the source already proves the
UUID requirement, so a live test was unnecessary and avoided.
## Problems Encountered
- **Prior root-cause hypothesis was wrong and was propagated to three places** (session log,
wiki known-issues, coord todo 6f2d22be). Disproved via direct Mach-O inspection. Records
intentionally left uncorrected pending onsite verification.
- **First download URL guess 404'd.** `.../downloads/gururmm-agent-darwin-arm64` does not
exist; the real route is `/install/:site_code/download/macos` (read from
`server/src/main.rs`).
- **Device offline in ScreenConnect** — no live diagnostics possible; all work done from
source + server endpoints, and the fix staged for Howard to run by hand onsite.
- **Discord 2000-char limit** — the paste-block message was 2,091 chars and rejected (code
50035). Split the prose header off the code block (1,933 chars) and sent separately.
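For the Discord length limit in the last item, a generic line-boundary splitter is one way to avoid the rejection up front. This is a sketch under assumptions: `split_message` is a hypothetical helper, not part of any script named in this log, and it does not keep Markdown code fences balanced across chunks (the session sidestepped that by sending the prose header and the code block as two messages).

```python
DISCORD_LIMIT = 2000  # per-message character limit cited in this log


def split_message(text, limit=DISCORD_LIMIT):
    """Split text into chunks of at most `limit` characters, breaking only at line ends."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        if len(line) > limit:
            raise ValueError("a single line exceeds the limit; split it manually")
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Joining the chunks reproduces the original text exactly, so nothing is lost or reordered by the split.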
## Configuration Changes
No changes to client machines, the RMM server, or the repo were made this session (pure
diagnosis + delivery). The agent on Nick's Mac was NOT modified — it was offline.
Pending (to be applied onsite by Howard): write
`/usr/local/etc/gururmm/site.plist` with `site_id =
c7f5787c-8e71-45b3-841f-fa52436f7d26`, plus the LaunchDaemon, via the delivered paste-block.
## Credentials & Secrets
None created or discovered this session. The site enrollment code `GREEN-FALCON-7214` and
site UUID `c7f5787c-8e71-45b3-841f-fa52436f7d26` are enrollment identifiers (not secrets);
the agent obtains a per-agent key (`agk_...`) from the server only after a successful enroll.
## Infrastructure & Servers
- **Nick's Mac (target):** ScreenConnect machine name `DUXs-Mac-Studio`; Apple `Mac13,1`
Mac Studio, Apple M1 Max (arm64); macOS 26.5.1; serial F6QR2PN2R6. Confirm this is
actually Nick's box before enrolling (name suggests a "Dux" user).
- **GuruRMM server:** `http://172.16.3.30:3001` (internal API used for site lookup);
public install/download host `https://rmm.azcomputerguru.com`; agent enroll API
`https://rmm-api.azcomputerguru.com/api/enroll`.
- **Rednour "Main" site:** UUID `c7f5787c-8e71-45b3-841f-fa52436f7d26`, code
`GREEN-FALCON-7214`, active. Client "Rednour Law Offices"
`85f7cff4-d4db-48a8-b477-b8788122a361`.
- **macOS agent install paths:** binary `/usr/local/bin/gururmm-agent`; plist
`/usr/local/etc/gururmm/site.plist`; LaunchDaemon
`/Library/LaunchDaemons/com.azcomputerguru.gururmm-agent.plist` (label
`com.azcomputerguru.gururmm-agent`, runs `gururmm-agent run`); log
`/usr/local/var/log/gururmm-agent.log`.
## Commands & Outputs
```bash
# Served arm64 binary IS signed (overturns "unsigned" theory)
curl -fsSL https://rmm.azcomputerguru.com/install/GREEN-FALCON-7214/download/macos -o agent
# Mach-O parse: magic feedfacf (64-bit arm64), LC_CODE_SIGNATURE present,
# CodeDirectory flags 0x20002 (adhoc set), identifier gururmm_agent-51a9f25b57c13649,
# no CMS blob -> linker ad-hoc signature (runs on Apple Silicon).
# Linked dylibs: Security, CoreFoundation, libobjc.A, IOKit, libiconv.2, libSystem.B (all stock).
# Root cause in source:
# server/src/api/enroll.rs:29 pub site_id: uuid::Uuid <- requires a UUID
# install script writes <key>site_id</key><string>GREEN-FALCON-7214</string> <- a CODE
# => POST /api/enroll 422s, agent retries forever.
# agent/src/config.rs default_config_path() has no macOS branch -> /etc/gururmm/config.toml
# fallback => "Failed to read config file" (the observed "file not found").
# Resolve correct Rednour Main site UUID
eval "$(bash .claude/scripts/rmm-auth.sh)" # RMM=http://172.16.3.30:3001
curl -fsS -H "Authorization: Bearer $TOKEN" "$RMM/api/sites/c7f5787c-8e71-45b3-841f-fa52436f7d26"
# site: Main | client_id: 85f7cff4-... (Rednour Law Offices) | active: True
```
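The Mach-O check summarized in the comments above can be sketched in a few lines of Python. This is not the script used in the session, only an illustration of the technique: it walks the load commands of a thin 64-bit little-endian Mach-O and reports whether `LC_CODE_SIGNATURE` is present. It does not parse the CodeDirectory, so it cannot tell an ad-hoc signature from a Developer ID one. The constants are the values from Apple's `mach-o/loader.h`; the function name is hypothetical.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF        # 64-bit Mach-O magic
LC_CODE_SIGNATURE = 0x1D        # load command that points at the signature blob
MACH_HEADER_64_SIZE = 32        # eight 32-bit fields


def has_code_signature(data):
    """Return True if a thin 64-bit little-endian Mach-O has an LC_CODE_SIGNATURE command."""
    magic, _cputype, _cpusubtype, _filetype, ncmds, _sizeofcmds, _flags, _reserved = (
        struct.unpack_from("<8I", data, 0)
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit little-endian Mach-O")
    offset = MACH_HEADER_64_SIZE
    for _ in range(ncmds):
        # Every load command starts with (cmd, cmdsize).
        cmd, cmdsize = struct.unpack_from("<2I", data, offset)
        if cmd == LC_CODE_SIGNATURE:
            return True
        if cmdsize < 8:
            raise ValueError("corrupt load command")
        offset += cmdsize
    return False
```

Usage would be `has_code_signature(open("agent", "rb").read())` on the downloaded file. A fat (universal) binary has a different magic and would raise here, which is acceptable for the single-arch arm64 binary described in this log.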
Onsite fix (delivered to Howard's Discord) — run in Terminal on the Mac:
```bash
sudo bash -c '
UUID="c7f5787c-8e71-45b3-841f-fa52436f7d26"
BIN="/usr/local/bin/gururmm-agent"; CFG="/usr/local/etc/gururmm/site.plist"
PLIST="/Library/LaunchDaemons/com.azcomputerguru.gururmm-agent.plist"
URL="https://rmm.azcomputerguru.com/install/GREEN-FALCON-7214/download/macos"
mkdir -p /usr/local/bin /usr/local/etc/gururmm /usr/local/var/log
curl -fsSL "$URL" -o "$BIN" && chmod +x "$BIN"
printf "%s\n" "<?xml version=\"1.0\"?><plist version=\"1.0\"><dict><key>site_id</key><string>$UUID</string></dict></plist>" > "$CFG"; chmod 600 "$CFG"
# (LaunchDaemon written per macos-install.sh), then:
launchctl bootout system "$PLIST" 2>/dev/null || true
launchctl bootstrap system "$PLIST"; launchctl enable system/com.azcomputerguru.gururmm-agent
sleep 6; tail -30 /usr/local/var/log/gururmm-agent.log; plutil -p "$CFG"
'
# (full canonical block, with the real plist XML + LaunchDaemon, is in Howard's Discord DM)
```
## Pending / Incomplete Tasks
- **Verify the fix onsite** (Howard, next): run the delivered paste-block on Nick's Mac.
Success = log shows "Enrollment complete - agent key persisted" + WS connect; `plutil -p`
shows an `agent_key`; agent appears in dashboard under Rednour Law Offices -> Main. Watch
for `Killed: 9` (would mean the "binary is signed and runs" finding is wrong) or
`Site not found`/422 (UUID typo).
- **After verification — correct the records** (currently HELD): update wiki
`clients/rednour.md` (remove the unsigned/codesign known-issue, replace with the
code-vs-UUID enroll bug), close/annotate coord todo 6f2d22be, and file the bug for Mike.
- **The actual code fix (for Mike):** either the macOS install script should stamp the site
UUID into `site.plist` (like the `.pkg` postinstall), or `/api/enroll` should accept a site
code (the legacy `register_legacy` handler already resolves via `get_site_by_code`).
Secondary: add a macOS branch to `default_config_path()` and finish the stubbed macOS
`install` verb (`main.rs` returns "macOS launchd service installation is not yet
implemented"). Until patched, every macOS enrollment needs the manual code->UUID step.
- **Confirm `DUXs-Mac-Studio` is Nick's machine** before enrolling.
- Carryover from 2026-06-25 (unchanged): Syncro #32343 billing (0.5h onsite), auto-reconnect
of the Documents mount in Nick's Login Items, return visit for phone + printer.
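The success and failure signatures named in the verification task can be checked mechanically. A minimal sketch, assuming the strings quoted in this note appear verbatim in the agent log (taken from this note, not confirmed against the agent source); `triage_log` is a hypothetical helper. `Killed: 9` is typically printed by the invoking shell rather than written to the agent log, so it is not checked here.

```bash
# Hypothetical helper: classify an agent log by the signatures listed above.
# Return codes: 0 = enrolled, 1 = enroll rejected, 2 = log unreadable, 3 = undetermined.
triage_log() {
  if [ ! -r "$1" ]; then
    echo "[ERROR] cannot read $1"
    return 2
  fi
  # Success is checked first: earlier failed retries may remain in the same log.
  if grep -q "Enrollment complete" "$1"; then
    echo "[OK] enrollment complete"
    return 0
  fi
  # Loose match: a bare "422" could also appear in unrelated lines.
  if grep -Eq "Site not found|422" "$1"; then
    echo "[ERROR] enroll rejected (check the site UUID)"
    return 1
  fi
  echo "[WARNING] no known signature found yet"
  return 3
}
```

Usage on the Mac would be `triage_log /usr/local/var/log/gururmm-agent.log`; a `[WARNING]` result may simply mean the agent has not attempted enrollment yet.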
## Reference Information
- Coord todo: `6f2d22be-e653-48c8-9f9b-0155420b315d` (project gururmm) — still open, records
the now-disproven hypothesis; do not close until onsite verification.
- Served macOS binary: `https://rmm.azcomputerguru.com/install/GREEN-FALCON-7214/download/macos`
(3,960,397 bytes, arm64, ad-hoc signed, id `gururmm_agent-51a9f25b57c13649`).
- macOS install script: `https://rmm.azcomputerguru.com/install/GREEN-FALCON-7214/macos`.
- Source files: `projects/msp-tools/guru-rmm/server/src/api/enroll.rs` (site_id: Uuid),
`agent/src/config.rs` (default_config_path, no macOS branch),
`agent/src/macos_storage.rs` (reads `/usr/local/etc/gururmm/site.plist`),
`agent/src/main.rs` (resolve_windows_config macOS branch; macOS install stub),
`agent/scripts/install.sh`, `agent/pkg-scripts/postinstall` (hardcoded `d008c7d4-...`).
- Prior session: `clients/rednour/session-logs/2026-06/2026-06-25-howard-nick-smb-share-and-mac-rmm.md`.
- Wiki: `wiki/clients/rednour.md` (macOS-unsigned known issue — to be corrected post-verify).
- RMM agent (REDNOURCARRIEVI): `8e4e2221-7e2a-4a6f-9eda-864568539961`.

@@ -17,6 +17,8 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
<!-- Append entries below this line -->
2026-06-26 | Howard-Home | syncro/billing | [correction] hand-rolled add_line_item API calls from memory instead of using the /syncro skill; malformed tickets reached Winter for cleanup. Correct: route ALL Syncro billing/invoicing through the skill. Generalized to a CORE skill-first rule. [ctx: rule=skill-first memory=feedback_skill_first_routing tickets=#32193,#32194]
2026-06-26 | Howard-Home | bash/env | [friction] used relative .claude/scripts/rmm-auth.sh after an earlier cd into a skill scripts dir (cwd persists across Bash calls) -> 'No such file or directory'; fix: cd /c/claudetools first or use absolute paths [ctx: ref=2026-06-25-edr-rollout cwd-drift note]
2026-06-26 | Howard-Home | rmm/bash-quoting | [friction] REPEAT (same session, day after logging it): used doubled single-quotes ('') around a PowerShell registry path inside a single-quoted bash $SCRIPT again -> 'Windows NT' path space broke the read. Fix is known (double-quotes inside). Rule from feedback_windows_quote_stripping not sticking under flow - consider always building PS scripts via a heredoc to a var, never inline single-quoted with embedded quotes. [ctx: ref=feedback_windows_quote_stripping repeat=2]