diff --git a/wiki/clients/gonzvar-tax-services.md b/wiki/clients/gonzvar-tax-services.md new file mode 100644 index 0000000..5cbb7f0 --- /dev/null +++ b/wiki/clients/gonzvar-tax-services.md @@ -0,0 +1,127 @@ +--- +type: client +name: gonzvar-tax-services +display_name: Gonzvar Tax Services +last_compiled: 2026-06-12 +compiled_by: GURU-5070/claude-main +sources: + - clients/gonzvar-tax-services/session-logs/2026-06-06-mike-rmm-onboarding-diagnostic-bug-discovery.md + - clients/gonzvar-tax-services/TASKS.md + - clients/gonzvar-tax-services/DIAGNOSTIC-SUMMARY-2026-06-06.md + - clients/gonzvar-tax-services/GTS-W0-DISK-ANALYSIS.md + - clients/gonzvar-tax-services/onboarding-baselines/GTS-W0-20260606T180736.md + - clients/gonzvar-tax-services/onboarding-baselines/GTS-W1-20260606T180908.md + - clients/gonzvar-tax-services/onboarding-baselines/GTS-W2-20260606T181016.md + - clients/gonzvar-tax-services/onboarding-baselines/GTS-PEDRO-H-20260606T181113.md + - clients/gonzvar-tax-services/onboarding-baselines/GTS-SVR25-20260606T181205.md + - clients/gonzvar-tax-services/onboarding-baselines/SERVER-20260606T181304.md + - session-logs/2026-06-07-mike-gururmm-backup-alert-cleanup.md +backlinks: + - projects/msp-tools/guru-rmm +--- + +# Gonzvar Tax Services + +Tax services firm onboarded as new MSP client in June 2026. Six machines enrolled in GuruRMM across a Windows AD environment (GTS.local). Fleet-wide onboarding diagnostics completed at intake; multiple security findings remain open. Active setup tasks pending (QuickBooks RemoteApp, Tailscale VPN, security hardening). 
+ +## Profile + +- **Contract type:** (verify — not found in Syncro; search returned 0 matches on "gonzvar tax services") +- **Key contacts:** (verify — pgonz / GTS\gonzvar account names inferred from baselines; no Syncro contact record found) +- **Billing rate:** (verify — check Syncro invoices) +- **Hours remaining (if prepaid):** (verify) +- **Managed device count:** 6 (all enrolled in GuruRMM as of 2026-06-06) +- **Syncro customer ID:** none (customer not found in Syncro as of 2026-06-12) + +## Infrastructure + +### Servers & Services + +| Host | IP | Role | OS | Notes | +|---|---|---|---|---| +| GTS-SVR25 | 192.168.0.2 (static) | Primary DC, DNS | Windows Server 2025 Standard (build 26100) | ASUS; i7-12700, 32 GB; Defender RTP off at baseline; firewall enabled | +| SERVER | 192.168.0.5 (static) | Legacy server | Windows Server 2019 Standard (build 17763) | Dell PowerEdge T440; Xeon Bronze 3204, 8 GB; SMBv1 enabled; firewall off; 104-day uptime at baseline | + +### Workstations + +| Host | IP | OS | Notes | +|---|---|---|---| +| GTS-W0 | 192.168.0.145 (DHCP) | Win11 Pro for Workstations (build 26200) | Lenovo 90SM006QUS; i5-12400, 16 GB; firewall off, RDP without NLA; ZeroTier 10.244.136.41 | +| GTS-W1 | 192.168.0.143 (DHCP) | Win11 Pro for Workstations (build 26200) | Lenovo 90SM006QUS; i5-12400, 16 GB; domain-joined | +| GTS-W2 | 192.168.0.146 (DHCP) | Win11 Pro for Workstations (build 26200) | Lenovo 90SM006QUS; i5-12400, 16 GB; domain-joined | +| GTS-PEDRO-H | 192.168.0.146 (DHCP, WiFi) | Win11 (build 26200) | Lenovo 90SM006QUS; i5-12400, 16 GB; NOT domain-joined (WORKGROUP); personal machine; WiFi only; ZeroTier 10.244.10.231 | + +Note: GTS-W2 and GTS-PEDRO-H both resolved to 192.168.0.146 at scan time — probable DHCP address overlap worth checking. 
+ +### Email & Identity + +- **M365 tenant:** (verify) +- **MX / mail flow:** (verify) +- **MFA status:** (verify) +- **Domain:** GTS.local (AD); GTS-SVR25 is primary DC and NTP source; workstations W0/W1/W2 and SERVER domain-joined; GTS-PEDRO-H in WORKGROUP +- **LAPS:** Present on GTS-W0, GTS-W1, GTS-W2, GTS-SVR25; not detected on SERVER + +### Network + +- **ISP / WAN:** Cox Communications (inferred from PEDRO-H DNS: 68.105.28.11, 68.105.29.11, 68.105.28.12) +- **Subnet:** 192.168.0.0/24 (DHCP served by GTS-SVR25) +- **Firewall:** (verify — no perimeter device observed in logs) +- **VPN:** ZeroTier present on GTS-W0 and GTS-PEDRO-H; Tailscale planned but not yet deployed +- **DNS:** GTS-SVR25 (192.168.0.2) primary, SERVER (192.168.0.5) secondary (domain-joined machines) + +## Access + +- **RMM (GuruRMM):** Site code `INNER-BEAR-6727`; enrollment key and site IDs in vault (`clients/gonzvar-tax-services/gururmm-site-main.sops.yaml`); install page: `https://rmm.azcomputerguru.com/install/INNER-BEAR-6727` +- **ScreenConnect:** All machines enrolled; client ID `1912bf3444b41a08`, version 26.1.24.9579 +- **Splashtop:** All machines; Streamer 3.8.x running +- **Syncro agent:** All machines; version 1.0.201.18410 +- **Datto RMM:** Present on GTS-SVR25 (4.4.11616) as additional ACG tooling +- **Admin accounts:** `pgonz` (local admin on all workstations); `GTS\gonzvar` (domain admin); `sysadmin` (local admin on servers); `GTS\pedro` (domain admin, seen on GTS-W0); `MediaAdmin$` (managed service account on servers) +- **Vault path:** `clients/gonzvar-tax-services/` + +## Patterns & Known Issues + +**Fleet-wide security configuration gaps (baseline 2026-06-06):** +- Firewalls disabled (all profiles: Domain, Private, Public) on GTS-W0 and SERVER; GTS-SVR25 has all profiles enabled; W1/W2/PEDRO-H status requires re-run after probe fix. +- RDP without NLA on GTS-W0 (pre-auth vulnerability). GTS-SVR25 and SERVER have RDP enabled with NLA — confirm restricted to VPN/internal IPs. 
+- No backup agent detected on any machine at baseline. SERVER had an abandoned Nov-2024 MSP360 image plan (needs deletion from MSP360 console). +- Defender RTP and antimalware service both off on GTS-SVR25. No AV agent detected (Server SKU — Security Center does not register; verify a managed AV is active or re-enable Defender). +- BitLocker inconsistent: GTS-W1 encrypted (TPM + recovery key); GTS-W0 unencrypted; servers returned null (verify with `manage-bde -status`). +- Group Policy Client service stopped on GTS-W0 (and possibly other machines). Investigate Group Policy application. + +**SERVER legacy risk:** +- Windows Server 2019 (build 17763) with SMBv1 enabled and 5 pending updates at baseline; 104-day uptime. Server 2019 extended support ends 2029-01-09 — plan upgrade path to Server 2025. +- SMBv1 must be disabled: `Set-SmbServerConfiguration -EnableSMB1Protocol $false`. + +**Diagnostic probe false positives (GuruRMM onboarding-diagnostic.ps1):** +- Event ID 153 from `Microsoft-Windows-Kernel-Boot` (VBS enabled boot message) is counted the same as Event ID 153 from the `Disk` source (real I/O error). On Windows 11 machines with VBS/HVCI enabled (default on 12th-gen Intel+), every boot logs an Event ID 153 that falsely inflates the disk-error count. +- GTS-W0 initially showed 9 "disk errors" — all were VBS boot messages; drive (Kingston NVMe 1TB) confirmed healthy via SMART. +- GTS-SVR25 showed 83 "disk errors" at baseline — almost certainly the same false positive given 20+ days uptime and similar Win11 base. +- Probe fix required (filter Event ID 153 by `ProviderName != 'Microsoft-Windows-Kernel-Boot'` or query `ProviderName = 'disk'` directly). Re-run baselines after fix to get accurate grades. + +**GTS-PEDRO-H is not domain-joined:** +- Personal machine; WORKGROUP only; WiFi connectivity; only `pgonz` is local admin. Treat as bring-your-own device — lower management priority but still enrolled in RMM. 
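The Event ID 153 probe fix described in the false-positive notes above comes down to filtering on the event provider as well as the event ID. The real probe is a PowerShell script (`onboarding-diagnostic.ps1`), so the following is only a language-neutral sketch written in Rust. The `EventRecord` type and both function names are hypothetical and are not taken from the probe.

```rust
/// Hypothetical stand-in for one Windows event-log record.
/// The real probe reads these from the System log; this type is illustrative only.
#[derive(Debug, Clone)]
struct EventRecord {
    event_id: u32,
    provider: String,
}

/// Provider that logs Event ID 153 on VBS-enabled boots (named in the notes above).
const KERNEL_BOOT: &str = "Microsoft-Windows-Kernel-Boot";

/// Buggy behaviour: counts every Event ID 153 regardless of provider.
fn count_all_153(events: &[EventRecord]) -> usize {
    events.iter().filter(|e| e.event_id == 153).count()
}

/// Fixed behaviour: counts Event ID 153 only when the provider is `disk`.
fn count_disk_io_errors(events: &[EventRecord]) -> usize {
    events
        .iter()
        .filter(|e| e.event_id == 153)
        .filter(|e| e.provider.eq_ignore_ascii_case("disk"))
        .count()
}

fn main() {
    // Sample data shaped like the GTS-W0 case: nine VBS boot messages,
    // plus one genuine disk event added here for contrast.
    let mut events = Vec::new();
    for _ in 0..9 {
        events.push(EventRecord {
            event_id: 153,
            provider: KERNEL_BOOT.to_string(),
        });
    }
    events.push(EventRecord {
        event_id: 153,
        provider: "disk".to_string(),
    });

    println!("naive count:     {}", count_all_153(&events));
    println!("disk-only count: {}", count_disk_io_errors(&events));
}
```

Matching on the `disk` provider directly (rather than excluding Kernel-Boot) is the stricter of the two options listed above, since it also ignores any other provider that happens to reuse ID 153.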
+ +## Active Work + +*Syncro not available for this client as of 2026-06-12. Open tasks from coord API (project key: gonzvar):* + +| Task | Status | Notes | +|---|---|---| +| QuickBooks RemoteApp setup | Pending | Install QB on server; configure RemoteApp for local + VPN users | +| System cleanup (all machines) | Pending | Disk cleanup, temp files, updates, clear reboots | +| RDP over VPN (Tailscale) | Pending | Install Tailscale on server + workstations; addresses RDP exposure | +| GuruRMM enrollment | Complete | All 6 machines enrolled 2026-06-06 (was deferred, found pre-enrolled) | +| Security hardening (fleet) | Open | Firewall enable, RDP NLA, BitLocker, Defender RTP on SVR25, SMBv1 disable | + +## History Highlights + +- **2026-06-06** — New MSP client created; GuruRMM client `ae78d033` + site "Main" (`INNER-BEAR-6727`) provisioned; enrollment key vaulted. +- **2026-06-06** — Discovered 6 machines already enrolled in RMM (expected 4; found 3 workstations + 1 personal + 2 servers). +- **2026-06-06** — Fleet-wide onboarding diagnostic baseline run: GTS-W0, GTS-SVR25, SERVER graded RED; GTS-W1, GTS-W2, GTS-PEDRO-H graded AMBER. +- **2026-06-06** — Critical GuruRMM probe bug discovered: Event ID 153 / Kernel-Boot (VBS) counted as disk errors on Win11 machines; GTS-W0 initial "failing drive" alert retracted; drive confirmed healthy. +- **2026-06-07** — SERVER (Gonzvar) flagged during backup alert review; abandoned Nov-2024 MSP360 image plan identified for deletion. 
+ +## Backlinks + +- [GuruRMM](../projects/msp-tools/guru-rmm.md) — onboarding diagnostic probe; Event ID 153 false-positive bug fix required diff --git a/wiki/clients/tohono-oodham-doit.md b/wiki/clients/tohono-oodham-doit.md new file mode 100644 index 0000000..d51fa01 --- /dev/null +++ b/wiki/clients/tohono-oodham-doit.md @@ -0,0 +1,85 @@ +--- +type: client +name: tohono-oodham-doit +display_name: Tohono O'odham Nation - Department of Information & Technology (DoIT) +last_compiled: 2026-06-12 +compiled_by: GURU-5070/claude-main +sources: + - clients/tohono-oodham-doit/session-logs/2026-05-27-session.md + - syncro:33069069 +backlinks: + - clients/sif-oidak +--- + +# Tohono O'odham Nation - Department of Information & Technology (DoIT) + +## Profile +- **Contract type:** Break-fix with recurring Starlink service reseller billing (monthly internet + per-incident labor) +- **Key contacts:** + - Shannon Ramon — shannon.ramon@tonation-nsn.gov, 520-471-3072 (primary) + - Brandon Capeheart (I&T System Administrator) — brandon.capeheart@tonation-nsn.gov, 520-993-5779 + - Marcus Ramon (I&T Network Manager) — marcus.ramon@tonation-nsn.gov, 520-240-0844 + - Trina Rodriguez (DoIT) — trina.rodriguez@tonation-nsn.gov, 520-383-0270 / 520-349-4297 + - Yvonne Enriquez (DoIT Office Manager) — Yvonne.Enriquez@tonation-nsn.gov, 520-383-0270 + - Tianna Aguilla (Accounts Payable) — tianna.aguilla@tonation-nsn.gov, (520) 648-4130 ext 4108 + - Denise Darrell (Accounts Payable, Dept of IT) — denise.darrell@tonation-nsn.gov, 520-383-6600 +- **Billing rate:** $175/hr (onsite labor) +- **Hours remaining (if prepaid):** N/A — no prepaid block +- **Active ticket:** Syncro #32328 (Waiting on Customer) +- **Syncro customer ID:** 33069069 +- **Address:** 25310 South Toltec Buttes Road, Eloy, AZ 85131; mailing: PO Box 837, Sells, AZ 85634; DoIT Annex: 307 Vamori Street, Tucson, AZ 85756 + +## Infrastructure + +### Servers & Services +| Host | IP | Role | OS | Notes | +|---|---|---|---|---| + +No 
Syncro-managed assets on record. No RMM agents deployed as of 2026-06-12. + +### Email & Identity +- **M365 tenant:** (verify) +- **MX / mail flow:** (verify) — staff use @tonation-nsn.gov addresses +- **MFA status:** (verify) + +### Network +- **ISP / WAN (field sites x2):** Starlink Roam Unlimited (mobile); configured in bypass mode — Check Point 1550 WAN interface holds the ISP-assigned IP directly. Starlink Roam issues CGNAT 100.64.x.x addresses, so each field site has no public routable WAN IP. +- **ISP / WAN (main office):** Non-Starlink; public static IP(s). ISP and gateway hardware unconfirmed. +- **Firewall (field):** Check Point 1550 (Gaia Embedded) — 2 units, one per field site +- **Firewall (main office):** (verify — make/model unconfirmed; assumed Check Point based on field fleet) +- **VPN:** Pending design decision; two options under evaluation: + - **Option A — Native IPsec hub-and-spoke:** Field 1550s initiate outbound IPsec to office public IP using existing hardware; no overlay required. Cleanest path if main office gateway is also Check Point. + - **Option B — Tailscale overlay:** Subnet-router node deployed behind the office firewall; small Tailscale-capable node (GL.iNet Beryl AX, Flint 2, pfSense, or OPNsense) at each field site. Traverses CGNAT via NAT-traversal and DERP relay on port 443. + +## Access +- No remote access credentials or vault paths on file for this client. +- Vault path: (verify — create at `clients/tohono-oodham-doit/` if credentials are issued) +- Syncro: https://computerguru.syncromsp.com/customers/33069069 + +## Patterns & Known Issues + +- **CGNAT field WAN:** All field sites are behind Starlink Roam Unlimited in bypass mode. Bypass mode removes Starlink's own NAT but Starlink Roam still assigns a CGNAT 100.64.x.x address to the 1550 WAN port — not a public IP. Any site-to-site VPN or remote management initiated from the field must be outbound-only; the main office hub must be the reachable endpoint. 
On-site verification: each field 1550's WAN IP should show 100.64.x.x. If a real public IP appears, a Starlink public-IP add-on may be active, which changes the VPN calculus. +- **Check Point 1550 (Gaia Embedded) is a closed appliance:** Third-party overlay software (Tailscale, ZeroTier) is not supported and cannot be installed on the 1550 itself. An Option B Tailscale deployment requires a separate device alongside the 1550 at each field site. +- **Multiple Tohono O'odham accounts in Syncro:** DoIT (33069069), Legislative Branch (35323240), Farming Authority (33405788), Sif-oidak District (7694718) are separate Syncro customer records for the same tribal nation. Confirm account before opening tickets. +- **Starlink reseller billing:** ComputerGuru bills DoIT for recurring Starlink internet service (~$397-421/month for 2 lines). Labor is billed break-fix at $175/hr as separate line items. + +## Active Work + +*As of 2026-06-12 — Syncro shows 1 open ticket:* + +| Ticket | Subject | Status | Opened | +|---|---|---|---| +| #32328 (ID: 111209848) | Request for Starlink Static IP options | Waiting on Customer | 2026-05-27 | + +Ticket #32328: Presented two site-to-site VPN design options (native Check Point IPsec hub-and-spoke vs. Tailscale overlay) for CGNAT field-to-office connectivity. Recommended skipping a Starlink static IP upgrade — the reachable main office hub makes it unnecessary for either option. Awaiting DoIT internal IT decision on VPN entrypoint and main office gateway make/model confirmation. 
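The on-site WAN check noted under Patterns & Known Issues (a field 1550 should show a CGNAT address, not a public one) can be scripted. The sketch below assumes only that the WAN IPv4 address is available as text. The shared address space reserved for carrier-grade NAT is 100.64.0.0/10 (RFC 6598), so it covers 100.64.x.x through 100.127.x.x, not only 100.64.x.x. The `WanKind` enum and function names are illustrative, and `PossiblyPublic` is deliberately hedged because this check does not exclude every reserved range.

```rust
use std::net::Ipv4Addr;

/// Rough classification of a WAN address for the field-site check.
#[derive(Debug, PartialEq)]
enum WanKind {
    /// Inside 100.64.0.0/10 (RFC 6598 shared address space).
    Cgnat,
    /// RFC 1918 private, loopback, or link-local.
    Private,
    /// Anything else; may be publicly routable but is not verified here.
    PossiblyPublic,
}

/// True when `ip` falls inside 100.64.0.0/10.
/// A /10 fixes the first octet and the top two bits of the second octet.
fn is_cgnat(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    o[0] == 100 && (o[1] & 0b1100_0000) == 0b0100_0000
}

fn classify_wan(ip: Ipv4Addr) -> WanKind {
    if is_cgnat(ip) {
        WanKind::Cgnat
    } else if ip.is_private() || ip.is_loopback() || ip.is_link_local() {
        WanKind::Private
    } else {
        WanKind::PossiblyPublic
    }
}

fn main() {
    // Sample addresses are made up for illustration.
    for text in ["100.64.12.7", "100.127.255.254", "100.128.0.1", "192.168.0.2"] {
        match text.parse::<Ipv4Addr>() {
            Ok(ip) => println!("{text}: {:?}", classify_wan(ip)),
            Err(err) => println!("{text}: not an IPv4 address ({err})"),
        }
    }
}
```

A `PossiblyPublic` result on a field unit would be the cue to ask whether a Starlink public-IP add-on is active before settling the VPN design.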
+ +## History Highlights + +- **2025-01:** Onsite Starlink installation (invoice #64532, 1 hr labor, $175) +- **2025-11-18:** Onsite event Starlink rental and setup for November event in Sells, AZ (invoice #66431, $362.50 — rental + 1hr setup + 0.5hr trip fee) +- **2025-11-25:** Sold and installed 2x Starlink Mini Mobile Roam kits (receiver, car adapter, roof mount) at field sites; monthly Starlink service billing initiated (invoice #66494, $915.94 hardware; recurring ~$397-421/month since) +- **2026-05-27:** VPN design consultation for CGNAT field-to-office connectivity — researched Starlink static IP availability (not available on Roam) and CGNAT traversal options; created Ticket #32328, posted customer-visible two-option recommendation; ticket set to Waiting on Customer + +## Backlinks + +- [Sif-oidak District - Tohono O'odham Nation](sif-oidak.md) — related Syncro account for the same tribal nation (Sif-oidak District, ID 7694718) diff --git a/wiki/clients/tucson-golden-corral.md b/wiki/clients/tucson-golden-corral.md new file mode 100644 index 0000000..3b47e47 --- /dev/null +++ b/wiki/clients/tucson-golden-corral.md @@ -0,0 +1,111 @@ +--- +type: client +name: tucson-golden-corral +display_name: Tucson Golden Corral +last_compiled: 2026-06-12 +compiled_by: GURU-5070/claude-main +sources: + - clients/tucson-golden-corral/session-logs/2026-05-26-session.md + - session-logs/2026-05-25-session.md + - session-logs/2026-04-30-session.md + - .claude/memory/reference_resource_map.md +backlinks: + - systems/neptune + - systems/ix-server + - projects/gururmm +--- + +# Tucson Golden Corral + +Restaurant / food-service business in Tucson, AZ. Managed by ACG with a prepaid hour block +contract. Primary contact is Jeffrey Schaufel (owner). Email is on IX cPanel hosting. +TGC-SERVER is a single-box DC + RDS + Hyper-V running Windows Server 2016 with several +unresolved architecture concerns flagged at onboarding. 
+ +## Profile + +- **Contract type:** Prepaid hour block +- **Key contacts:** + - Jeffrey Schaufel (owner) — office 520-574-9167 + - Al Young — 520-571-0972 / mobile 520-338-1004 + - Josie Schaffel — 520-971-3991 +- **Service address:** 4380 E 22nd St, Tucson, AZ 85711 +- **Billing rate:** (verify — check Syncro invoices) +- **Hours remaining (if prepaid):** 12.75 hrs as of 2026-06-12 +- **Syncro customer ID:** 3859123 +- **Managed device count (Syncro assets):** 3 + +## Infrastructure + +### Servers & Services + +| Host | IP | Role | OS | Notes | +|---|---|---|---|---| +| TGC-SERVER | 98.181.90.163 (public) | DC / DNS / RDS / Hyper-V / SQL / IIS | Windows Server 2016 (build 14393) | Extended support ends Jan 2027; GuruRMM agent 1275daa1; ScreenConnect installed; admin account actively browsing (Chrome) | + +**Hyper-V VMs on TGC-SERVER:** + +| VM | State | Notes | +|---|---|---| +| MAS90 | Running | Sage 100 ERP — customer-critical workload | +| MAS90.old | Off | Prior snapshot / backup copy | + +**Syncro workstation assets:** + +| Device | Type | +|---|---| +| Desktop Dell DHM | Desktop | +| Lenovo ThinkCenter 001LUS | Desktop | +| Lenovo Ideapad 3305-15KB 81FS | Laptop | + +**GuruRMM:** +- Client ID: 3248bdec-cbc3-45df-ba63-c8cdc9395e58 +- Site: Co-Located (ID: e5caa88f-f395-40e3-befa-f54e035f4293, code: INNER-STORM-2733) +- Agent (TGC-SERVER): 1275daa1-3996-4ecf-a1db-c82e88f757b4 + +### Email & Identity + +- **Email platform:** IX cPanel hosting — cPanel account `tucsongc`, domain `tucsongoldencorral.com` +- **Neptune Exchange note:** In April 2026, a webmail password reset for `accounting@tucsongoldencorral.com` was attempted on Neptune Exchange (67.206.163.124). Relationship between Neptune-hosted accounts and IX-hosted accounts is (verify — determine if any mailboxes remain on Neptune Exchange or if all are on IX). +- **M365:** "Office 365 annual" recurring invoice ($108.69/yr) exists in Syncro. Per May 2026 session context, primary email is on IX, not M365. 
Verify current M365 scope (licensing only vs. active mailboxes). +- **MFA status:** (verify) + +### Network + +- **ISP / WAN:** (verify) +- **Firewall:** (verify — TGC-SERVER is on public IP 98.181.90.163 with no firewall recorded) +- **VPN:** (verify) + +## Access + +- **GuruRMM dashboard:** https://rmm.azcomputerguru.com — client filter: Tucson Golden Corral +- **GuruRMM IEX installer:** `irm 'https://rmm.azcomputerguru.com/install/INNER-STORM-2733/windows' | iex` +- **IX cPanel (email / hosting):** https://72.194.62.5:2083 — account `tucsongc`; credentials via vault: `infrastructure/ix-server.sops.yaml` +- **IX WHM API:** https://72.194.62.5:2087 (used for email account management) +- **Vault — GuruRMM enrollment key:** `clients/tucson-golden-corral/gururmm-site-co-located.sops.yaml` +- **RDP to TGC-SERVER:** (verify — no RDP path recorded; use GuruRMM agent 1275daa1 or ScreenConnect) + +## Patterns & Known Issues + +- **TGC-SERVER is doing too much.** Single Windows Server 2016 machine running DC, DNS, full RDS stack, Hyper-V (with a production ERP VM), SQL Server, and IIS. Customer confirmed Hyper-V was not expected on this box. Architecture needs remediation. +- **MAS90 (Sage 100 ERP) in Hyper-V on the DC.** Running as a VM on the same box as Active Directory. No dedicated Hyper-V host. Migration options (dedicated HV host, or P2V to bare-metal Sage) not yet decided — requires customer input on hardware availability and MAS90 usage. +- **Administrator account browsing from the DC.** Process list at onboarding showed Chrome running as Administrator on TGC-SERVER (a domain controller). Security risk; should be flagged to customer for remediation (dedicated admin workstation or jump server). +- **Windows Server 2016 EOL approaching.** Extended support ends January 2027. OS upgrade planning should be in the queue. +- **Email account churn via Discord.** Terminations/additions are requested by Jeffrey Schaufel via the Discord bot, not a formal ticket. 
Work is straightforward (IX cPanel UAPI) but tickets should continue to be created in Syncro for audit trail. +- **No backup recorded.** No backup product or destination observed for TGC-SERVER or workstations. (Verify — may be absent or unreported.) + +## Active Work + +*No open tickets in Syncro as of 2026-06-12. See session logs for recent work.* + +## History Highlights + +- **2026-04-30** — Webmail password reset requested for `accounting@tucsongoldencorral.com`; attempted via Neptune Exchange ECP, resolved via Active Directory on DC16. (Source: session-logs/2026-04-30-session.md) +- **2026-05-25** — Client onboarded into GuruRMM; TGC-SERVER enrolled (agent 1275daa1, Windows Server 2016, 16 GB RAM, 1.8 TB disk); full Windows role inventory confirmed AD DS, DNS, full RDS stack, Hyper-V, SQL Server, IIS + Certify the Web. Hyper-V flagged as unexpected by customer; MAS90 (Sage 100 ERP) VM found running. Chrome-on-DC and WS2016 EOL noted. +- **2026-05-26** — Email account `Erick.Godoy@tucsongoldencorral.com` deleted via IX cPanel UAPI on employee termination request from Jeffrey Schaufel. Billed 0.25 hrs prepaid; Syncro ticket #32327, invoice ID 1650421921. + +## Backlinks + +- [[systems/neptune]] — Neptune Exchange (67.206.163.124, Exchange 2016); accounting@ reset attempt April 2026 +- [[systems/ix-server]] — IX cPanel server hosts tucsongoldencorral.com email (account `tucsongc`) +- [[projects/gururmm]] — GuruRMM client enrollment; TGC-SERVER monitored via agent 1275daa1 diff --git a/wiki/index.md b/wiki/index.md index e10d233..489bddd 100644 --- a/wiki/index.md +++ b/wiki/index.md @@ -54,6 +54,9 @@ Run `/wiki-lint` to check for stale entries and broken backlinks. 
| [Starr Pass Realty](clients/starr-pass.md) | Real estate; Syncro 153298; flat-rate ~$92.93/mo; starrpass.com M365 tenant (222450dd) onboarded 2026-06-10; sole M365 user sysadmin@starrpass.com (Brian Shinn); DNS on ACG IX; legacy Neptune mailbox cansley@devconllc.com; 2 Syncro assets | 2026-06-10 | | [Universal Minerals International](clients/universal-minerals.md) | Minerals/commodities, Tucson AZ; Syncro 34844920; **break-fix, no prepaid/RMM**; CyndyOffice (HP Pavilion TP01, Win11 Home, QuickBooks Enterprise 22.0) intermittent hard-freeze (Kernel-Power 41, no dump = hardware/firmware) — BIOS F.38 + Fast Startup off + memtest passed 2026-06-10, PSU prime remaining suspect; QB messaging crash-loop repaired; ticket #32397 monitoring; temporary diagnostic RMM agent removed same-day | 2026-06-10 | | [Putt Land Surveying](clients/putt-land-surveying.md) | Land surveying firm; Syncro 7180175; managed services $223.92/mo; 7 devices; M365 direct (8 mailboxes, cloud-only, 2x Basic + 5x Premium); **DNS wipe 2026-06-09** — all records deleted (MX, SPF, autodiscover, A), email+website down; GoDaddy domain in client's own account (no ACG control); ticket #32404 Waiting on Customer; remediation tools onboarded 2026-06-10 | 2026-06-10 | +| [Gonzvar Tax Services](clients/gonzvar-tax-services.md) | Tax services firm; 6 machines in GuruRMM (GTS.local AD, 2 servers + 4 workstations); open security findings from 2026-06-06 onboarding baseline; Syncro not found (billing fields verify); QuickBooks RemoteApp + Tailscale VPN pending | 2026-06-12 | +| [Tohono O'odham Nation DoIT](clients/tohono-oodham-doit.md) | Tribal government IT dept; Syncro 33069069; Starlink reseller client — 2x Check Point 1550 field sites on Starlink Roam (CGNAT); break-fix $175/hr; VPN design (IPsec vs Tailscale) pending | 2026-05-27 | +| [Tucson Golden Corral](clients/tucson-golden-corral.md) | Restaurant (Tucson AZ); Syncro 3859123; prepaid block 12.75 hrs; IX cPanel email; WS2016 single-box DC/RDS/Hyper-V/SQL 
with Sage 100 ERP; architecture concerns outstanding | 2026-05-26 | ## Projects @@ -66,6 +69,8 @@ Run `/wiki-lint` to check for stale entries and broken backlinks. | [MSP Pricing & Marketing](projects/msp-pricing.md) | GPS pricing docs + Python calculators + MSP Buyers Guide HTML; covers GPS monitoring, support plans, block time, web/email hosting, VoIP; customer-facing tools pending | 2026-05-24 | | [Wrightstown Smarthome](projects/wrightstown-smarthome.md) | Home automation project (HA Yellow + Ollama + LiteLLM + Wyoming voice stack); 4-VLAN design; planning phase only as of 2026-02-09; no hardware deployed | 2026-05-24 | | [Wrightstown Solar](projects/wrightstown-solar.md) | Off-grid solar project (EVE C40 16S5P packs, Victron MultiPlus II, JK BMS); Phase 1 budget $2,175–2,945; planning phase only as of 2026-02-09; no hardware purchased | 2026-05-24 | +| [GuruRMM Agent](projects/gururmm-agent.md) | Cross-platform endpoint agent (Rust) for the GuruRMM platform — metrics, remote execution (system/user_session contexts), BSOD detection, VSS shadow copy, self-update w/ rollback, watchdog, compliance reporting; Windows/Linux/macOS. Companion to [GuruRMM](projects/gururmm.md) | 2026-06-12 | +| [MSP Tools (umbrella)](projects/msp-tools.md) | Umbrella directory for ACG's MSP tooling — hosts the GuruRMM + GuruConnect submodules plus GuruScan, audit scripts, quote-wizard, and operational utilities | 2026-06-12 | ## Systems @@ -129,7 +134,6 @@ Run `/wiki-lint` to check for stale entries and broken backlinks. 
| Scope | Priority | Notes | |---|---|---| -| `system:neptune` | Low | neptune.acghosting.com, 172.16.3.11 internal / 67.206.163.124 external — Exchange Server 2016; ACG infrastructure physically colocated at Dataforth D2 facility; active mail server for multiple ACG-hosted clients; Neptune context captured in clients/dataforth.md and projects/dataforth-dos.md; still warrants own system article for SBR config, MailProtector, per-client send connectors, and full routing detail | +| `system:neptune` | Medium | neptune.acghosting.com, 172.16.3.11 internal / 67.206.163.124 external — Exchange Server 2016; ACG infrastructure colocated at Dataforth D2; active mail server for multiple ACG-hosted clients. Context in clients/dataforth.md + projects/dataforth-dos.md. Warrants own article for SBR config, MailProtector, per-client send connectors, routing. Open items: cert was expiring 2026-05-31 (verify current status); DkimSigner disabled — see internal-infrastructure.md. | | `system:d2testnas` | Low | 192.168.0.9 — Linux (CachyOS?), SMB1 bridge for Dataforth DOS stations, rsync daemon port 873, hosts Neptune Exchange physically; key routing node for ACG-Dataforth connectivity; SSH root@192.168.0.9; also provides Tailscale 172.16.0.0/22 route | | `client:key-paul` | Low | GuruRMM enrolled (KEY-MEDIA); no session logs or docs found | -| `system:neptune` | Medium | URGENT: cert expires 2026-05-31; DkimSigner disabled; see internal-infrastructure.md for interim notes | diff --git a/wiki/projects/gururmm-agent.md b/wiki/projects/gururmm-agent.md new file mode 100644 index 0000000..ad91ff5 --- /dev/null +++ b/wiki/projects/gururmm-agent.md @@ -0,0 +1,488 @@ +--- +type: project +name: gururmm-agent +display_name: GuruRMM Agent +last_compiled: 2026-06-12 +compiled_by: GURU-5070/claude-main +sources: + - "gururmm@a794a7f: agent/src/main.rs" + - "gururmm@a794a7f: agent/src/commands/mod.rs" + - "gururmm@a794a7f: agent/src/transport/mod.rs" + - "gururmm@a794a7f: 
agent/src/metrics/mod.rs" + - "gururmm@a794a7f: agent/src/checks.rs" + - "gururmm@a794a7f: agent/src/updater/mod.rs" + - "gururmm@a794a7f: agent/src/bsod.rs" + - "gururmm@a794a7f: agent/src/watchdog/mod.rs" + - "gururmm@a794a7f: agent/src/inventory.rs" + - "gururmm@a794a7f: agent/src/users.rs" + - "gururmm@a794a7f: agent/src/tunnel/mod.rs" + - "gururmm@a794a7f: agent/src/event_log.rs" + - "gururmm@a794a7f: agent/src/compliance.rs" + - "gururmm@a794a7f: agent/src/discovery/mod.rs" + - "gururmm@a794a7f: agent/src/registry_ops/mod.rs" + - "gururmm@a794a7f: agent/src/vss.rs" + - "gururmm@a794a7f: agent/Cargo.toml" + - "gururmm@a794a7f: agent/agent.toml.example" + - "gururmm@a794a7f: server/migrations/ (059 migrations, filenames as capability timeline)" + - "gururmm@a794a7f: docs/FEATURE_ROADMAP.md" + - "gururmm@a794a7f: git log origin/main -- agent/ (recent 30 commits)" + - projects/gururmm-agent/session-logs/2026-05-25-recovered-review-fix-audit-2-remediation-branch-status.md + - projects/gururmm-agent/session-logs/2026-06-01-recovered-investigate-blue-screen-and-test-detection-featu.md + - projects/gururmm-agent/session-logs/2026-06-01-recovered-investigate-blue-screen-and-test-detection-featu-f5631414.md + - wiki/projects/gururmm.md (cross-reference) +backlinks: + - projects/gururmm +--- + +# GuruRMM Agent + +## Summary + +The GuruRMM agent is the endpoint component of the [[gururmm]] platform. It is a Rust binary that installs as a long-running system service on managed Windows, Linux, and macOS endpoints. The agent connects to the GuruRMM server over an authenticated WebSocket, reports system metrics, executes remote commands, manages self-updates, and runs a variety of monitoring and compliance tasks. + +**Current version:** 0.6.66 at `agent/Cargo.toml` HEAD (a794a7f). Fleet is converged to 0.6.63 stable as of 2026-06-11 (see [[gururmm]] for live fleet state). 
This article covers agent capabilities only — server, dashboard, and platform architecture are documented in [[gururmm]]. + +**Crate name:** `gururmm-agent`. Single binary; CLI subcommands select the operational mode (run, install, uninstall, start, stop, status, generate-config, watchdog, vss-snapshot, service, watchdog-service). + +--- + +## Capabilities / Feature Set + +### Monitoring and Telemetry + +**Periodic metrics** (default interval 60s, configurable via policy `ConfigUpdate`): + +| Metric | Notes | +|---|---| +| CPU usage % | Cross-platform via `sysinfo` | +| Memory used/total bytes + % | Cross-platform | +| Disk used/total bytes + % (primary disk) | Cross-platform | +| Network RX/TX bytes (delta) | Cross-platform | +| OS type, version, hostname | Cross-platform | +| Uptime seconds, boot time | Cross-platform | +| Logged-in user + idle seconds | Win: `GetLastInputInfo`; Linux: `xprintidle`; macOS: `CGEventSource` (verify coverage) | +| Public/WAN IP | Cached; fetched periodically from external service | +| Top 10 processes by CPU | Cross-platform via `sysinfo::Processes` | +| Top 10 processes by memory | Cross-platform | +| CPU package temperature | Via `sysinfo::Components`. **Windows: thermal zones depend on BIOS ACPI export — firmware-less coverage; LibreHardwareMonitor removed 2026-05-27 (CVE-2020-14979, WinRing0 Defender quarantine). Windows thermal collection is currently partial or None on most hosts.** | +| GPU temperature | Via `sysinfo::Components`. Same Windows caveat as above. | +| All hardware sensor readings (label, value, unit, critical threshold) | Via `sysinfo::Components`. Reliable on Linux; variable on Windows/macOS. | + +**Network state** (sent on connect and on interface change): per-interface IPv4/IPv6 addresses, MAC, derived CIDR subnets. + +**Log upload:** Agent log bundle sent every 12 hours (`LogUpload` message). 
The upload task uses a `watch::Sender` for the current WS sender so it never holds a stale handle from a prior connection. + +--- + +### Remote Execution + +Commands are dispatched by the server as `CommandPayload` messages over the WebSocket. + +**Command types** (from `transport::CommandType` enum): + +| Type | Notes | +|---|---| +| `shell` | `cmd.exe` on Windows, `bash` on Unix. Also accepts the alias `"cmd"` (backward compat — servers that send `command_type: "cmd"` historically triggered a parse failure that silently dropped the message; fixed in commit `3de9faf`). | +| `powershell` | Windows PowerShell. Also accepts the alias `"powershell"`. | + +Extended types (python, script, claude_task) are referenced in the parent article at [[gururmm]] and in `agent/src/scripts.rs` / `agent/src/claude.rs`; confirm against those modules for current state. + +**Execution contexts** (from `transport::CommandContext`, added migration 041): + +| Context | Behavior | +|---|---| +| `system` (default) | Runs in Session 0, in the service's own process context (LocalSystem on Windows). | +| `user_session` | **Windows only.** Impersonates the active logged-on user's desktop session via `WTSQueryUserToken` + `CreateProcessAsUserW` + per-user environment block. Requires an active console/RDS session. Implemented in `agent/src/watchdog/wts.rs`. Returns an error on non-Windows platforms. | + +**Command options:** `timeout_seconds` (optional), `elevated` (bool), `context` (defaulting to `system`). + +**In-flight management:** +- Individual cancellation: server sends `CancelCommand { command_id }`; agent aborts the Tokio `JoinHandle` immediately. +- All in-flight commands aborted on disconnect (`abort_all` called by WS reconnect loop). + +**Comms durability (Phase 1, shipped 2026-06-11 as v0.6.63):** +- Agent sends `CommandAck { command_id }` immediately on receipt of a dispatched command (before execution begins). Server stamps `acked_at` (migration 058). 
+- Re-delivery dedup: `CommandExecutor` keeps a FIFO-bounded cache of recently-completed results (capacity 64, max 256 KB per result). A re-delivered command that is already in-flight is ignored; one that already completed re-reports the cached result without re-executing. +- Result is cached BEFORE deregistering the running task (`record_result` -> `complete`) so a re-delivery in the race window always finds it cached. +- Server reaper re-delivers un-acked commands past a 60s ACK deadline (returns to `pending`) instead of failing them. Pending commands re-offered on every heartbeat (rides the refreshed NAT conntrack). Capability gate: reaper only re-delivers for agents that have demonstrably ACK'd at least once (partial index on `acked_at`); old agents keep the legacy fail-on-timeout path. + +--- + +### Hardware Inventory + +`agent/src/inventory.rs` — `HardwareInventory` struct, sent on connect and on server request (`InventoryReport` message). + +| Field group | Fields | +|---|---| +| System identity | manufacturer, model, serial_number, bios_version | +| CPU | cpu_model, cpu_cores, cpu_threads, cpu_speed_mhz | +| Memory | total_memory_mb | +| Disks | name, mount, total_gb, fs_type (Vec) | +| Network | name, IP, MAC, speed_mbps (Vec) | +| Software | name, version, publisher (Vec, installed applications) | +| Services | name, display_name, status, start_type (Vec) | +| OS detail | os_name, os_version, os_build | +| Windows OS product type | os_product_type: 1=Workstation, 2=DC, 3=Server (None on Linux/macOS) | +| Windows OS edition | os_edition: "Pro", "Standard", "Datacenter", etc. (None on Linux/macOS) | +| VM / container | is_virtual_machine, hypervisor_type, vm_uuid, is_hypervisor, hosted_vm_uuids, is_container, is_unraid | +| Agent version | agent_version | + +Collection uses platform-specific subprocess calls (`wmic`/`dmidecode`/`system_profiler`) and `sysinfo`. Fields are `Option` to avoid panics on platforms that cannot supply them. 
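The optional-field approach described above can be illustrated with a small sketch. This is not the code in `agent/src/inventory.rs`: the `InventorySketch` struct and the `clean_field` and `product_type_label` helpers are hypothetical names, and only the 1/2/3 product-type mapping comes from the table above.

```rust
/// Hypothetical, trimmed-down stand-in for a few inventory fields.
/// Every field is optional because not every platform can supply it.
#[derive(Debug, Default, PartialEq)]
struct InventorySketch {
    serial_number: Option<String>,
    os_product_type: Option<u32>,
    os_edition: Option<String>,
}

/// Converts raw subprocess output into an optional field:
/// blank or whitespace-only output becomes `None` instead of an empty string.
fn clean_field(raw: &str) -> Option<String> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Maps the Windows product-type code from the table above to a label.
/// `None` covers platforms (Linux/macOS) that do not report the field.
fn product_type_label(code: Option<u32>) -> &'static str {
    match code {
        Some(1) => "Workstation",
        Some(2) => "DC",
        Some(3) => "Server",
        Some(_) => "Unknown",
        None => "n/a",
    }
}

fn main() {
    // Assumed sample values, for illustration only.
    let windows = InventorySketch {
        serial_number: clean_field("  ABC123 \r\n"),
        os_product_type: Some(2),
        os_edition: clean_field("Standard"),
    };
    // A platform that cannot supply any of these fields.
    let other = InventorySketch::default();

    for inv in [&windows, &other] {
        println!(
            "serial={:?} type={} edition={:?}",
            inv.serial_number,
            product_type_label(inv.os_product_type),
            inv.os_edition
        );
    }
}
```

The point of the design is that a missing value is carried as `None` all the way to the server, so a failed subprocess call on one platform produces a gap in the report rather than a panic.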
+ +--- + +### User Inventory + +`agent/src/users.rs` — `UserInventory` struct, sent on connect and on server request (`UserInventoryReport` message). Policy-scheduled (default 24h interval). + +**Per-user fields:** username, display_name, account_type (local/domain/aad), enabled, password_never_expires, password_expired, last_logon, is_admin, email (AD), upn (AD), department (AD), group membership. + +**Collection per platform:** +- Windows: `Get-LocalUser`, `Get-ADUser` (if domain-joined), `dsregcmd /status` (Azure AD/hybrid). Group membership from `Get-LocalGroupMember`. DC detection (`is_dc`) from AD role query. +- Linux: `getent passwd` + `/etc/group`. +- macOS: `dscl`, `dsconfigad`, `dseditgroup`. + +**User actions** dispatched by server: enable/disable accounts, expire/un-expire password, create accounts, reset passwords. Executed via PowerShell on Windows, shell commands on Linux/macOS. Results reported as `UserActionResult`. + +--- + +### Checks + +`agent/src/checks.rs` — server-defined check configurations (`CheckPayload`) executed on demand or on schedule. + +| Check type | Implementation | +|---|---| +| `cpu` | `sysinfo` global CPU usage; value returned, threshold evaluated server-side | +| `memory` | `sysinfo` total/used memory; percentage computed | +| `disk` | `sysinfo` disks; usage percentage | +| `ping` | `tokio::process::Command` calls platform ping binary | +| `port` | `TcpStream::connect` with timeout | +| `script` | Arbitrary shell/script command; stdout/stderr/exit code captured | +| `service` | Win: `sc.exe query`; Linux: `systemctl is-active`; reports running/stopped | + +CPU, memory, and disk checks wrap blocking `sysinfo` calls in `spawn_blocking`. Unknown check types return status `failing` with an error message (no silent no-op). + +--- + +### Self-Update + +`agent/src/updater/mod.rs`. + +**Flow:** +1. Server sends `UpdatePayload` (version, download URL, SHA-256 checksum). +2. Agent downloads the binary via HTTPS (300s timeout). +3. 
SHA-256 checksum verified before touching the live binary. +4. Current binary copied to backup path (`gururmm-agent.backup` in config dir). +5. New binary atomically replaces current (via temp write + rename). +6. Agent restarts itself. +7. On reconnect, agent reports `UpdateResult` with outcome. If the agent fails to reconnect within the ~180s rollback window, the backup binary is restored and the service restarts again. + +**Windows:** backup path `C:\ProgramData\GuruRMM\gururmm-agent.backup` (or `GuruRMM-Debug\` in debug builds). On restart, uses a detached child process that waits for parent exit before swapping. + +**Linux/macOS:** backup at `/etc/gururmm/gururmm-agent.backup`. + +**Update channels:** stable / beta / null (inherits default `stable`). Agent reports previous_version and pending_update_id in its next `Auth` payload so the server can correlate the update result. New builds are tagged `beta` by default; promotion to `stable` is a deliberate re-tag of the `.channel` sidecar file in the server's downloads directory. + +--- + +### Watchdog (Windows Only) + +`agent/src/watchdog/` — a second, separate Windows SCM service (`GuruRMMWatchdog`) that runs from the same binary via `gururmm-agent watchdog`. + +**Responsibilities:** +1. Polls SCM every 30s for `GuruRMMAgent`. On unexpected stop: restart with backoff 30s / 60s / 120s. After 3 failed attempts, POSTs a watchdog alert to the server and continues monitoring. +2. Serves a named-pipe IPC channel so the main agent can request a clean service restart without racing against its own SCM stop signal. +3. `watchdog/wts.rs`: WTS (Windows Terminal Services) token management for the `user_session` command context — acquires the active console session's user token, creates the child process with `CreateProcessAsUserW`, and provides the per-user environment block. + +The `ensure_watchdog_running()` function is called by the main agent on startup. 
It uses a two-step SCM privilege model: opens with `CONNECT` only first (least privilege); escalates to `CREATE_SERVICE` only if the service does not yet exist. + +**Platform:** The `run_watchdog_service()` entrypoint is a no-op stub on Linux/macOS (`#[cfg(not(windows))]` returns a warning). + +--- + +### Event Log Watches (Windows Only) + +`agent/src/event_log.rs` — added migration 047. + +**Mechanism:** Server pushes `EventLogWatchRule` configs as part of `ConfigUpdate`. The agent polls matching rules at a configured interval. Each rule specifies `log_name`, optional `event_id`, optional `source`, and optional `level` (critical/error/warning/information). Matches are queried via `Get-WinEvent` PowerShell with a `since` timestamp watermark. Matching entries are batched and sent as `EventLogMatches` messages. + +**Platform:** Windows only. `query_watch_rule` is `#[cfg(target_os = "windows")]`. Non-Windows builds compile with a stub that returns an empty `Vec`. + +**Injection guard:** `log_name` has single-quote characters stripped before inserting into the PowerShell filter hash. + +--- + +### BSOD / Kernel Crash Detection (Windows Only) + +`agent/src/bsod.rs` — added migration 048. Shipped v0.6.51 (June 2026). + +**How it works:** +- Runs one-shot on agent startup, then polls on a periodic interval. +- Enumerates `C:\Windows\Minidump\*.dmp`. +- Parses the kernel dump header by hand: + - 64-bit `PAGEDU64` (`DUMP_HEADER64`): bugcheck code at offset 0x38 (u32), 4 parameters at 0x40 (4x u64), system filetime at 0xFA8. + - 32-bit `PAGEDUMP` (`DUMP_HEADER`): bugcheck code at 0x28 (u32), 4 parameters at 0x2C (4x u32). + - The `minidump` crate is intentionally not used — it parses Breakpad MDMP format, not Windows kernel PAGEDU64 dumps. +- Cross-references the System event log (WER event 1001, Kernel-Power event 41) for faulting driver name and WER Report Id. +- Computes SHA-256 of each dump file. 
Already-seen hashes are stored in a watermark file (`C:\ProgramData\GuruRMM\bsod-seen.json`). Watermark writes use a tmp-then-rename atomic pattern to survive mid-write crashes. +- **First-run suppression:** On the first run (no watermark file), all existing dumps are baselined as already-seen — no retroactive alerts for crashes that predate the agent install. +- Sends one `BsodEvent` per newly-detected crash. Server-side: `bsod_events` table (migration 048), deduplicated by `(agent_id, dump_sha256)`, always Critical severity. Dashboard Crashes tab shipped 2026-06-07. + +**Platform:** `#[cfg(target_os = "windows")]`. Non-Windows compiles to an empty stub. + +**Validated against:** A real `0x116 VIDEO_TDR_FAILURE` (nvlddmkm.sys) on GURU-5070 during the 2026-06-01 session. + +--- + +### VSS Shadow Copy Management (Windows Only, SPEC-016) + +`agent/src/vss.rs` + `agent/src/compliance.rs` — migrations 050, 051. + +**Mechanism:** Driven entirely via PowerShell relay (`crate::powershell::run_ps()`); no native COM. Non-Windows platforms compile thin stubs. + +**Policy-driven snapshot scheduling:** +- The agent receives VSS policy as part of `ConfigUpdate`. It mirrors the policy to `C:\ProgramData\GuruRMM\vss-policy.json` on every `ConfigUpdate`. +- Snapshot execution is performed by a separate short-lived process invocation (`gururmm-agent vss-snapshot`), triggered by a Windows Scheduled Task named `GuruRMM-VSS-Snapshot`. The task is registered/updated by the agent when the policy hash changes. +- Shadow storage is always bounded: agent runs `vssadmin resize shadowstorage /maxsize=N%` — never left unbounded. +- Retention governed by max count and max age, as upper bounds on top of Windows' own storage-cap eviction. +- Per-volume first-run staggering: a `vss-firstrun-.flag` marker gates when each additional volume starts snapshotting (C: first, then others). 
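The retention rule in the list above (max count and max age acting as upper bounds) can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the logic in `agent/src/vss.rs`: the `Snapshot` struct, the `snapshots_to_prune` function, and the Unix-seconds timestamps are all hypothetical.

```rust
/// Hypothetical snapshot record; the real agent tracks more than this.
#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    id: String,
    created_unix: u64,
}

/// Returns the ids of snapshots that exceed either retention bound.
/// A snapshot is pruned when it falls outside the `max_count` newest
/// snapshots OR is older than `max_age_secs`.
fn snapshots_to_prune(
    snaps: &[Snapshot],
    now_unix: u64,
    max_count: usize,
    max_age_secs: u64,
) -> Vec<String> {
    // Sort newest first so the rank in the list doubles as the count check.
    let mut newest_first: Vec<&Snapshot> = snaps.iter().collect();
    newest_first.sort_by(|a, b| b.created_unix.cmp(&a.created_unix));

    newest_first
        .iter()
        .enumerate()
        .filter(|(rank, snap)| {
            let too_many = *rank >= max_count;
            // saturating_sub guards against clock skew (snapshot "in the future").
            let too_old = now_unix.saturating_sub(snap.created_unix) > max_age_secs;
            too_many || too_old
        })
        .map(|(_, snap)| snap.id.clone())
        .collect()
}

fn main() {
    // Assumed sample data, for illustration only.
    let snaps: Vec<Snapshot> = [("a", 100u64), ("b", 200), ("c", 300), ("d", 400)]
        .iter()
        .map(|(id, t)| Snapshot { id: id.to_string(), created_unix: *t })
        .collect();

    let doomed = snapshots_to_prune(&snaps, 500, 2, 250);
    println!("prune: {:?}", doomed);
}
```

Because both bounds are only upper limits, Windows may already have evicted some snapshots under the shadow-storage cap before this rule ever applies; the sketch handles that naturally since it only looks at snapshots that still exist.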
+ +**DeviceObject handle:** The `\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopyN` string is the stable identifier and is persisted at creation time. The trailing index `N` is not stable across reboots. + +**Compliance reporting (SPEC-025):** `agent/src/compliance.rs` implements `evaluate_all()`, which calls each policy domain's evaluator. VSS is Domain #1. Evaluation is **read-only** — it reports posture (`compliant`, `pending`, `non_compliant`, `not_applicable`) without mutating the machine. VSS is the only domain in v1 that performs opt-in, kill-switchable self-heal. A compliance batch is sent as `ComplianceReport`. Serde-tolerant: older servers ignore the variant. + +**Status reporting:** `VssStatus` message sent on a slow cadence — per-volume shadow storage usage for server cache and dashboard display. + +**Platform notes in code (TODO markers):** +- Linux: LVM snapshots (`lvcreate --snapshot`) — not implemented. +- macOS: APFS snapshots (`tmutil localsnapshot`) — not implemented. + +--- + +### Network Discovery + +`agent/src/discovery/mod.rs`. + +**How it works:** Server dispatches `DiscoveryScanConfig` (IP ranges, ports, timeout_ms, concurrency, exclusions). Agent expands ranges to individual IPs and probes: +1. TCP connect on each configured port (async, up to 200 concurrent probes via `Semaphore`). +2. ICMP ping fallback for hosts where all TCP ports are firewalled but the host is up. +3. ARP table lookup for MAC address. +4. Reverse DNS lookup for hostname. +5. OS fingerprinting from open port set. + +Results are streamed back as `DiscoveryResult` messages (one per found device) and finalized with `DiscoveryComplete` (total found, duration ms). + +--- + +### Registry Operations (Windows Only) + +`agent/src/registry_ops/` via the `winreg` crate. 
+ +**Operations:** + +| Function | Notes | +|---|---| +| `enumerate_keys(path)` | Lists subkeys under a registry path | +| `enumerate_values(path)` | Lists values under a registry path | +| `read_value(path, name)` | Reads a single registry value | +| `set_value(path, name, value_type, value_data)` | Writes a registry value (typed, raw bytes) | +| `create_key(path)` | Creates a registry key | + +All operations return `anyhow::Error` on non-Windows platforms (compile-time stubs; no silent no-op). **Note from parent article [[gururmm]]: the HTTP API currently exposes read-only paths (enumerate, read_value); write paths exist in the agent but are not yet routed server-side. (verify current state)** + +--- + +### Tunnel + +`agent/src/tunnel/mod.rs`. Agent-side TunnelManager state machine. + +**Modes:** +- `Heartbeat` (default): periodic metrics and heartbeats. +- `Tunnel`: active session with a tech (triggered by `TunnelOpen { session_id, tech_id }` from server). Bidirectional data relay over the existing WebSocket connection. + +**Channel types defined in code:** `Terminal`, `File` (Phase 2+), `Registry` (Phase 2+), `Service` (Phase 2+). Currently only `Terminal` is operationally relevant. + +**Server status:** The server-side tunnel skeleton exists but is not production-ready (no `/tunnel` API routes declared, WS handler logs "not yet implemented"). Live TTY is planned as Phase 2 of the agent-comms-durability spec. + +--- + +### IPC and Tray Integration + +`agent/src/ipc.rs`. + +**Windows:** Named-pipe IPC server for the `GuruRMM Tray` companion process. +**Linux:** Unix socket IPC. + +**Operations available over IPC:** +- Subscribe/unsubscribe: tray receives `IpcStatusUpdate` broadcasts when WS connection state changes. +- Force check-in: tray requests immediate metrics collection (wakes the metrics task via `AppState::force_checkin` Notify). +- Per-section policy update: server's `ConfigUpdate` is relayed to the tray for display. 
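The force check-in operation in the list above amounts to "wake the metrics task early, otherwise wait out the interval". The source names a `Notify` on `AppState`; the sketch below substitutes a standard-library channel so it stays dependency-free. The `Wake` enum and `wait_for_next_collection` function are hypothetical names, not the agent's API.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::Duration;

/// Why the metrics loop woke up (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Wake {
    /// A tray client asked for an immediate check-in.
    Forced,
    /// The regular collection interval elapsed.
    Interval,
    /// All senders are gone; the loop should stop.
    Shutdown,
}

/// Blocks until either a force-check-in signal arrives or the interval elapses.
fn wait_for_next_collection(rx: &Receiver<()>, interval: Duration) -> Wake {
    match rx.recv_timeout(interval) {
        Ok(()) => Wake::Forced,
        Err(RecvTimeoutError::Timeout) => Wake::Interval,
        Err(RecvTimeoutError::Disconnected) => Wake::Shutdown,
    }
}

fn main() {
    let (tx, rx) = mpsc::channel::<()>();

    // The IPC handler would send on `tx` when the tray requests a check-in.
    tx.send(()).unwrap();
    println!("{:?}", wait_for_next_collection(&rx, Duration::from_secs(60)));

    // With no pending request, the wait runs the full (shortened) interval.
    println!("{:?}", wait_for_next_collection(&rx, Duration::from_millis(10)));

    // Dropping every sender signals shutdown to the loop.
    drop(tx);
    println!("{:?}", wait_for_next_collection(&rx, Duration::from_millis(10)));
}
```

One difference worth noting: a channel queues every request, whereas a notify-style primitive typically coalesces repeated signals into a single wake-up, which is usually the better fit when several tray clients can ask at once.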
+ +IPC subscribers are tracked in `AppState::ipc_subscribers` (`RwLock>>`). + +--- + +## Architecture + +### Components + +| Component | Location | State | +|---|---|---| +| Agent service (Windows) | `C:\Program Files\GuruRMM\gururmm-agent.exe`, SCM service `GuruRMMAgent` | Deployed; 0.6.63 stable | +| Watchdog service (Windows) | Same binary, SCM service `GuruRMMWatchdog` | Deployed | +| Agent service (Linux) | `/usr/local/bin/gururmm-agent`, systemd `gururmm-agent.service` | Deployed | +| Agent service (macOS) | `/usr/local/bin/gururmm-agent`, LaunchDaemon `com.azcomputerguru.gururmm-agent.plist` | Phase 1 deployed 2026-05-12 | +| Tray (Windows) | Named-pipe IPC, separate binary | Deployed; BUG-020 ghost-icon fix in beta | +| Tray (Linux) | Unix socket IPC, libappindicator/GTK | Deployed (PR #13+#14, 2026-05-24) | +| Tray (macOS) | Menu bar stub | TODO (issue #18) | + +### Key Files and Repos + +- **Repo:** `azcomputerguru/gururmm`, internal Gitea at http://172.16.3.20:3000 +- **Submodule (dev):** `D:\claudetools\projects\msp-tools\guru-rmm` +- **Agent source:** `agent/src/` within repo +- **Windows config dir:** `C:\ProgramData\GuruRMM\` (service files, device_id, BSOD watermark, VSS policy cache, VSS first-run flags) +- **Windows registry:** `HKLM\SOFTWARE\GuruRMM\SiteId` (set by MSI), `HKLM\SOFTWARE\GuruRMM\DeviceId` (set by agent, Phase 1 durable identity) +- **Linux config:** `/etc/gururmm/agent.toml` (root, mode 600); `/var/lib/gururmm/.device-id` + `/etc/gururmm/.device-id` (durable identity mirrors) +- **macOS config:** `/usr/local/etc/gururmm/site.plist` (site_id/agent_key via plist crate) +- **Downloads dir (server):** `/var/www/gururmm/downloads/` on 172.16.3.30 — agent binaries + `.channel` sidecars + `.sha256` checksums + +### Build Variants + +| Artifact | Platform | Notes | +|---|---|---| +| `gururmm-agent-windows-amd64-.exe` | Windows 10+/Server 2016+ (64-bit) | Native Windows Service (windows-service crate) | +| `gururmm-agent-windows-x86-.exe` | 
Windows 10+/Server 2016+ (32-bit) | Same, 32-bit | +| `gururmm-agent-windows-legacy-amd64-.exe` | Windows 7/Server 2008 R2 (64-bit) | `legacy` feature flag; no windows-service dependency; NSSM-based service | +| `gururmm-agent-windows-legacy-x86-.exe` | Windows 7/Server 2008 R2 (32-bit) | Same, 32-bit | +| `gururmm-agent-base-.msi` | Windows (all) | WiX v4 MSI installer; SITEKEY baked per-site on download | +| `gururmm-agent-linux-amd64-` | Linux (x86_64) | musl static; systemd service | +| `gururmm-agent-macos-amd64-` | macOS (Intel) | Mach-O, LaunchDaemon | +| `gururmm-agent-macos-arm64-` | macOS (Apple Silicon) | Same, arm64 | + +macOS builds are manual (no CI pipeline; no build host); tagged `.channel` files for macOS are managed manually. (verify) + +### Platform Coverage + +| Capability | Windows | Linux | macOS | +|---|---|---|---| +| Metrics (CPU/mem/disk/net) | Full | Full | Full | +| Temperature sensors | Partial (ACPI thermal only; LHM removed) | Full (hwmon) | Partial (SMC; Apple Silicon inconsistent) | +| Hardware/software/service inventory | Full | Full | Full | +| User/group inventory | Full (local + AD + AAD) | Full (getent) | Full (dscl/dsconfigad) | +| Checks (cpu/mem/disk/ping/port/script) | Full | Full | Full | +| Service checks | Full | Full | (verify) | +| Shell command execution | Full (cmd.exe) | Full (bash) | Full (bash) | +| PowerShell execution | Full | N/A | N/A | +| `user_session` execution context | Full (WTS impersonation) | Not implemented | Not implemented | +| Self-update | Full | Full | Full | +| Watchdog SCM supervision | Full | N/A (systemd handles) | N/A (launchd handles) | +| BSOD detection | Full | N/A | N/A | +| Event Log watches | Full (Get-WinEvent) | N/A | N/A | +| VSS shadow copy management | Full (SPEC-016) | Planned (LVM) | Planned (APFS) | +| Registry operations | Full | Error stub | Error stub | +| Network discovery | Full | Full | Full | +| Tunnel (terminal) | Partial (agent side built; server not production-ready) 
| Partial | Partial | +| Compliance reporting | Full (VSS domain) | N/A (VSS only in v1) | N/A (VSS only in v1) | +| Tray IPC | Full (named pipe) | Full (Unix socket) | Stub | + +--- + +### Communication Protocol + +The agent communicates with the server over a persistent TLS WebSocket (`wss://`). All messages are JSON-serialized using a tagged enum (`type`/`payload` fields, `snake_case`). + +**Agent -> Server message types** (`AgentMessage`): Auth, Metrics, NetworkState, CommandResult, WatchdogEvent, UpdateResult, Heartbeat, LogUpload, CommandCancelled, CommandAck, TunnelReady, TunnelData, TunnelError, ScriptResult, RequestChecks, InventoryReport, UserInventoryReport, UserActionResult, CheckResult, DiscoveryResult, DiscoveryComplete, RegistryResult, EventLogMatches, BsodEvent, VssResult, VssStatus, ComplianceReport. + +**Server -> Agent message types** (`ServerMessage`): Command, ConfigUpdate, Update, Ack, Error, RequestLogUpload, CancelCommand, TunnelOpen, TunnelClose, TunnelData, RequestInventory, UserAction, RunChecks, DiscoveryScan, RegistryOp, VssOp. + +**Serde tolerance:** Newer message types (VssStatus, ComplianceReport) are variants the server may ignore if it is running an older version. Unrecognized `ServerMessage` variants are NAK'd with an error response rather than silently dropped (agent: commit `3de9faf`). + +**Auto-reconnect:** The WS client loop reconnects on disconnect with exponential backoff. On reconnect: pending commands are re-dispatched, pending updates re-offered, agent requests current check configs. + +--- + +## Development + +### Current Focus + +As of 2026-06-12 (agent 0.6.66 HEAD / fleet on 0.6.63 stable): + +- **Agent-comms-durability Phase 2 (planned):** Live TTY over WS — seq/resume frames, single-use session token, AIMD keepalive. Not yet started. 
+- **Durable agent identity Phase 1 Tasks 2-3 (pending):** Hardware fingerprint capture (`inventory.rs` baseboard serial + primary MAC); server migration for `hardware_fingerprint`; dashboard duplicate-hostname surfacing (read-only). +- **BSOD Phase 2/3 (deferred):** BSOD events in the Alerts stream, on-demand dump upload (`fetch_bsod_dump`), full ~350-entry bugcheck name table (Phase 1 ships a 10-code map). +- **Windows thermal collection:** WMI ACPI (`MSAcpi_ThermalZoneTemperature`) recommended as first unblocked path (Approach 1 in FEATURE_ROADMAP.md). NVAPI (NVIDIA GPU temps) as Approach 2. Custom kernel driver deferred. +- **Tray IPC peer authorization** (Windows issue #16), logind console-user resolution (issue #17), macOS tray (issue #18), subscriber broadcast (issue #19). +- **Linux fleet unit drift:** Auto-updater replaces binary but does NOT refresh the systemd unit file. Pre-BUG-016-fix agents have new binary + old unit missing `StateDirectory=gururmm`. Needs an ops-script pass. +- **VSS Linux/macOS:** Stubs remain; LVM (Linux) and APFS (macOS) snapshots are design-level TODOs only. + +### Patterns and Anti-Patterns + +**Never repeat:** + +| Pattern | What Went Wrong | +|---|---| +| Using the `minidump` crate for Windows kernel dumps | Parses only Breakpad MDMP format; Windows kernel PAGEDU64 dumps require direct offset reads from DUMP_HEADER64. | +| `command_type: "cmd"` sent from server without agent alias | Agent did not recognize "cmd" as the Shell variant; command message was silently dropped instead of executed. Fixed commit `3de9faf`. | +| `Restart-Service GuruRMMAgent -Force` in a remote command | Kills the agent before it can report the command result; command stays in `running` state forever. Use a scheduled task with a delay. | +| LHM (LibreHardwareMonitor) for Windows thermal | WinRing0 kernel driver (CVE-2020-14979); Defender quarantined it fleet-wide. Do not re-add. 
| +| Installer using `& $stagingPath install 2>&1 \| Out-Null` | Swallows all output under `$ErrorActionPreference='Stop'`; surfaces misleading NativeCommandError on non-zero exit. Use `Start-Process -Wait -PassThru` + explicit ExitCode check. Fixed commit `5c0d004`. | +| Agent-level channel pin for a beta canary | Agent `update_channel` is lost on re-enrollment. Use site-level or client-level channel override — they survive re-enrollment. | +| New agent builds tagged stable by default | Races the entire fleet to auto-update before any beta soak. All new builds default to beta; promotion to stable requires explicit re-tag of the `.channel` sidecar. | +| Reaper failing un-acked commands on timeout | False failures for commands black-holed by NAT conntrack gap. Reaper must only fail commands that were ACK'd but exceeded real execution timeout. Un-acked commands requeue to `pending`. | +| `+1.77` legacy builds without `--ignore-rust-version` | Fail MSRV check after adding `rust-version` to Cargo.toml. Legacy build lines need `--ignore-rust-version`. | +| CRLF line endings in migration SQL files | sqlx SHA-384 checksum mismatch crashes server on start. `.gitattributes` + `core.autocrlf=false` + pre-commit hook prevents this. | +| `git config --system --add safe.directory` omission when building as root | Webhook builds run as root on guru-owned repo; git rejects repo as dubious ownership without this setting. Fixed 2026-06-11. | + +**Good patterns:** + +- **Platform parity rule:** Any agent feature ships on Windows + Linux + macOS in the same commit. Stubs with `// TODO(platform): ` are acceptable; silent no-ops are not. +- **Serde-tolerant messages:** New `AgentMessage` / `ServerMessage` variants must not break older server/agent versions. Use `#[serde(default)]` on new fields; new enum variants are simply ignored by old deserializers. 
+- **SHA-256 watermark atomic write (bsod, vss):** Always write to a `.tmp` file then `rename` over the target to avoid corrupt-on-crash. +- **CommandAck before execution:** ACK is sent on RECEIPT, not on completion. The server can distinguish "never reached agent" from "still running" based on `acked_at`. +- **`record_result` before `complete`:** Caching the result before deregistering the task handle ensures a racing re-delivery always finds the command in one of: running (ignore), completed+cached (re-report), or about-to-run (de-dup before spawn). Never in a "finished but not yet cached" gap. + +### Build and Deploy + +- **Build trigger:** Push to `gururmm` Gitea main branch fires a Gitea webhook to `http://172.16.3.30:9000/webhook` → `webhook-handler.py`. Detects which component changed (agent vs. server) via `last-built-commit-{linux,windows,mac,server}` marker files. +- **Linux agent:** `build-linux.sh` on the build server; musl static binary. +- **Windows agent:** `build-windows.sh` dispatches SSH to Beast (GURU-BEAST-ROG, i9-14900K, primary) or Pluto (172.16.3.36, fallback). MSVC + WiX v4. Legacy builds use `--ignore-rust-version`. Signing via jsign + Azure Trusted Signing (Arizona Computer Guru LLC cert). Binary tagged `beta` in downloads `.channel` sidecar. +- **macOS agent:** Manual build; `build-macos.sh` / `build-macos-pkg.sh` on an Apple machine. No CI pipeline yet. (verify) +- **Promotion to stable:** `POST /api/updates/rollouts/:version/promote` body `{"os","arch"}` re-tags `.channel` sidecars. Rollback: `POST /api/updates/rollouts/:version/rollback`. This is intentionally a manual gate — no automated health-gated promotion yet (Phase 2 of safe-rollout spec, migration 046, is written but unwired). +- **Cargo.toml version pinning:** Several crates pinned for Rust 1.77 legacy-build compatibility (edition 2024 crates, MSRV bumps). See `agent/Cargo.toml` comments for rationale. 
New deps must not pull in edition-2024 or MSRV >1.77 crates if legacy builds are still required. + +--- + +## Active State + +Fleet on agent 0.6.63 stable as of 2026-06-11. ~168-182 agents typically online (215 enrolled). HEAD at a794a7f is v0.6.66 (not yet released). See [[gururmm]] Summary section for live fleet state, server version, and recent deployment notes. + +--- + +## History Highlights + +- **2025-12-15:** Initial agent: WebSocket transport, basic metrics (CPU/mem/disk/net), command execution (shell/PowerShell), self-updater with rollback. +- **2026-04-19:** Temperature collection added (sysinfo Components); checks engine (cpu/mem/disk/ping/port/script/service). +- **2026-04-29:** Hardware inventory, software inventory, service inventory; VM/container/Unraid detection. +- **2026-05-12:** macOS agent Phase 1 deployed (LaunchDaemon, plist-based config, cross-compiled amd64/arm64). +- **2026-05-15:** User/group inventory (migration 037-040); DC detection; domain/AAD classification. +- **2026-05-17:** Network discovery (TCP + ICMP + ARP + rDNS + OS fingerprint, concurrent probes). +- **2026-05-19:** `user_session` execution context (migration 041) — WTS token impersonation for active user desktop on Windows. +- **2026-05-21:** Agent events table (migration 042); interrupted command status (migration 043). +- **2026-05-24:** Linux tray (PRs #13+#14, libappindicator/GTK + Unix socket IPC). +- **2026-05-25 (audit-2-remediation):** BUG-002 crash-detection dead code fixed (re-keyed to `update_success`). BUG-003 build-server.sh hardened (build lock + binary backup + auto-rollback). `update_channel` added to all agent API responses. +- **2026-05-27:** LibreHardwareMonitor removed fleet-wide — WinRing0 driver flagged by Microsoft Defender (CVE-2020-14979). Windows temperature collection reduced to ACPI/WMI partial coverage only. +- **2026-05-27:** Event log watches implemented (migration 047); BSOD detection spec written. 
+- **2026-06-01:** BSOD detection shipped (migration 048, agent v0.6.51). Validated against real `0x116 VIDEO_TDR_FAILURE` (nvlddmkm.sys) on GURU-5070. First-run suppression; SHA-256 atomic watermark; DUMP_HEADER64 hand-parser. +- **2026-06-04:** VSS shadow copy management merged (SPEC-016, migration 050); SPEC-025 compliance posture (migration 051, VSS domain #1). +- **2026-06-04:** BUG-020 tray ghost/duplicate icons fixed (commit `137dd85`); fix in beta. +- **2026-06-07:** Durable agent identity Phase 1 Task 1 (commit `0b81d33`, v0.6.62): registry + file mirror for device_id; `cleanup.ps1` whitelisted identity files. Addresses ~99% of new ghost-agent creation. +- **2026-06-07:** `command_type: "cmd"` alias added + unparseable commands NAK'd instead of silently dropped (commit `3de9faf`, v0.6.63-pre). +- **2026-06-11:** Agent Comms Durability Phase 1 shipped (v0.6.63): `CommandAck` on receipt, re-delivery dedup cache, server reaper re-queues un-acked commands, pending commands re-offered on every heartbeat. Verified at PST-SERVER (behind UDR Ultra NAT). 
+ +--- + +## Backlinks + +- [[gururmm]] — parent project article (server, dashboard, build pipeline, full architecture, deployment state) +- [[gururmm-build]] — the production server host at 172.16.3.30 where agent binaries are built and served diff --git a/wiki/projects/msp-tools.md b/wiki/projects/msp-tools.md new file mode 100644 index 0000000..25d1e63 --- /dev/null +++ b/wiki/projects/msp-tools.md @@ -0,0 +1,123 @@ +--- +type: project +name: msp-tools +display_name: MSP Tools (umbrella) +last_compiled: 2026-06-12 +compiled_by: GURU-5070/claude-main +sources: + - projects/msp-tools/session-logs/2026-06-02-recovered-build-self-diagnosis-skill-for-claudetools-setup.md + - projects/msp-tools/session-logs/2026-06-02-recovered-build-self-diagnosis-skill-for-claudetools-setup-25f5d8b9.md + - projects/msp-tools/README.md + - .claude/memory/project_gururmm.md + - .claude/memory/project_guruconnect.md + - .claude/memory/reference_msp_audit_scripts.md +backlinks: + - projects/gururmm + - projects/guruconnect +--- + +# MSP Tools (Umbrella) + +## Summary + +`projects/msp-tools/` is the umbrella directory for ACG's MSP tooling under ClaudeTools. It hosts two active product submodules ([[gururmm]] and GuruConnect), plus supporting PowerShell modules, audit scripts, and operational utilities. The directory was originally populated in January 2026 by importing conversation archives from prior Claude projects; active development now lives in the submodules and supporting directories with their own session-log trees. + +This article covers the umbrella scope only. For the full capability record of each product, see the dedicated articles linked below. 
+
+---
+
+## Sub-Projects
+
+| Directory | Type | Purpose | Wiki Article |
+|---|---|---|---|
+| `guru-rmm/` | git submodule | Full-stack RMM: agent fleet, monitoring, remote execution, dashboard | [[gururmm]] |
+| `guru-connect/` | git submodule | Remote access / screen sharing product (Rust; native-first, WebRTC fallback) | no article yet (verify) |
+| `guru-scan/` | PowerShell module | Multi-engine malware scan orchestrator; runs RKill + scanner chain, outputs JSON for GuruRMM agent | — |
+| `msp-audit-scripts/` | PowerShell scripts | On-demand server and workstation audit scripts; run via ScreenConnect Toolbox as SYSTEM | — |
+| `quote-wizard/` | PHP/web app | MSP quoting tool (PHP API + frontend + admin panel) — maturity/status (verify) | — |
+| `scripts/` | Misc scripts | M365 onboarding (CIPP templates, tenant onboarding, permission manifests), Datto SmartBadge diagnostics, RMM status check | — |
+| `utilities/` | Standalone PS1 | Printer port cleanup, Win11 upgrade script, saved ScreenConnect Toolbox one-liners | — |
+| `toolkit/` | (empty README) | Purpose (verify) | — |
+
+**Archived conversation imports (not active code):**
+- `guru-rmm-conversation-logs/` — 54 JSONL transcripts from the original GuruRMM Claude project, imported 2026-01-17
+- `guru-connect-conversation-logs/` — 40 JSONL transcripts from the original GuruConnect Claude project, imported 2026-01-17
+
+---
+
+## Architecture
+
+### Components
+| Component | Location | Tech | State |
+|---|---|---|---|
+| GuruRMM | `projects/msp-tools/guru-rmm/` (submodule) | Rust (server + agent), TypeScript (dashboard), PostgreSQL | Active — see [[gururmm]] |
+| GuruConnect | `projects/msp-tools/guru-connect/` (submodule) | Rust (server + Windows agent), TypeScript (dashboard), PostgreSQL | Active — v2 in dev |
+| GuruScan | `projects/msp-tools/guru-scan/` | PowerShell module | Active |
+| MSP Audit Scripts | `projects/msp-tools/msp-audit-scripts/` | PowerShell | Active; mirrored on `Howweird/msp-audit-scripts` (GitHub) |
+| Quote Wizard | `projects/msp-tools/quote-wizard/` | PHP + frontend | Status (verify) |
+
+### Key Files & Repos
+- **GuruRMM repo:** `azcomputerguru/gururmm` on Gitea (172.16.3.20)
+- **GuruConnect repo:** `azcomputerguru/guru-connect` on Gitea (172.16.3.20) — separate repo, NOT a submodule of ClaudeTools (verify: the Sub-Projects and Components tables list `guru-connect/` as a submodule)
+- **MSP Audit Scripts remote:** `https://raw.githubusercontent.com/Howweird/msp-audit-scripts/master/`
+- **GuruScan config:** `projects/msp-tools/guru-scan/scanners.json`
+
+---
+
+## Development
+
+### Current Focus
+
+Active development is in the submodules. See [[gururmm]] for GuruRMM's current sprint. GuruConnect is undergoing a v2 architecture rebuild (native-first remote access, Rust relay/auth rebuild, bidirectional clipboard file transfer). The other tools (GuruScan, audit scripts, utilities) are maintained but not in an active sprint.
+
+### Patterns & Anti-Patterns
+
+- GuruRMM and GuruConnect are each versioned products with their own CI/CD pipelines, changelogs, and session-log trees. Do not conflate umbrella session logs with product session logs.
+- The `README.md` at the umbrella root is stale (dated 2026-01-17; describes the directory as a conversation-import archive — that framing is obsolete). Do not rely on it for current state.
+- `guru-rmm-conversation-logs/` and `guru-connect-conversation-logs/` are read-only import archives. Development context lives in the submodule repos and root `session-logs/`.
+
+### Build & Deploy
+
+Each product has its own build and deploy procedure. See [[gururmm]] for GuruRMM. For GuruConnect, see the deploy procedure in `.claude/memory/project_guruconnect.md` (manual build on server `172.16.3.30`, login shell required for cargo/protoc on PATH).
+
+---
+
+## Own Session Logs — Work Captured
+
+The `projects/msp-tools/session-logs/` directory holds two recovered log files, both reconstructed from the same transcript (`25f5d8b9`, 2026-06-02T21:14–22:06 UTC). They cover building the `/self-check` ClaudeTools harness diagnostic skill — this is ClaudeTools infrastructure work, not an MSP tool feature. The logs were saved here rather than in root `session-logs/`; the reason is unclear (verify if misclassified).
+
+**Work done in those sessions (2026-06-02):**
+
+- Assessed existing ClaudeTools diagnostic infrastructure; determined a new harness self-check skill was needed.
+- Created `manifest.json` (baseline, v1.0.0 provisional) as the single source of truth for all machines — required tools vs capability-gated tools, per-tier rules.
+- Created `self-check.sh` (cross-platform bash; runs probes, grades machine, publishes census to coord API).
+- Created command wrapper at `.claude/commands/self-check.md` and skill docs at `.claude/skills/self-check/`.
+- First run on GURU-5070 (windows/amd64): **Grade AMBER** — PASS 71, WARN 2, FAIL 0.
+- Resolved: `jq` (winget build) emitting `\r\n` line endings causing false negatives in hook/path resolution; patched with a `jq` wrapper.
+- Found: `post-commit` hooks not installed despite the `HOOKS.md` mandate; `autotask.md` missing on a Linux machine.
+
+**Pending from those sessions (unresolved as of log date):**
+- Manifest to be moved from provisional to ratified status.
+- Validate `--json` output and graceful `aggregate` degradation.
+- Mandatory code review for the self-check engine before committing.
+
+---
+
+## Active State
+
+See [[gururmm]] for GuruRMM's live state. GuruConnect v2 in development; v1 prod at `connect.azcomputerguru.com` (172.16.3.30:3002 behind NPM). Self-check skill status: AMBER on GURU-5070 as of 2026-06-02; fleet aggregate status (verify current via coord API `selfcheck_*` components under `claudetools` project key).
+
+---
+
+## History Highlights
+
+- **2026-01-17** — Original import of guru-rmm (54 files, 14 MB) and guru-connect (40 files, 6.1 MB) conversation archives into `projects/msp-tools/`
+- **2026-05-30** — GuruConnect v2 deployed to production at `connect.azcomputerguru.com` (verify: Active State lists v2 as in development with v1 in production)
+- **2026-06-02** — `/self-check` harness skill built; GURU-5070 grades AMBER on first run
+
+---
+
+## Backlinks
+
+- [[gururmm]] — GuruRMM product article (full capability record)
+- GuruConnect wiki article not yet created (verify — slug would be `guruconnect`)