sync: auto-sync from GURU-5070 at 2026-06-25 12:35:22
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-06-25 12:35:22
@@ -2,6 +2,7 @@
## Reference
- [ACG resource map](reference_resource_map.md) — **READ THIS FIRST** when a task references a server/service/tenant/API. What we have access to, how to connect from this machine, per-machine exceptions, gotchas. Points at the detail files below.
- [Tailscale subnet-route key expiry](reference_tailscale_subnet_key_expiry.md) — "internet OK but all of 172.16.3.x (Gitea .20, RMM/coord .30) dead" = Tailscale infra-node KEY EXPIRY (pfSense subnet router advertises 172.16.0.0/22), NOT a LAN outage; expiry now disabled on infra nodes (2026-06-25). Fallback: gururmm-server direct at tailnet 100.86.12.15:3001.
- [GravityZone support center](reference_gravityzone_support.md) — Authoritative Bitdefender GravityZone product + Public API docs; use to confirm UNVERIFIED `bitdefender` skill methods/param shapes (push setPushEventSettings, assignPolicy, report/account writes, maintenancewindows/integrations names).
- [GURU-5070 Rust toolchain](reference_guru5070_rust_toolchain.md) — GURU-5070 now has cargo + MSVC + protoc; build/clippy/test guru-connect LOCALLY (set PROTOC to the winget path) instead of the build host. CI only clippy-checks the Linux server, not the Windows agent.
- [ACG Office Network Infrastructure](infra_office_network.md) — IPs/hosts/roles for pfSense/Jupiter/VMs/Docker. Check before assuming; .21 (Uranus) is storage.
32
.claude/memory/reference_syncro_agent_handle_leak.md
Normal file
@@ -0,0 +1,32 @@
---
name: reference_syncro_agent_handle_leak
description: RDS "no available computers in the pool" (0x3/0x408) can really be a SyncroLive.Agent.Runner handle leak starving the box. How to spot + fix.
metadata:
  type: reference
---

**Symptom chain that hid the real cause (IMC1 / Instrumental Music Center, 2026-06-25):**
RemoteApp/RDP launch fails → *"There are no available computers in the pool"* (RDP **error 0x3 / extended 0x408**).
The RD Connection Broker **Admin** log (`Microsoft-Windows-TerminalServices-SessionBroker/Admin`,
**Event 802**) shows the truth: *"RD Connection Broker failed to process the connection request …
Error: **Insufficient system resources** exist to complete the requested service."* The
SessionBroker-Client log shows 1296 "Element not found" / 1306 redirect-failed. The collection +
session host are healthy (`NewConnectionAllowed: Yes`), so the broker isn't the bug: the **box is
out of a kernel resource**.

**Root cause:** the **Syncro RMM agent `SyncroLive.Agent.Runner` was leaking HANDLES**: 1,135,414
handles in one process (~80% of the box's 1.41M total). Handle/object exhaustion → broker can't
create the session.

**Diagnose:** `Get-Process | Sort-Object Handles -Descending | Select -First 6 Name,Handles,Id`
lists the top handle holders, and `(Get-Process | Measure-Object Handles -Sum).Sum` gives the
box-wide total. Memory looks fine (it's handles, not RAM/commit).
Services on the box: `Syncro`, `SyncroLive`, `SyncroOvermind` (SyncroRecovery).

**Fix (no reboot needed):** `Stop-Process -Name 'SyncroLive.Agent.Runner' -Force`. The Syncro
watchdog respawns it clean (box total dropped from 1.41M to 280K handles, runner came back at ~900).
Have the user retry the RemoteApp immediately.

**It recurs** (leak accumulates over uptime) → schedule a periodic SyncroLive restart and/or update the
agent; **likely fleet-wide**, so sweep other client servers for high-handle `SyncroLive.Agent.Runner`
(deferred 2026-06-25). IMC1 also had a separate pending reboot (wedged KB5075999) + expired RDS certs.
See [[reference_resource_map]].
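The fleet sweep mentioned above comes down to one check per server: does a single process hold an outsized share of the box's handles? A minimal sketch of that check follows, assuming the per-process counts have already been collected (for example by exporting `Get-Process` output). The function name `find_handle_hogs`, the 50% default threshold, and the `everything_else` bucket are illustrative choices, not part of any Syncro or Windows tooling.

```python
from typing import Dict, List, Tuple


def find_handle_hogs(
    handles_by_process: Dict[str, int], share_threshold: float = 0.5
) -> List[Tuple[str, int, float]]:
    """Return (name, handles, share_of_total) for processes at or above the threshold."""
    total = sum(handles_by_process.values())
    if total == 0:
        return []
    hogs = [
        (name, count, count / total)
        for name, count in handles_by_process.items()
        if count / total >= share_threshold
    ]
    # Largest holder first, mirroring Sort-Object Handles -Descending.
    return sorted(hogs, key=lambda row: row[1], reverse=True)


# Runner figure is from the IMC1 incident; "everything_else" is a made-up
# bucket standing in for the rest of the box (total lands near 1.41M).
sample = {"SyncroLive.Agent.Runner": 1_135_414, "everything_else": 278_000}
for name, count, share in find_handle_hogs(sample):
    print(f"{name}: {count} handles ({share:.0%} of total)")
```

A share-based threshold is used instead of an absolute handle count so the same check works on boxes of different sizes; the right cutoff is a judgment call.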
27
.claude/memory/reference_tailscale_subnet_key_expiry.md
Normal file
@@ -0,0 +1,27 @@
---
name: reference_tailscale_subnet_key_expiry
description: "Internet OK but all of 172.16.3.x dead" = Tailscale infra-node key expiry, not a LAN outage. How to diagnose + the fallback path.
metadata:
  type: reference
---

The ACG internal subnet **172.16.3.x is reached over Tailscale**, not a local LAN: `pfsense-2`
(the pfSense node) is the **subnet router** advertising **172.16.0.0/22**. Key hosts on it:
Gitea/Jupiter `172.16.3.20:3000`, GuruRMM + coord `172.16.3.30:3001`/`:8001`.

**Symptom → cause:** if `sync.sh` fetch fails and the WHOLE `172.16.3.x` subnet is unreachable
(both .20 and .30) **while general internet is fine**, the cause is almost always a **Tailscale
node KEY EXPIRY** on an infra node (the subnet router or a server): an expired key drops that node
off the tailnet, killing the route. It is NOT a "transient blip" and NOT a real LAN outage (logged
as a correction 2026-06-25 after I mis-called it). Mike **disabled key expiration** on the infra
node(s) 2026-06-25 so it shouldn't recur; if it does, re-auth the node + confirm expiry is off in the
Tailscale admin console.

**Diagnose (Windows `tailscale.exe` at `C:\Program Files\Tailscale\`):**
- `tailscale status`: look for peers marked `offline`/key-expired, esp. `pfsense-2` and `gururmm-server`.
- `tailscale debug prefs | grep RouteAll`: must be `true` (this machine accepts subnet routes). `grep` assumes a Unix-style shell; in PowerShell, pipe to `Select-String RouteAll` instead.
- `tailscale status --json`: confirm a peer advertises `172.16.0.0/22` (PrimaryRoutes) and is `Online`.
- `tailscale ping <tailnet-100.x>`: tests tailnet path independent of the subnet route.

**Fallback:** `gururmm-server` is directly reachable at its **tailnet IP `100.86.12.15:3001`**, usable
in place of `172.16.3.30:3001` if the subnet route is down but the node itself is up. See [[feedback_tmp_path_windows]].
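The `tailscale status --json` check above can be scripted. The sketch below assumes the JSON has a top-level `Peer` map whose entries carry `HostName`, `Online`, `Expired`, and `PrimaryRoutes`; verify those field names against real output on the machine before relying on it. The helper names and the sample document are made up for illustration.

```python
import json
from typing import Any, Dict, List


def peers_advertising(status: Dict[str, Any], route: str) -> List[Dict[str, Any]]:
    """List peers whose PrimaryRoutes include the given route, with their health flags."""
    found = []
    for peer in (status.get("Peer") or {}).values():
        routes = peer.get("PrimaryRoutes") or []  # field may be absent or null
        if route in routes:
            found.append({
                "host": peer.get("HostName", "?"),
                "online": bool(peer.get("Online", False)),
                "expired": bool(peer.get("Expired", False)),
            })
    return found


def route_is_usable(status: Dict[str, Any], route: str) -> bool:
    """True if at least one online, non-expired peer advertises the route."""
    return any(p["online"] and not p["expired"] for p in peers_advertising(status, route))


# Hand-written sample trimmed to the fields used above (not real output).
# Real input would be the text printed by: tailscale status --json
sample_json = """
{
  "Peer": {
    "nodekey:aaaa": {"HostName": "pfsense-2", "Online": false, "Expired": true,
                     "PrimaryRoutes": ["172.16.0.0/22"]},
    "nodekey:bbbb": {"HostName": "gururmm-server", "Online": true}
  }
}
"""
status = json.loads(sample_json)
print(peers_advertising(status, "172.16.0.0/22"))
print(route_is_usable(status, "172.16.0.0/22"))
```

When `route_is_usable` comes back false while `gururmm-server` itself is online, that is the situation the fallback tailnet IP is meant for.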