sync: auto-sync from HOWARD-HOME at 2026-06-21 16:14:48
Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-06-21 16:14:48
@@ -0,0 +1,76 @@
## User

- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech

## Session Summary

Continued the `bitdefender` skill work (after the earlier API build-out session) by running a full live integration test of the GravityZone control surface against a real endpoint, then tearing it down. The goal Howard set: prove every option/report/feature works on a test machine, not just probe param shapes.

Located the test machine via `rmm-search` (RMM-TEST-MACHINE, AZ Computer Guru / Howard-VM site, online agent `99d6d692-99e0-4359-9f9c-f43be89f49e5`; a stale offline re-enrollment duplicate `7d3456f5` also exists). Created a GravityZone test group (ZZ-RMM-TEST) and a test install package (ZZ-RMM-TEST-PKG) under the ACG company `5c428b246c031893678b4569`, and pulled install links.

The Bitdefender install was the hard part and drove several findings. The lightweight setup-downloader stub FAILS when run as SYSTEM via the RMM (exit 3, 0-byte installer.xml) and triggers a UAC prompt when run in `context: user_session`; both are unacceptable for unattended deployment. Root cause: the stub needs an interactive elevated session and an external CDN fetch. The fix (matching how Syncro does it) is the OFFLINE FULL KIT (epskit_x64.zip, ~696MB) run by the SYSTEM-level agent: self-contained (no CDN fetch), already elevated (no UAC). The full kit needs API-key auth to download, so to keep the key off the endpoint and out of RMM command history, the kit was staged server-side: the GuruRMM server (via its own root RMM agent) downloaded it into `/var/www/gururmm/downloads/bdkit-test.zip` (served anonymously over http), and the endpoint pulled it keyless via BITS, extracted it, and ran `epskit_x64.exe /bdparams /silent` under Task Scheduler (fire-and-forget). The BD API key that briefly appeared in RMM command_text during staging was then scrubbed from the RMM Postgres `commands` table (verified 0 commands expose secrets). Install succeeded (INSTALL EXIT=0, ~8.5 min) and the endpoint enrolled (managed=True).

With BD installed, exercised the full control surface live and fixed 5 doc-vs-live param-shape bugs the testing exposed. Then tore everything down: managed-uninstalled BD via `deleteEndpoint` (deleting a managed endpoint auto-creates an uninstall-client task; this GravityZone-initiated removal bypasses the tamper protection that had blocked the local uninstall tool), deleted the test group + package, removed the staged kit, cleaned endpoint temp/tasks, and rebooted to finalize. `selftest.py` stayed at 75/75 throughout; all fixes committed and auto-synced.

## Key Decisions

- Install method = OFFLINE FULL KIT run as SYSTEM, not the downloader stub. The stub fails headless (exit 3) and needs UAC in a user session; the full kit is self-contained + SYSTEM-elevated. This is how Syncro/real RMMs deploy BD.
- Stage the kit server-side (GuruRMM downloads dir, anonymous http) so the endpoint pulls keyless. Keeps the GZ API key off the client endpoint and out of RMM command history, mirroring Syncro keeping credentials server-side. Scrubbed the transient key from the RMM DB afterward.
- Long endpoint ops (install) run fire-and-forget via Task Scheduler, verified by OUTCOME (GZ enrollment, log files), never live-polled. Per Howard's directive "this needs to happen without us monitoring it."
- BD removal = GravityZone-initiated (`deleteEndpoint` -> auto uninstall-client task), not the local uninstall tool. The local BEST_uninstallTool.exe was blocked (exits instantly, no effect) by tamper/self-protection + session-0; the managed uninstall is server-authorized and bypasses that.
- Did NOT execute write methods that would alter real partner-tenant data (create/delete client companies, console users, notification settings, the one real quarantine item on production DC ACG-DC16). Those stay param-shape + gating verified; executing them just to "tick a box" would create real junk/risk.
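The fire-and-forget install decision can be sketched as a one-shot scheduled task. This is a minimal sketch, assuming the task is registered through `schtasks.exe`; the log only says "Task Scheduler", so the interface, the task name, the kit path, and the start time are all hypothetical. The function only builds the argument list and executes nothing.

```python
from typing import List


def build_install_task(task_name: str, kit_exe: str, start_time: str) -> List[str]:
    """Build a schtasks.exe argument list for a one-shot SYSTEM task.

    Assumption: schtasks.exe is the scheduling interface; the session log
    only says "Task Scheduler". Nothing is executed here.
    """
    # Installer arguments as recorded in the session log.
    action = f'"{kit_exe}" /bdparams /silent'
    return [
        "schtasks", "/Create",
        "/TN", task_name,      # task name (hypothetical)
        "/TR", action,         # command line the task runs
        "/SC", "ONCE",         # run a single time
        "/ST", start_time,     # HH:mm, 24-hour clock
        "/RU", "SYSTEM",       # run as SYSTEM: already elevated, no UAC
        "/F",                  # overwrite an existing task of the same name
    ]


if __name__ == "__main__":
    # Hypothetical task name, path, and time, for illustration only.
    args = build_install_task("BD-Install", r"C:\Temp\bdkit\epskit_x64.exe", "16:30")
    print(" ".join(args))
```

Because the RMM command returns as soon as the task is registered, success has to be read from the outcome afterward (GravityZone enrollment, installer log files), which matches the decision above.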

## Problems Encountered

- BD downloader stub exit 3 as SYSTEM / UAC in user_session -> switched to offline full kit (see decisions).
- Full kit needs auth + endpoint link to office server is slow (~3 Mbps, 696MB) and exceeded RMM command timeouts -> staged server-side + pulled via BITS (resumable, survives timeouts).
- BITS download completed but file not finalized (BITS holds temp until Complete-BitsTransfer) -> called Complete-BitsTransfer explicitly.
- BD API key landed in RMM command_text during server-side staging -> redacted all matching `commands` rows via Postgres (peer-auth `sudo` blocked by agent no-new-privileges; used TCP `PGPASSWORD` instead; self-redacting pass removed the PG password too).
- Local BEST_uninstallTool.exe did nothing (tamper/self-protection + session-0) -> used GravityZone-initiated uninstall via deleteEndpoint.
- 5 doc-vs-live param mismatches (see Configuration Changes) -> fixed in the skill.
- API 429 rate limit during rapid testing -> added pauses between calls.
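The log records only that pauses were added between calls after the 429s. One common way to automate that is retry with exponential backoff, sketched below; this is a swapped-in technique, not necessarily what the skill does. `RateLimited` is a hypothetical exception that a client wrapper would raise on HTTP 429, and the delay values are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception a client wrapper raises on HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, pausing with exponentially growing delays after each 429."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # base, 2x base, 4x base, ...
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the delay schedule can be checked without actually waiting.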

## Configuration Changes

All in `.claude/skills/bitdefender/` unless a path says otherwise (committed + auto-synced):
- `scripts/gz_client.py` (5 fixes): `createCustomGroup` uses `groupName` (not `name`); `assignPolicy` sends `inheritFromAbove:false` with `policyId` (dropped wrong inherit_from_above option); `createIsolateEndpointTask`/`createRestoreEndpointFromIsolationTask` use single `endpointId` (loop, not `endpointIds` array); `deletePackage` uses `packageId` (not packageName/companyId).
- `scripts/gz.py`: `_print_kv` tolerates list results (install-links/endpoint-tags); CLI updated (`delete-package --id`, dropped `--inherit-from-above`).
- `scripts/selftest.py`: updated for new arg shapes; 75/75.
- `references/api-reference.md`: live-verified param shapes for the 5 corrected methods.
- `.claude/memory/feedback_bitdefender_unattended_install.md`, `feedback_rmm_longops_fire_and_forget.md`, `reference_gravityzone_support.md` (earlier in the day).
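The corrected parameter shapes can be written down as small builder functions. This is an illustrative sketch only: the function names are hypothetical and are not the ones in `gz_client.py`; the parameter keys are the live-verified ones listed above.

```python
from typing import Any, Dict, List, Optional


def create_custom_group_params(group_name: str) -> Dict[str, Any]:
    # Live API expects `groupName`, not `name`.
    return {"groupName": group_name}


def assign_policy_params(
    target_ids: List[str], policy_id: Optional[str] = None
) -> Dict[str, Any]:
    # With a policy: send policyId plus inheritFromAbove=False.
    # Without one: inherit mode, inheritFromAbove=True and no policyId.
    if policy_id is None:
        return {"targetIds": list(target_ids), "inheritFromAbove": True}
    return {
        "policyId": policy_id,
        "targetIds": list(target_ids),
        "inheritFromAbove": False,
    }


def isolate_params(endpoint_ids: List[str]) -> List[Dict[str, Any]]:
    # One call per endpoint: the method takes a single `endpointId`,
    # not an `endpointIds` array, so the caller loops over this list.
    return [{"endpointId": eid} for eid in endpoint_ids]


def delete_package_params(package_id: str) -> Dict[str, Any]:
    # `packageId` only; not packageName/companyId.
    return {"packageId": package_id}
```

Keeping the shapes in pure functions like these makes them easy to pin in a selftest without touching the live tenant.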

## Credentials & Secrets

- No new credentials created. GravityZone API key remains in SOPS vault `msp-tools/gravityzone.sops.yaml` field `credentials.api_key` (HTTP Basic: key as username, empty password). It was used to download the full kit; scrubbed from RMM command history after staging.
- GuruRMM server SSH/DB (vault `infrastructure/gururmm-server.sops.yaml`): host 172.16.3.30, ssh user `guru`, postgres user `gururmm`. Used the server's own root RMM agent (no SSH needed) for staging + DB redaction.
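The HTTP Basic scheme noted above (API key as username, empty password) can be sketched as a request builder for the JSON-RPC endpoint. This is a minimal sketch that sends nothing. Two things are assumptions rather than facts from this log: that the module name is appended to the base URL, and that `createCustomGroup` lives in a module called `network`.

```python
import base64
import json
import uuid
from typing import Any, Dict, Tuple

BASE_URL = "https://cloud.gravityzone.bitdefender.com/api/v1.0/jsonrpc"


def build_request(
    api_key: str, module: str, method: str, params: Dict[str, Any]
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for one JSON-RPC 2.0 call; sends nothing."""
    # HTTP Basic with the API key as username and an empty password.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    body = json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": str(uuid.uuid4()),  # request id echoed back in the response
    }).encode("utf-8")
    # Assumption: the module name is appended to the base URL.
    return f"{BASE_URL}/{module}", headers, body


if __name__ == "__main__":
    # "network" as the module name is an assumption; the key is a placeholder.
    url, headers, body = build_request(
        "EXAMPLE-KEY", "network", "createCustomGroup", {"groupName": "ZZ-RMM-TEST"}
    )
    print(url)
```

Building the request separately from sending it keeps the key out of any logged command line: only the process that reads the vault ever holds it.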

## Infrastructure & Servers

- GravityZone Cloud Public API: `https://cloud.gravityzone.bitdefender.com/api/v1.0/jsonrpc`. ACG company `5c428b246c031893678b4569`. Test endpoint GZ id `6a3849a7029f60770fa9d172`.
- Full-kit download (auth): `.../api/v1.0/http/downloadPackageFullKit?packageId=<id>&downloadType=20` (x64). Anonymous downloader stub: `cloud.gravityzone.bitdefender.com/Packages/BSTWIN/0/setupdownloader_[base64].exe`. BEST uninstall tool: `https://www.bitdefender.com/files/business/Tools/BEST_uninstallTool.exe` (~296KB).
- GuruRMM server 172.16.3.30: downloads dir `/var/www/gururmm/downloads` served at `http://172.16.3.30/downloads/`. gururmm linux agent id `9b92b187-98c7-41b0-9e97-1698d263c42d` (runs as root; sandbox blocks `sudo` via no-new-privileges).
- RMM-TEST-MACHINE: Windows 11, RMM agent `99d6d692-99e0-4359-9f9c-f43be89f49e5`, Howard-VM site.

## Commands & Outputs

- Safe param discovery: `gz.py raw --module M --method m --params '{}' --confirm` -> "required parameter is missing : X".
- Working install (offline, unattended): scheduled task running `epskit_x64.exe /bdparams /silent` as SYSTEM; BITS pull from `http://172.16.3.30/downloads/bdkit-test.zip`.
- Working policy assign: `assignPolicy {policyId, targetIds:[ep], inheritFromAbove:false}` -> true. Inherit mode: `{targetIds, inheritFromAbove:true}`.
- Working isolate/unisolate: `createIsolateEndpointTask {endpointId}` -> true; restore fails while isolate task in progress ("cannot be restored") -> wait + retry.
- Managed uninstall: `gz.py delete-endpoint <gzId> --confirm` (creates uninstall-client task; agent self-removes; services gone, dir clears on reboot).
- DB redaction: `PGPASSWORD=... psql -h 127.0.0.1 -U gururmm -d gururmm -c "UPDATE commands SET command_text='[REDACTED]' WHERE command_text LIKE '%downloadPackageFullKit%';"`.
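The redaction step, followed by the "0 commands expose secrets" check, can be sketched as an update plus a verification count. This is a sketch under stated assumptions: it uses the standard-library `sqlite3` module with an in-memory table as a stand-in for the RMM Postgres database, and the sample rows are invented for illustration. Only the `commands` table, the `command_text` column, and the `[REDACTED]` marker come from the log.

```python
import sqlite3


def redact_commands(conn: sqlite3.Connection, needle: str) -> int:
    """Blank out command_text on every row containing `needle`; return row count."""
    cur = conn.execute(
        "UPDATE commands SET command_text = '[REDACTED]' "
        "WHERE command_text LIKE ?",
        (f"%{needle}%",),  # bound parameter, so the needle is never spliced into SQL
    )
    conn.commit()
    return cur.rowcount


def count_exposed(conn: sqlite3.Connection, needle: str) -> int:
    """Verification pass: how many rows still contain `needle`."""
    row = conn.execute(
        "SELECT COUNT(*) FROM commands WHERE command_text LIKE ?",
        (f"%{needle}%",),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the RMM Postgres database
    conn.execute("CREATE TABLE commands (id INTEGER PRIMARY KEY, command_text TEXT)")
    conn.executemany(
        "INSERT INTO commands (command_text) VALUES (?)",
        # Invented sample rows, for illustration only.
        [("fetch .../downloadPackageFullKit?packageId=x",), ("hostname",)],
    )
    print(redact_commands(conn, "downloadPackageFullKit"))
    print(count_exposed(conn, "downloadPackageFullKit"))
```

Running the same pass with a second needle covers the case noted under Problems Encountered, where the redaction command itself leaves a secret (the PG password) in the table.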

## Pending / Incomplete Tasks

- Write methods NOT executed (verified by shape+gating only; unsafe/infeasible on live partner tenant): company-create/suspend/activate/delete, account-create/update/delete, notif-configure, quarantine-remove/restore, custom-rule-create/delete (needs full EDR rule body), incident-status/note (getIncidentsList dead on tenant), push-set/push-test (needs a webhook receiver endpoint).
- Push webhook receiver (HTTPS, TLS1.2+) still not built; required before `push-set` can be enabled (coord-API or RMM ingest route).
- RMM push-deploy of BD could be packaged into the skill/rmm as a reusable unattended deploy (the staged-kit + scheduled-task pattern proven here).
- packetdial skill still needs its NetSapiens API key provisioned (deferred to Mike, earlier).

## Reference Information

- GravityZone docs (authoritative): https://www.bitdefender.com/business/support/en/77211-79436-welcome-to-gravityzone.html ; assignPolicy 77212-924802; createCompany 77211-126236; setPushEventSettings 77209-135319; deleting-endpoints 77209-88049.
- Skill tracker: `.claude/skills/bitdefender/references/BUILDOUT.md` (all live modules complete). Spec: `references/api-reference.md`.
- Key commits (claudetools): `603773c` assignPolicy/isolate fixes; delete_package fix + earlier groupName/_print_kv via auto-sync (be9d6c3 region). guru-rmm `58c1a96` (500-leak tail, earlier).
- 5 doc-vs-live fixes: createCustomGroup(groupName), _print_kv(list), assignPolicy(inheritFromAbove:false), isolate/unisolate(endpointId), deletePackage(packageId).
@@ -162,3 +162,76 @@ Key gotcha encoded in scripts: pfSense `display_errors=Off` (run php with `2>&1`
- Prior sync commits this session: `5ede4fe` (earlier auto-sync), `96a5dd6` (the pfSense build).
- pfSense filter-rule schema: keyed on `tracker`; `type` = pass/block/reject; `disabled` key
  presence = off; `source`/`destination` are objects (`{any:""}` or `{network,port}`).

---

## Update: 16:14 PT - finished §E build-out, conformance, prep, coord triage

Continued the same session well past the initial pfSense-control-verbs build. Net: the entire
unifi-wifi pfSense gateway compatibility layer (ROADMAP §E) is now complete, validated, and documented;
the whole skill is brought into skill-module conformance; and the remaining blocked items are prepped.

### Session Summary (continuation)

- **`--port` support** added to `pfsense-ssh.sh` (precedence: `--port` flag > vault `port`/`credentials.port`
  field > default 22). Caught + fixed a regression where the vault returns the literal string `"null"` for a
  missing field (so `:-` defaulting failed → `ssh -p null`); normalized `""`/`null`. Also fixed `run` to use
  `POS` not the removed `RAWARGS` so flags don't leak into ad-hoc commands.
- **Site→gateway map**: `sites.sh` now prints a live gateway map (each UOS site classified UniFi-gw vs
  pfSense/third-party + vaulted pfSense creds with host:port). 12 UniFi-gw / 36 no-UniFi-gw sites; 1 cred (Cascades).
- **Auto-select**: new `references/site-gateways.tsv` (site_id→cred map, seeded Cascades) + new
  `scripts/gateway-map.sh` (`lookup`/`list`/`validate`/`suggest`). `gw-audit`/`gw-control` now auto-route a
  site to its pfSense cred with NO `--pfsense`; validated with `gw-control Cascades fw-list` + `gw-audit Cascades`.
- **Mike's cred-path answer = option A** (coord msg 0bca380f): 1st arg with `/` = full vault path, else a
  client slug; fail loud `[ERROR] no cred at vault:<path>` on a bad path. Implemented in `pfsense-ssh.sh` +
  `gw-audit`/`gw-control` dispatch; validated path mode, slug mode, fail-loud. Unblocks the ACG office box
  (`--pfsense infrastructure/pfsense-firewall`). Closed the coord loop (todo `e0ba933f` done, reply `fc96afba`).
- **Skill error-logging conformance** (skill-creator mandatory rule; the whole skill had none): added
  `logerr` (canonical `log-skill-error.sh`) to the live scripts (pfsense-ssh/gw-control/gw-audit) + SKILL.md
  guideline, then delegated the remaining ~20 scripts to a sub-agent (verified: all `bash -n` clean, exit
  codes preserved, conservative functional-only logging, no false positives). New `gateway-map.sh` self-logs.
- **pf-* NAT verbs live-verified** on Cascades via a temp source-locked (TEST-NET) disabled port-forward:
  create→pf-list→set-ports→set-src→enable→disable→delete; `pf-delete` removed the associated filter rule;
  box back to baseline (nat=0, filter=21). Verb mechanics + documented NAT schema confirmed.
- **§D least-privilege** prepped: `gw-audit` now prefers a read-only controller cred
  (`infrastructure/uos-server-network-api`) and auto-uses it once vaulted (falls back to RW with a hint).
- **Prep docs**: new `references/onboarding.md` (copy-paste §B AP-cred + pfSense + §D RO-cred procedures);
  ROADMAP §C now carries a build-ready WireGuard VPN-server design.
- **Coord inbox triage**: 41 messages, all already read; surfaced 4 possibly-open (BUG-016/017, log-analysis
  interview, LHM, billing), rest resolved/FYI.
- **BUG-016/017 verification** (user asked to take them): both ALREADY FIXED in guru-rmm (commit 30da053,
  Mike 2026-06-01): `StateDirectory=gururmm` in the systemd template (016) + `OnceLock CACHED_ID` in
  `device_id.rs` (017). Verified against working tree `ed8cad3` (== origin/main). Nothing to take.
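The option A rule from the list above (an argument containing `/` is a full vault path, anything else is a client slug, and a bad path fails loudly) can be sketched as follows. The real implementation lives in the bash scripts; this Python version is only an illustration. The slug-to-path template is hypothetical, since the log does not record where slug credentials are stored, and a plain dict stands in for the vault.

```python
def resolve_cred_path(arg: str, slug_template: str = "clients/{slug}/pfsense") -> str:
    """Option A: an argument containing '/' is a full vault path, else a client slug.

    slug_template is hypothetical; the log does not record where slug creds live.
    """
    if not arg:
        raise ValueError("empty credential argument")
    if "/" in arg:
        return arg  # full vault path, used verbatim
    return slug_template.format(slug=arg)


def load_cred(arg: str, vault: dict) -> dict:
    """Look the resolved path up in `vault` (a plain dict standing in for the vault)."""
    path = resolve_cred_path(arg)
    if path not in vault:
        # Fail loud, in the message format recorded in the session log.
        raise LookupError(f"[ERROR] no cred at vault:{path}")
    return vault[path]
```

Dispatching on the presence of `/` keeps the common case short (a slug) while still allowing one-off paths such as `infrastructure/pfsense-firewall` without a mapping entry.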

### Key Decisions (continuation)

- Auto-select keyed on the stable 24-hex UOS `site_id`, not a fuzzy name (lookup also accepts a name as a
  fallback). Map generated/consulted live; the TSV is the only persisted state.
- Error-logging delegated to a sub-agent for the ~20-script sweep (high-volume, mechanical-with-judgment,
  independent), but the 3 live entry-point scripts were done by hand first as the template, and the agent's output
  was verified (syntax + exit-code preservation + no false-positive logging) rather than trusted.
- pf-* verification used a hand-built forward source-locked to TEST-NET (inert even if enabled) on the
  designated test box (Cascades): reversible, no exposure; noted that a GUI-created forward is a future
  belt-and-suspenders schema check.
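The site_id-first lookup described in the first decision above can be sketched like this. The real lookup is `scripts/gateway-map.sh`; this Python version is only an illustration, and the TSV column names (`site_id`, `name`, `cred`) are assumptions because the log does not show the file's layout.

```python
import csv
import io
import re
from typing import Dict, Optional

SITE_ID_RE = re.compile(r"^[0-9a-f]{24}$")  # 24-hex UOS site_id


def load_map(tsv_text: str) -> Dict[str, Dict[str, str]]:
    """Parse the TSV into {site_id: row}. Column names are assumptions."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["site_id"]: row for row in reader}


def lookup(site_map: Dict[str, Dict[str, str]], key: str) -> Optional[str]:
    """Return the cred for a site: exact site_id first, then name as a fallback."""
    if SITE_ID_RE.match(key.lower()):
        row = site_map.get(key.lower())
        return row["cred"] if row else None
    # Fallback: case-insensitive exact match on the site name.
    matches = [r for r in site_map.values() if r["name"].lower() == key.lower()]
    # An ambiguous name returns nothing rather than guessing.
    return matches[0]["cred"] if len(matches) == 1 else None
```

Keying on the id means a renamed site keeps its mapping, and the name fallback exists only for convenience at the command line.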

### Problems Encountered (continuation)

- `ssh -p null` regression from the vault returning `"null"` for a missing `port` field → normalized.
- Repeated `cd` into the guru-rmm submodule left the working dir there; a later `sync.sh` relative-path call
  failed (exit 127) → re-ran from `/c/claudetools`. (Recurring Bash-cwd-persistence friction.)
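The `ssh -p null` fix comes down to normalizing "missing" values before applying the precedence `--port` flag > vault field > default 22. The real fix is in `pfsense-ssh.sh` (bash); the sketch below re-expresses the same logic in Python, with hypothetical function names and an added range check that the log does not mention.

```python
from typing import Optional

DEFAULT_SSH_PORT = 22


def normalize(value: Optional[str]) -> Optional[str]:
    """Treat None, "" and the literal string "null" as 'not set'."""
    if value is None:
        return None
    value = value.strip()
    return None if value in ("", "null") else value


def resolve_port(flag: Optional[str], vault_value: Optional[str]) -> int:
    """Precedence: --port flag > vault port field > default 22."""
    for candidate in (normalize(flag), normalize(vault_value)):
        if candidate is not None:
            port = int(candidate)  # raises ValueError on non-numeric input
            if not 1 <= port <= 65535:  # range check is an addition, not from the log
                raise ValueError(f"port out of range: {port}")
            return port
    return DEFAULT_SSH_PORT
```

The bug arose because shell `:-` defaulting only triggers on an unset or empty variable, so the non-empty string `"null"` slipped through; normalizing first closes that gap.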

### Configuration Changes (continuation)

Created: `scripts/gateway-map.sh`, `references/site-gateways.tsv`, `references/onboarding.md`.
Modified: `scripts/pfsense-ssh.sh`, `scripts/gw-control.sh`, `scripts/gw-audit.sh`, `scripts/sites.sh`,
`scripts/pfsense-gwc.php` (gwc_ prefixes earlier), the other ~20 `scripts/*.sh` (logerr), `SKILL.md`,
`references/ROADMAP.md`; wiki `systems/pfsense.md`, `systems/uos-server.md`, `index.md`.

### Pending / Incomplete Tasks (continuation)

- **§B / §D**: blocked only on external inputs (per-client AP creds + site reach; the read-only controller
  account). Procedures ready in `references/onboarding.md`; `gw-audit` already RO-cred-ready.
- **§C**: VPN-server stand-up: build-ready design in ROADMAP §C; not externally blocked (Cascades reachable).
- **BUG-018** (guru-rmm, P2, Open): `DELETE /api/agents/:id` resets connection (HTTP 000), FK-cascade slow;
  `server/src/api/agents.rs:127-150` → `server/src/db/agents.rs:186-190`. Offered to take; awaiting go.
- Optional pf-* GUI-created-forward schema spot-check; optional `pf-add` verbs (not needed today).

### Reference Information (continuation)

- Coord: Mike's answer `0bca380f`; reply `fc96afba`; todo done `e0ba933f`.
- guru-rmm: BUG-016/017 fixed in commit `30da053`; BUG-018 open (FEATURE_ROADMAP.md ~line 411); repo HEAD `ed8cad3`.
- Sync commits this continuation: `5ede4fe`→`be9d6c3` (multiple auto-syncs across the build).