sync: auto-sync from GURU-5070 at 2026-07-04 07:30:32
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-07-04 07:30:32
@@ -43,6 +43,8 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
2026-07-04 | Howard-Home | screenconnect | ScreenConnect API error [SendCommandToSession]: HTTP 500: {"errorType":"","message":"An session manager fault error occurred while processing your request. Please contact support if the problem persists."} [ctx: cmd=send-command]

2026-07-04 | GURU-5070 | ps-encoded | encode produced empty output [ctx: src=/dev/fd/63]

2026-07-03 | GURU-5070 | agy/gemini-cli | old gemini npm CLI dead on this account: throwIneligibleOrProjectIdError (needs GOOGLE_CLOUD_PROJECT); replaced by Antigravity 'agy' binary [ctx: fix=rewired-to-agy]

2026-07-03 | GURU-5070 | grok | grok returned no text [ctx: mode=text stopReason=Cancelled]
Submodule projects/msp-tools/security-assessment updated: 0f6927b635...1a582e4afa
@@ -0,0 +1,170 @@
## User
- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
## Session Summary

Continued and completed the GuruRMM VSS policy-configurator redesign (SPEC-016 / spec
`vss-policy-config`), building Tasks 2 through 7 on branch `feat/vss-native-com`, running a
workflow-backed code review, fixing every finding, merging to main, and confirming the deploy on
the live T1490 canary (NEPTUNE). Work order: Task 2 (the `vss-create` verb + create-only pass +
scheduled-task registration with both trigger modes + legacy-task migration), Task 3 (set the
per-volume size cap governor on policy apply), Task 4 (retire the scheduled create+prune pass and
`prune`, the T1490 surface, and rework the SPEC-025 compliance heal to create-only via a shared
`create_one_volume`), Task 5 (status/compliance detail reflects the configurator model), Task 6
(both-target release build), Task 7 (runtime verification). Each task was checked with `cargo check`
on the Pluto build host (stable + legacy) and pushed incrementally.
Runtime-verified the configurator end to end on a Windows Server 2019 box (Pluto, no Falcon) and a
Win11 Pro client (GURU-5070, Falcon present) using a new hidden `vss-apply-test <policy.json>` verb
that drives the real `apply_policy` path. Confirmed: `GuruRMM-VSS-Create` task registered (daily-times
AND every-N-hours interval triggers both work), legacy `GuruRMM-VSS-Snapshot` removed, size cap set,
native COM create yields a Persistent/Client-accessible shadow on both SKUs including the Falcon client.
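The two trigger modes can be pictured as a small enum. This is a minimal sketch, not the agent's code: `Schedule`, `max_gap_minutes`, and the minutes-after-midnight representation are hypothetical names chosen for illustration. It computes the longest gap between consecutive scheduled creates for either mode.

```rust
/// Hypothetical model of the two trigger modes (not the agent's real types).
#[derive(Debug, Clone, PartialEq)]
pub enum Schedule {
    /// Run at fixed times of day, given as minutes after midnight.
    DailyTimes(Vec<u32>),
    /// Run every N hours.
    EveryNHours(u32),
}

/// Longest gap, in minutes, between two consecutive scheduled runs.
/// Returns None for an empty or zero-interval schedule, or on overflow.
pub fn max_gap_minutes(schedule: &Schedule) -> Option<u32> {
    const DAY: u32 = 24 * 60;
    match schedule {
        Schedule::EveryNHours(0) => None,
        Schedule::EveryNHours(n) => n.checked_mul(60),
        Schedule::DailyTimes(times) => {
            // Keep only valid times of day, sorted and de-duplicated.
            let mut t: Vec<u32> = times.iter().copied().filter(|m| *m < DAY).collect();
            t.sort_unstable();
            t.dedup();
            let first = *t.first()?;
            let last = *t.last()?;
            // Gap that wraps past midnight: last run today to first run tomorrow.
            let wrap = DAY - last + first;
            // Largest gap between neighbouring runs within the same day.
            let inner = t.windows(2).map(|w| w[1] - w[0]).max().unwrap_or(0);
            Some(wrap.max(inner))
        }
    }
}
```

For a 12:00/18:00 daily schedule the longest gap is the 18 hours from 18:00 to 12:00 the next day, whereas an every-6-hours schedule never goes longer than 6 hours between runs.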
Ran a high-effort workflow-backed code review (23 agents, 19 verified findings) BEFORE merge. It
caught a genuine data-loss bug: the original design set the machine-global `MaxShadowCopies` registry
value to `retention_count`, which FIFO-evicts OTHER VSS consumers' (System Restore, third-party
backup) shadows until the volume is down to that count. Fixed that (dropped MaxShadowCopies
management entirely; size cap is the sole governor) plus nine other findings (cap-before-create for
newly-eligible volumes, a `vss-snapshot` alias to bridge the upgrade window, interval-aware
compliance window, bounded IVssAsync::Wait+Cancel instead of INFINITE, provision error unmasking,
ExecutionTimeLimit 1h->2h, filesystem task-presence check instead of a per-eval PowerShell spawn, a
warning on the now-ignored retention_max_age_days). Rebuilt + re-verified on Pluto; both fixes
(data-loss + upgrade-gap) confirmed at runtime.
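The data-loss finding is easy to see in a toy model. The sketch below is a simplified simulation, not VSS itself: `Shadow` and `apply_count_cap` are hypothetical names, and the model assumes only that a count cap deletes the oldest shadows on a volume first, regardless of which product created them.

```rust
/// A shadow copy on one volume, tagged with the product that created it.
#[derive(Debug, Clone, PartialEq)]
pub struct Shadow {
    pub owner: &'static str,
    /// Creation order; a lower value means an older shadow.
    pub seq: u32,
}

/// Simplified count cap: when the volume holds more than `max_count` shadows,
/// the oldest are evicted first, whoever owns them.
/// Returns (survivors, evicted).
pub fn apply_count_cap(mut shadows: Vec<Shadow>, max_count: usize) -> (Vec<Shadow>, Vec<Shadow>) {
    shadows.sort_by_key(|s| s.seq);
    let excess = shadows.len().saturating_sub(max_count);
    // `split_off` leaves the oldest `excess` entries in `shadows`
    // and returns the newer remainder.
    let survivors = shadows.split_off(excess);
    (survivors, shadows)
}
```

With three System Restore shadows, one backup-product shadow, and two newer agent shadows on a volume, a cap of 2 keeps only the agent's shadows and evicts all four belonging to other consumers, which is the failure mode the review flagged.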
Merged `feat/vss-native-com` -> main (merge commit `de30b2b`), which fired the webhook build
pipeline: version auto-bumped 0.6.75 -> 0.6.76, signed Windows MSI built on Beast, published. KEY
CORRECTION discovered post-merge: the CI publishes to the **beta** channel, not stable. Stable
channel is 0.6.66 (233 agents; NOT stuck, by design); beta head is now 0.6.76 (8 agents auto-updated).
So the merge is a beta canary deploy, not a fleet-wide push; reaching stable is a separate deliberate
promotion. Set NEPTUNE (stable, 0.6.66) to the beta channel via `PATCH /api/agents/:id/channel` so its
native binary-swap updater pulled 0.6.76 (updated in ~90s, enrollment preserved). Confirmed the
configurator on NEPTUNE in production: create task present, legacy removed, MaxShadowCopies UNSET (its
17 existing shadows preserved, which proves the data-loss fix in prod), cap 15%, and a live
`vss-create` created shadows on C: and F: under Falcon with no T1490.
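The channel behaviour described above (an agent follows the head of its own channel and is never downgraded) can be sketched as a small decision function. This is an assumption-laden illustration, not the server's or the updater's actual logic: `Channel`, `parse_version`, and `update_target` are hypothetical names, and versions are assumed to be plain `major.minor.patch` strings.

```rust
/// Parse "major.minor.patch" into a tuple that compares numerically.
pub fn parse_version(v: &str) -> Option<(u32, u32, u32)> {
    let mut parts = v.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // reject anything beyond three components
    }
    Some((major, minor, patch))
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Channel {
    Stable,
    Beta,
}

/// Hypothetical update decision: offer the head of the agent's channel,
/// but only when it is strictly newer than the installed version.
pub fn update_target<'a>(
    current: &str,
    channel: Channel,
    stable_head: &'a str,
    beta_head: &'a str,
) -> Option<&'a str> {
    let head = match channel {
        Channel::Stable => stable_head,
        Channel::Beta => beta_head,
    };
    if parse_version(head)? > parse_version(current)? {
        Some(head)
    } else {
        None
    }
}
```

Under this model an agent on 0.6.66 gets nothing while on stable, gets 0.6.76 once flipped to beta, and stays on 0.6.76 if it is later moved back to stable.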
Final thread: while attempting a full beta-cohort health sweep, `sops.exe` on GURU-5070 became
execution-blocked mid-session by **WDAC / Windows Application Control** (enforced;
`CodeIntegrityPolicyEnforcementStatus=2`), error "An Application Control policy has blocked this file."
This is NOT Falcon (user removed Falcon; block persisted). It kills all vault decryption, and with it
RMM API auth, on this box. Could not complete the beta sweep (DB-direct route blocked by root-only
DATABASE_URL, no passwordless sudo). Beta head (0.6.76) looks healthy from the observable sample;
whether the ~65 beta agents still on 0.6.75 converge to 0.6.76 is unconfirmed.
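For reading the enforcement value, Microsoft's `Win32_DeviceGuard` documentation lists 0 as off, 1 as audit, and 2 as enforced. The sketch below only decodes a number already obtained elsewhere; `CiEnforcement` and `decode_ci_status` are hypothetical names, and nothing here queries WMI.

```rust
/// Decoded code-integrity enforcement state.
#[derive(Debug, PartialEq)]
pub enum CiEnforcement {
    Off,
    Audit,
    Enforced,
    /// Any value outside the documented 0..=2 range.
    Unknown(u32),
}

/// Map the raw status number to a readable state.
pub fn decode_ci_status(raw: u32) -> CiEnforcement {
    match raw {
        0 => CiEnforcement::Off,
        1 => CiEnforcement::Audit,
        2 => CiEnforcement::Enforced,
        other => CiEnforcement::Unknown(other),
    }
}

/// Only the enforced state actually blocks execution; audit mode just logs.
pub fn blocks_execution(state: &CiEnforcement) -> bool {
    matches!(state, CiEnforcement::Enforced)
}
```

A value of 2, as observed on GURU-5070, decodes to the enforced state, consistent with `sops.exe` being blocked rather than merely logged.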
## Key Decisions

- **Do NOT write machine-global `MaxShadowCopies`** (code-review #1, Mike chose "drop it"). It is a
  machine-wide limit on the shadow count per volume, shared by ALL VSS consumers; lowering it to
  retention_count evicts other products' recovery points. Size cap (per-volume diff-area) is the sole
  governor now; retention_count is advisory.
- **`retention_max_age_days` -> warn when set** (Mike). Age pruning required deletion (T1490); it is
  unsupported post-pivot, so the agent logs a WARNING rather than silently ignoring it.
- **Keep a hidden `vss-snapshot` alias -> create pass.** The legacy scheduled task still invokes the
  old subcommand, which is renamed after upgrade, so it would otherwise error until a policy-hash
  change; the alias bridges the upgrade window.
- **create_one_volume provisions the size cap BEFORE each create** (strict: skip create if cap fails).
  Restores the old "never unbounded" invariant for a volume that becomes eligible after policy apply.
- **Bounded IVssAsync::Wait + Cancel** (was INFINITE) so a wedged DoSnapshotSet can't poison the single
  shared COM worker thread for the process lifetime; the cancel also prevents an orphaned shadow.
- **Merge = BETA deploy, not fleet.** The webhook build publishes to the beta channel; stable (0.6.66,
  233 agents) is untouched. Reaching the fleet is a separate promote-to-stable step (deliberate, Mike-gated).
- **Update NEPTUNE via channel flip, not manual msiexec.** The agent updater is a server-driven binary
  swap (download .exe -> verify SHA256 -> sc stop/replace/start), enrollment-preserving. Setting
  update_channel=beta lets that native path run rather than a risky hand-rolled MSI install.
- **policy_hash change forces reconcile on upgrade.** Adding `schedule_interval_hours` to policy_hash
  means the old stored hash never matches the new agent -> forced reconcile on first apply, which
  registered the create task on NEPTUNE despite its unchanged policy (closes the code-review #3 residual).
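The forced-reconcile effect in the last decision can be sketched with the standard library hasher. This is illustrative only: `VssPolicy`, `max_size_percent`, and the three functions are hypothetical, and the agent's real hash inputs and algorithm are not shown in this log. The point is just that adding a field to the hashed input changes the hash even when the policy values themselves are unchanged.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of a VSS policy.
pub struct VssPolicy {
    pub retention_count: u32,
    pub max_size_percent: u32,
    pub schedule_interval_hours: Option<u32>,
}

/// Hash as an older agent might have computed it (interval field not included).
pub fn policy_hash_old(p: &VssPolicy) -> u64 {
    let mut h = DefaultHasher::new();
    p.retention_count.hash(&mut h);
    p.max_size_percent.hash(&mut h);
    h.finish()
}

/// Hash as a newer agent might compute it (interval field included).
pub fn policy_hash_new(p: &VssPolicy) -> u64 {
    let mut h = DefaultHasher::new();
    p.retention_count.hash(&mut h);
    p.max_size_percent.hash(&mut h);
    p.schedule_interval_hours.hash(&mut h);
    h.finish()
}

/// Reconcile whenever the stored hash is missing or differs from the current one.
pub fn needs_reconcile(stored: Option<u64>, p: &VssPolicy) -> bool {
    stored != Some(policy_hash_new(p))
}
```

A hash stored by the old scheme does not match the new scheme's hash for the same policy, so the first apply after upgrade reconciles; once the new hash is stored, later applies of the same policy are no-ops.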
## Problems Encountered

- **snapshot_one_volume shared with the SPEC-025 heal.** Removing prune required reworking the heal;
  extracted a create-only `create_one_volume` used by both the scheduled pass and the heal.
- **Dead code after removals** (firstrun staggering, volume_jitter_minutes, DEFAULT_RETENTION_COUNT,
  FIRSTRUN_PREFIX). Removed; both variants build warning-free for vss.rs.
- **Pluto SSH shell is cmd.exe**, not bash, so `tail`/quoting broke remote one-liners. Fixed by driving
  via `powershell -NoProfile -Command` and filtering locally.
- **First release build lacked `vss-apply-test`**: I rebuilt without re-syncing Pluto to the commit
  that added the verb. Fixed by `git reset --hard origin/feat/vss-native-com` before rebuild.
- **Client test cap-restore regex didn't match** GURU-5070's shadowstorage format, leaving C: cap at
  10% instead of prior 1%. Restored manually to 10 GB.
- **Deploy channel misread.** Initially reported merge as fleet-wide; it is beta-only. 233 agents on
  0.6.66 are the STABLE fleet on current stable, NOT a stuck cohort (0.6.66.msi.channel=stable,
  0.6.67+ = beta). Corrected.
- **`sops.exe` blocked by WDAC mid-session** (not Falcon). Blocks vault/RMM-API auth on GURU-5070.
  Unresolved at session end; beta cohort sweep left for Mike to run via the server DB (or after WDAC fix).
- **errorlog.md write also failed** ("could not write D:/claudetools/errorlog.md"), a second permission
  symptom on this box, noted.
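The shape of the shared create-only helper from the first item, combined with the strict cap-before-create rule, might look roughly like the following. This is a sketch under assumptions: the `VssBackend` trait, its method names, and the signatures are invented here so the logic can be exercised without COM; only the names `create_one_volume` and `create_pass` come from this log.

```rust
/// Hypothetical abstraction over the VSS operations the helper needs.
pub trait VssBackend {
    fn set_size_cap(&mut self, volume: &str, percent: u32) -> Result<(), String>;
    fn create_shadow(&mut self, volume: &str) -> Result<String, String>;
}

/// Create-only step for one volume. The size cap is provisioned first and the
/// create is skipped if that fails, so a shadow is never created unbounded.
pub fn create_one_volume<B: VssBackend>(
    backend: &mut B,
    volume: &str,
    cap_percent: u32,
) -> Result<String, String> {
    backend
        .set_size_cap(volume, cap_percent)
        .map_err(|e| format!("cap failed on {volume}, create skipped: {e}"))?;
    backend.create_shadow(volume)
}

/// Scheduled pass: try every volume and report (succeeded, attempted).
/// A heal path could call `create_one_volume` directly for a single volume.
pub fn create_pass<B: VssBackend>(
    backend: &mut B,
    volumes: &[&str],
    cap_percent: u32,
) -> (usize, usize) {
    let mut ok = 0;
    for volume in volumes {
        if create_one_volume(backend, volume, cap_percent).is_ok() {
            ok += 1;
        }
    }
    (ok, volumes.len())
}
```

Keeping the cap step inside the per-volume helper means both callers inherit the invariant without having to remember it.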
## Configuration Changes

Repo (guru-rmm submodule, `feat/vss-native-com` -> merged to main as `de30b2b`, then CI bumped to `9de44d3`):
- `agent/src/vss.rs`: create_pass/create_one_volume (create-only + cap-before-create), register_scheduled_task
  (GuruRMM-VSS-Create, dual trigger modes, legacy removal), ensure_governors (size cap only; age warn),
  removed prune/run_snapshot_pass/snapshot_one_volume + dead helpers, compliance_window_hours, evaluate_compliance
  detail (size-cap-fifo model), create_task_present (filesystem stat), read_max_shadow_copies (status only).
- `agent/src/vss_com.rs`: bounded Wait+Cancel in create_blocking; provision_blocking error unmasking.
- `agent/src/main.rs`: VssCreate verb, VssSnapshot alias, VssApplyTest hidden verb.
- `agent/src/transport/mod.rs`: schedule_interval_hours field.
- `specs/vss-policy-config/plan.md`: Tasks 1-7 + code-review sections marked DONE.
- CI auto-bumped `agent/Cargo.toml` 0.6.75 -> 0.6.76.
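The bounded Wait+Cancel change listed for `vss_com.rs` follows a general pattern: wait for a result with a deadline, and signal cancellation when the deadline passes. The sketch below shows that pattern with standard-library threads and channels only. It does not call the VSS COM API, and `bounded_wait` and `WaitOutcome` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum WaitOutcome<T> {
    Done(T),
    /// The deadline passed; the cancel flag has been set.
    TimedOut,
    /// The worker ended without producing a result (for example, it panicked).
    WorkerFailed,
}

/// Run `op` on a helper thread and wait at most `limit` for its result.
/// On timeout the shared flag is set so a cooperative `op` can stop early.
pub fn bounded_wait<T, F>(limit: Duration, op: F) -> WaitOutcome<T>
where
    T: Send + 'static,
    F: FnOnce(Arc<AtomicBool>) -> T + Send + 'static,
{
    let cancel = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancel);
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(op(flag));
    });
    match rx.recv_timeout(limit) {
        Ok(value) => WaitOutcome::Done(value),
        Err(mpsc::RecvTimeoutError::Timeout) => {
            cancel.store(true, Ordering::SeqCst);
            WaitOutcome::TimedOut
        }
        Err(mpsc::RecvTimeoutError::Disconnected) => WaitOutcome::WorkerFailed,
    }
}
```

The caller always gets control back within `limit`, which is the property that matters when one shared worker thread serves the whole process.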
RMM state:
- NEPTUNE (`b3a9b454-...`) update_channel set to `beta`; updated 0.6.66 -> 0.6.76; VSS reconciled onto
  GuruRMM-VSS-Create (12:00/18:00), legacy task removed, cap 15%, MaxShadowCopies untouched.

Repo (ClaudeTools main): this session log; submodule pointer -> guru-rmm `9de44d3`.
## Credentials & Secrets

- No new secrets created. Gitea push creds via vault `services/gitea.sops.yaml` (used before the sops block).
- RMM admin API creds: vault `infrastructure/gururmm-server.sops.yaml` credentials.gururmm-api.admin-email /
  admin-password (consumed by rmm-auth.sh; BLOCKED now by the WDAC sops issue).
- RMM server DB: real DATABASE_URL is in root-only `/opt/gururmm/.env` (`postgres://gururmm:<pass>@localhost:5432/gururmm`);
  guru-readable copies are templates with non-working passwords.
## Infrastructure & Servers

- **NEPTUNE**: 172.16.3.11, Win Server 2022, RMM agent `b3a9b454-86eb-491c-ac67-c1f98987d8dc`, Falcon
  present, now on agent 0.6.76 / beta channel. VSS: C: (cap 279GB/15%, 18 shadows), F: also eligible.
- **Pluto** (build host): Administrator@172.16.3.36, Windows Server 2019 Standard (ProductType=3, no Falcon),
  C:\gururmm checkout, MSVC + Rust stable + 1.77 legacy. `cargo build` from C:\gururmm\agent (no workspace root).
- **GURU-5070** (this box): Win11 Pro, RMM agent `819df0c8-...`, Falcon WAS present (Mike removed it this
  session), agent on 0.6.76/beta. WDAC/Application Control ENFORCED (blocks sops.exe).
- **RMM server**: guru@172.16.3.30 (Ubuntu). Webhook build pipeline at /opt/gururmm (webhook-handler.py on
  :9000, build-windows.sh on Beast primary / Pluto fallback). Downloads at /var/www/gururmm/downloads +
  https://rmm.azcomputerguru.com/downloads. RMM API at http://172.16.3.30:3001. Postgres localhost:5432 db=gururmm.
- **GURU-BEAST-ROG** ("Beast"): primary Windows build host for the pipeline.
- Internal Gitea: http://172.16.3.20:3000/azcomputerguru/gururmm.git.
## Commands & Outputs

- Build channel truth: `gururmm-agent-base-0.6.66.msi.channel = stable`; 0.6.67+ = `beta`. Stable fleet = 0.6.66 (233).
- NEPTUNE channel flip: `curl -X PATCH $RMM/api/agents/<id>/channel -d '{"channel":"beta"}'` -> HTTP 204.
- NEPTUNE update: 0.6.66 -> 0.6.76 in ~90s (native binary-swap updater).
- NEPTUNE live create: `vss-create pass complete (2/2 volume(s))`, shadows on C: {50171974...} + F: {A324F987...},
  Persistent/Client-accessible/No-writers/Differential; C: 17 -> 18 shadows; MaxShadowCopies UNSET.
- sops block: `& sops.exe --version` -> "An Application Control policy has blocked this file";
  Win32_DeviceGuard CodeIntegrityPolicyEnforcementStatus=2 / UsermodeCodeIntegrityPolicyEnforcementStatus=2.
- Beta sweep query for Mike (run on .30):
  `sudo bash -c 'set -a; . /opt/gururmm/.env; psql "$DATABASE_URL"' <<'SQL' ... where update_channel='beta' group by 1 ... SQL`
## Pending / Incomplete Tasks

- **WDAC/sops block on GURU-5070**: unresolved. Blocks all vault decryption + RMM API auth + gitea pushes
  from this box. Fix: allowlist sops.exe in the WDAC/Smart App Control policy, or disable SAC (one-way on Win11).
  Not Falcon (removed; block persisted).
- **Beta cohort convergence unconfirmed**: verify whether ~65 beta agents on 0.6.75 advance to 0.6.76 or stall.
  Run the psql query above (or resume once sops works).
- **Promote 0.6.76 -> stable** when beta soak is satisfactory (deliberate step; mechanism TBD, likely re-tag
  the .msi.channel or a stable pointer; no agent promote script found, only promote-dashboard.sh).
- **NEPTUNE channel**: left on beta (canary). Revert to stable if desired (stays 0.6.76 either way; no downgrades).
- Leftover test shadows: {8FEFDAE3} NEPTUNE, {79676B16} GURU-5070, {50171974} NEPTUNE; all will FIFO-evict via cap.
- Parent ClaudeTools submodule pointer -> 9de44d3 (folds into this sync).
## Reference Information

- guru-rmm branch merged: `feat/vss-native-com` -> main `de30b2b` (merge), CI bump `9de44d3`. Agent version 0.6.76.
- Key commits: Task2 0cdcff5, Task3 4f78513, Task4 8f59706/feeb168, Task5 74e7d6b, review fixes d73c086/691dd62.
- Code review workflow output: `C:\Users\guru\AppData\Local\Temp\claude\...\tasks\wyosoguyd.output` (10 findings).
- RMM channel API: `PATCH /api/agents/:id/channel {"channel":"stable"|"beta"}` (server api/mod.rs:306).
- Spec: `projects/msp-tools/guru-rmm/specs/vss-policy-config/{plan,shape,references,standards}.md`.
- Published artifact: `gururmm-agent-base-0.6.76.msi` (sha256 f9eee26d6ee61acaee69747d945cbeca0f448120a1013845f0c553d48ac55f1d), channel beta.