Audit fix M3: _write_cache did a non-atomic CACHE_FILE.write_text and the
write-through helpers did unlocked read-modify-write, so a crash mid-write could
truncate inventory.json and two concurrent gz.py runs could lose an update.
- _write_cache now writes a temp file (fsync) then os.replace() - atomic on the
same filesystem; a reader/crash can never see a partial file, and a failed
write leaves the prior cache intact and no .tmp residue.
- Added a best-effort cross-platform advisory lock (_cache_lock) around the
read-modify-write in _cache_add_group/_cache_add_package; steals a stale lock,
proceeds unlocked on timeout (a lost update is tolerable, a hang is not).
- Dropped the dead cache.setdefault('companies', ...) line in _cache_add_group.
- Verified: compile clean; unit tests for round-trip, lock acquire/release/steal,
write-through, temp cleanup on failure, and prior-cache survival.
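The two mechanisms above, as a minimal sketch rather than the code in gz.py. The names write_cache_atomic and cache_lock, the O_EXCL lock-file approach, and the timeout/stale thresholds are assumptions made for illustration.

```python
import contextlib
import json
import os
import tempfile
import time
from pathlib import Path


def write_cache_atomic(cache_file, data):
    """Write JSON so a reader sees the old file or the new one, never a partial one."""
    cache_file = Path(cache_file)
    # The temp file lives in the target directory so os.replace() stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=cache_file.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, cache_file)  # atomic swap
    except BaseException:
        # Failed write: drop the temp file and leave the prior cache untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


@contextlib.contextmanager
def cache_lock(lock_path, timeout=5.0, stale_after=30.0, poll=0.05):
    """Best-effort advisory lock. Yields True if held, False if it timed out."""
    lock_path = Path(lock_path)
    deadline = time.monotonic() + timeout
    held = False
    while True:
        try:
            # O_EXCL makes creation fail if another process already holds the lock.
            os.close(os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
            held = True
            break
        except FileExistsError:
            try:
                if time.time() - lock_path.stat().st_mtime > stale_after:
                    lock_path.unlink()  # steal a stale lock, then retry
                    continue
            except FileNotFoundError:
                continue  # the holder released it between checks
            if time.monotonic() >= deadline:
                break  # proceed unlocked: a lost update is tolerable, a hang is not
            time.sleep(poll)
    try:
        yield held
    finally:
        if held:
            with contextlib.suppress(FileNotFoundError):
                lock_path.unlink()
```

A write-through helper would wrap its read-modify-write in `with cache_lock(...)` and finish with write_cache_atomic(), whether or not the lock was obtained.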
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit fix H2 (+ M2): the live GravityZone tenant is rate-limited and sweeps fan
out one getManagedEndpointDetails per endpoint across every company, which hit a
real HTTP 429 (errorlog 2026-06-21). _post had zero retry and opened a fresh
httpx.Client (new TLS handshake) per request.
- _post now retries 429/500/502/503/504/timeout up to RETRY_MAX_ATTEMPTS with
bounded exponential backoff + jitter, honoring Retry-After (numeric or HTTP-date).
Retry notices go to stderr (don't pollute --json). Terminal errors still raise.
- M2: a single httpx.Client is created lazily and reused (connection pooling),
closed via client.close() in main()'s finally. Makes the docstring's pooling
claim true and cuts handshake overhead + 429 pressure during sweeps.
- Verified: compile clean; offline unit tests (persistent 429 -> 4 attempts then
raise, flaky 503 -> recovers, Retry-After honored); live status read OK.
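The retry policy above, sketched without httpx so it runs offline. This is an illustration, not the body of _post: RetryableError, call_with_retry, and the base/cap values are assumed names and numbers, and RETRY_MAX_ATTEMPTS = 4 is inferred from the "4 attempts then raise" test. A timeout would take the same retry path as a retryable status.

```python
import random
import sys
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

RETRY_MAX_ATTEMPTS = 4                      # inferred from the unit test above
RETRY_STATUSES = {429, 500, 502, 503, 504}
BACKOFF_BASE = 1.0                          # seconds, assumed
BACKOFF_CAP = 30.0                          # seconds, assumed upper bound


class RetryableError(Exception):
    """Hypothetical stand-in for an HTTP failure carrying status and Retry-After."""

    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def parse_retry_after(value, now=None):
    """Seconds to wait from a Retry-After header (numeric or HTTP-date), else None."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt, retry_after=None):
    """Delay before the next try: the server's hint if present, else capped backoff."""
    if retry_after is not None:
        return min(retry_after, BACKOFF_CAP)
    ceiling = min(BACKOFF_CAP, BACKOFF_BASE * 2 ** (attempt - 1))
    return random.uniform(ceiling / 2, ceiling)  # jitter spreads out concurrent retries


def call_with_retry(send, sleep=time.sleep):
    """Call send() until it returns; re-raise terminal errors and the final failure."""
    for attempt in range(1, RETRY_MAX_ATTEMPTS + 1):
        try:
            return send()
        except RetryableError as exc:
            if exc.status not in RETRY_STATUSES or attempt == RETRY_MAX_ATTEMPTS:
                raise
            delay = backoff_delay(attempt, parse_retry_after(exc.retry_after))
            # Notices go to stderr so --json output on stdout stays clean.
            print(f"retrying in {delay:.1f}s after {exc}", file=sys.stderr)
            sleep(delay)
```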
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit cluster C1/C2/H1/H3/M1 on the live GravityZone tenant:
- C1/H1/M1: move, scan, create-package, make-group called the live API with
no --confirm; added _gated() + a --confirm flag to each (move can change an
endpoint's inherited policy posture).
- C2: extend raw's destructive-method denylist with moveEndpoints/moveCustomGroup/
createScanTask/createPackage/createCustomGroup so 'raw' can't bypass the gates.
- H3: add _require_oid() 24-char-hex validation to endpoint/policy/endpoints +
the gated handlers, so malformed ids no longer hit the tenant or get mislogged
as functional errors (source of the 2026-06-21 errorlog noise).
- Gate refusals now print to stderr (don't pollute --json). SKILL.md gating list
updated. Verified: compile clean; gates exit 3, bad ids exit 2, raw denylist hits.
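A minimal sketch of the gate and the id check, assuming the exit codes noted above (3 for a gate refusal, 2 for a bad id). The function bodies and messages are illustrative rather than the shipped _gated()/_require_oid(), and raw_is_denied is a hypothetical helper covering only the methods added in this commit.

```python
import re
import sys

EXIT_BAD_ID = 2  # "bad ids exit 2"
EXIT_GATED = 3   # "gates exit 3"

_OID_RE = re.compile(r"[0-9a-fA-F]{24}")

# Only the methods this commit adds; the existing denylist is not reproduced here.
RAW_DENYLIST_ADDED = {
    "moveEndpoints",
    "moveCustomGroup",
    "createScanTask",
    "createPackage",
    "createCustomGroup",
}


def require_oid(value, what="id"):
    """Return value if it is a 24-character hex id; otherwise exit before any API call."""
    if not _OID_RE.fullmatch(value or ""):
        print(f"error: {what} must be 24 hex characters, got {value!r}", file=sys.stderr)
        raise SystemExit(EXIT_BAD_ID)
    return value


def gated(action, confirm):
    """Refuse a write against the live tenant unless --confirm was passed."""
    if not confirm:
        # stderr keeps the refusal out of --json output on stdout.
        print(f"refused: {action} changes the live tenant; re-run with --confirm",
              file=sys.stderr)
        raise SystemExit(EXIT_GATED)


def raw_is_denied(method):
    """True if the raw passthrough must not call this method directly."""
    return method in RAW_DENYLIST_ADDED
```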
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New /datto-edr skill: standalone CLI for the Datto EDR REST API (azcomp4587,
rebranded Infocyte HUNT, raw-token LoopBack). Live-verified reads across the whole
MSP fleet: orgs/sites/agents/detections/sweep + deploy-cmd. Scan/isolate gated
behind --confirm (shape-verified, not run against prod). Token vaulted at
msp-tools/datto-edr.sops.yaml.
Also: reference_syncro_rmm_api_gui_only memory (Syncro RMM policy mgmt is
GUI-only) and the guru-rmm submodule pointer bump (Feature 6 EDR research).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code is merged and deployed, but the feature is beta and unreliable:
best-effort only, not guaranteed on protected AV, WiX bundles, UI-only
or lingering-child uninstallers, or drivers. Replace the 'SHIPPED' framing
across Summary, Capabilities, Active State, History, and the index.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses 5 verified findings from /code-review high:
- SynoError carries DSM code + handled flag; call() no longer logs eagerly.
Top-level handler logs only genuine unhandled failures, so the handled
FileStation denial + VPN-down connect errors stop polluting errorlog.md
(was a CLAUDE.md rule violation: don't log handled conditions).
- FileStation-denial detection is numeric (code is 400 or 407), not substring.
- SSH hint now also fires on the generic `call` path, not just `ls`.
- `services` falls back get->list on 103 for older DSM builds (multi-device).
- BrokenPipe flush moved inside try so small piped output can't leak a traceback.
- Trim SKILL description 755->515 chars (was the longest of 32 skills; self-check
registry-budget WARN).
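A sketch of the error-handling shape described above, under stated assumptions: the SynoError fields follow the first bullet, while run(), services(), the call(api, method) signature, and the hint text are hypothetical stand-ins, not the skill's actual code.

```python
import sys

FILESTATION_DENIED = {400, 407}  # numeric DSM codes, matched by value not substring


class SynoError(Exception):
    """API failure carrying the DSM error code and a 'handled' flag."""

    def __init__(self, message, code=None, handled=False):
        super().__init__(message)
        self.code = code
        self.handled = handled


def services(call):
    """Try `get` first; fall back to `list` when the build answers 103."""
    try:
        return call("SYNO.Core.Service", "get")
    except SynoError as exc:
        if exc.code != 103:
            raise
        return call("SYNO.Core.Service", "list")


def run(command, log_error):
    """Top-level handler: log only failures that nothing downstream handled."""
    try:
        command()
        return 0
    except SynoError as exc:
        if exc.code in FILESTATION_DENIED:
            exc.handled = True
            print("hint: FileStation denied the request; browse files over SSH instead",
                  file=sys.stderr)
        if not exc.handled:
            log_error(f"{exc} (code={exc.code})")
        return 1
```

Because call() only raises and the top-level handler decides what to log, a handled denial never reaches the error log.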
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Fix argparse: --confirm/--vault were only accepted BEFORE the subcommand, so
every documented gated-write (e.g. `call X set k=v --confirm`) failed. Moved to
a shared parent parser (SUPPRESS defaults) -> both flags work in either position.
- Verified the CSRF write path live on cascadesDS: Share create -> verify ->
delete -> verify gone. Both mutating calls succeeded; device left pristine.
- SKILL.md: write/setter path marked VERIFIED; confirmed share-create signature.
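The parent-parser fix can be sketched as follows. The prog name, the `call` arguments, and --vault taking a value are assumptions for illustration. The SUPPRESS defaults matter because a subparser otherwise writes its own default over a value already parsed before the subcommand.

```python
import argparse


def build_parser():
    # Shared flags live on a help-less parent so both parsers can inherit them.
    common = argparse.ArgumentParser(add_help=False)
    # SUPPRESS: the attribute is set only when the flag is actually given, so the
    # subparser cannot reset a value supplied before the subcommand.
    common.add_argument("--confirm", action="store_true", default=argparse.SUPPRESS)
    common.add_argument("--vault", default=argparse.SUPPRESS)  # assumed to take a value

    parser = argparse.ArgumentParser(prog="example", parents=[common])
    sub = parser.add_subparsers(dest="command", required=True)

    call = sub.add_parser("call", parents=[common])
    call.add_argument("api")
    call.add_argument("method")
    call.add_argument("params", nargs="*")
    return parser


def is_confirmed(args):
    # With SUPPRESS the attribute may be absent, so read it with a fallback.
    return getattr(args, "confirm", False)
```

Both `example --confirm call X set k=v` and `example call X set k=v --confirm` then parse with the flag set.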
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Exercised every Web API + SSH read command live against cascadesDS.
- All reads OK; `ls <folder>` (FileStation list) is 407-denied for the admin
account on this box (confirmed on-box as SYSTEM_ADMIN) -> now catches the
400/407 and prints an SSH file-browse hint. `ls` share-roots still works.
- SSH backend (info/df/run + privileged synowebapi) verified.
- Documented MSYS path-mangling of bare `ls /path` arg on Windows.
- SKILL.md: per-command results; flagged write/setter path as not-yet-live.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First live exercise of the skill against the Cascades DS718+ (DSM 7.2.1, VPN up).
- Fix `services`: SYNO.Core.Service.list 103s on DSM 7.2.1 -> method is `get`.
- Fix `apis | head` BrokenPipeError traceback -> caught, clean exit.
- SKILL.md status PLUMBED -> VERIFIED 2026-06-25 with live device facts.
- wiki/cascades-tucson: add NAS specs, resolve model/RAM/DSM TODO.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the thin "Future: GuruRMM integration" stub with a "Why this skill exists"
section: ScreenConnect surfaces as a per-partner Integrations Center / addons-page
entry, positioned as the bring-your-own alternative to GuruConnect (a partner already
paying for ScreenConnect uses their licensed instance as the remote-access backend).
Points at the mapped plan: SPEC-024, RMM_THOUGHTS Feature 7 + Refinement 7a, the
Integrations Center roadmap item.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Migrate Howard's AMPIPIT toolkit into the fleet as a private Gitea
submodule (azcomputerguru/ampipit), matching the guru-rmm pattern.
Full history (49 commits + tags) pushed to Gitea and verified before
integration. Tracks main.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>