- deploy-cmd: require explicit --regkey or --group; never auto-pick an
arbitrary cross-client registration key (would enroll into wrong org).
- raw: block POST to any */scan endpoint with no non-empty `where`
(same tenant-wide footgun the scan command guards against).
- main(): catch-all for unexpected exceptions -> [ERROR] + errorlog,
plus a clean KeyboardInterrupt exit (rc 130).
- isolate: forgiving extension-name match (exact, then substring),
excludes the paired "Restore" ext; errors on ambiguous match.
- detections: --site -> --target-group; Alert.targetGroupId is a
scan-target id, not a Location id (distinct from `agents --site`).
- status: relabel "Target groups (sites)" -> "Scan target groups".
- SKILL.md + docstrings updated to match.
Verified: py_compile clean, selftest green (216 agents), guards fire
on no-key/empty-where/no-agent, deploy-cmd --group picks the group's key.
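
A minimal sketch of the isolate extension-name match described above (exact, then substring, Restore excluded, error on ambiguity). The function name, the "restore"-substring exclusion rule, and the sample extension names are assumptions for illustration, not the code in this commit:

```python
def match_extension(name, extensions):
    """Return the one extension matching `name`, or raise ValueError."""
    wanted = name.strip().lower()
    # Assumption: the paired "Restore" extension is recognised by its name,
    # and is dropped up front so it can never be picked by accident.
    candidates = [e for e in extensions if "restore" not in e.lower()]

    # Pass 1: case-insensitive exact match.
    exact = [e for e in candidates if e.lower() == wanted]
    if len(exact) == 1:
        return exact[0]

    # Pass 2: substring match, accepted only when it is unambiguous.
    partial = [e for e in candidates if wanted in e.lower()]
    if len(partial) == 1:
        return partial[0]
    if not partial:
        raise ValueError(f"no extension matches {name!r}")
    raise ValueError(f"ambiguous extension {name!r}: {sorted(partial)}")
```

With hypothetical names, `match_extension("isolate", ["Isolate Host", "Restore Isolated Host"])` resolves to "Isolate Host" because the Restore entry is filtered before matching.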
Convergence-pass LOW/NIT cleanup:
- cmd_companies uses list_all_companies() so a >100-company tenant isn't truncated
in the listing (was page-1 only); matches sweep/inventory.
- removed unused 'field' import from dataclasses.
Deliberately NOT changed: id validation on delete-package/report-delete/
blocklist-remove/quarantine-remove/restore. Those ids are not pinned to the
24-hex format, so validating could reject valid input; they are --confirm-gated,
and bad ids match the expected-error markers (no mislog). 81/81 selftest.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From a third review pass (converging - all MEDIUM/LOW):
- urllib fallback: a post-send reset (RemoteDisconnected/ConnectionReset, which
urllib wraps in URLError) was misclassified as always-safe 'connect' and could
retry a non-idempotent write after a server commit. Now only ConnectionRefused and
DNS failure (socket.gaierror) map to 'connect'; everything else maps to 'timeout' (write-gated).
- _retry_delay clamps a negative numeric Retry-After to 0 (previously it reached time.sleep(-1) and raised ValueError).
- cmd_sweep + cmd_install_links now validate --company; cmd_company_create validates
--parent (completes _require_oid coverage; bad values were mislogged as errorlog noise).
- cmd_push_test parses --extra-json before gating (validate->gate order, matches siblings).
- selftest: +sweep/install-links bad-company assertions. 81/81. Units: clamp + reset classification.
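
A sketch of the reset classification above, assuming the failure arrives as a urllib URLError whose `reason` holds the underlying exception. The function names and the `idempotent` flag are illustrative, not the names used in gz_client:

```python
import socket
import urllib.error


def classify_urlerror(exc):
    """Map a URLError to 'connect' (nothing was sent) or 'timeout' (ambiguous)."""
    reason = getattr(exc, "reason", None)
    # Only failures that provably happen before any bytes leave the client
    # count as 'connect': a refused TCP connection or a DNS lookup failure.
    if isinstance(reason, (ConnectionRefusedError, socket.gaierror)):
        return "connect"
    # Resets, disconnects and real timeouts may follow a server-side commit.
    return "timeout"


def may_retry(kind, idempotent):
    """'connect' is always retryable; 'timeout' only for idempotent requests."""
    return kind == "connect" or idempotent
```

A wrapped ConnectionResetError therefore lands in the write-gated 'timeout' bucket instead of being replayed blindly.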
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remaining LOW/NIT items from the second review pass:
- list_all_companies() paginates the company list; sweep-all + refresh_inventory
no longer truncate a >100-company tenant.
- Pre-send connection failures (httpx ConnectError/ConnectTimeout; urllib URLError
not wrapping a timeout) are now retried as 'connect' - always safe (no side
effect) even for non-idempotent writes; ambiguous read-timeouts stay idempotent-gated.
- Explicit Retry-After honored up to RETRY_AFTER_MAX_SECONDS (120s) instead of the
30s exponential cap, so a server-mandated cooldown isn't cut short.
- GravityZoneClient is now a context manager (__enter__/__exit__ -> close()).
- incident-status/note reject an empty --set-json (rc2), matching account-update/notif.
- selftest: +connect/Retry-After/ctx-mgr unit coverage, incident empty-json assertion. 79/79.
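
For reference, the context-manager shape (__enter__/__exit__ -> close()) can be sketched as below. PooledClient and its `connect` factory are hypothetical stand-ins, not GravityZoneClient itself:

```python
class PooledClient:
    """Hypothetical client that lazily opens one connection and reuses it."""

    def __init__(self, connect):
        self._connect = connect  # factory for the underlying connection
        self._conn = None

    @property
    def conn(self):
        # Created on first use, then reused for every later request.
        if self._conn is None:
            self._conn = self._connect()
        return self._conn

    def close(self):
        # Safe to call repeatedly, and when no connection was ever opened.
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never swallow the caller's exception
```

Used as `with PooledClient(factory) as client: ...`, the connection is released even when the body raises.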
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Batched the audit doc/LOW findings plus the two pagination LOWs:
- Pagination (gz_client): security_sweep and refresh_inventory stopped on a
'total' field some responses omit, truncating after page 1. Now page until a
short page (< per_page) - the reliable last-page signal.
- isolate/restore docstrings (gz_client): removed the stale 'v1.2 takes an ARRAY
endpointIds' lines that contradicted the verified single-endpointId code.
- Cache 'no PII' wording corrected (gz_client header + SKILL.md): cache holds infra
identifiers (hostnames/FQDNs); no secrets. Dead _require_company_for_sweep removed.
- Doc drift fixed: delete-package is '--id <packageId>' not '--package <name>'
(SKILL.md + api-reference.md, verified live); module docstring + sweep --company
help corrected (sweep with no --company fans out to ALL companies).
- selftest aligned to the improved behavior: malformed ids now exit rc2 client-side
(H3) instead of rc1; gate-refusal 'Would' messages now assert on stderr (they
moved off stdout so --json isn't polluted). 75/75 pass; live sweep verified.
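
The short-page stop condition from the pagination fix can be sketched as follows. `fetch_page` is a hypothetical callable returning one page of items; the real client methods and response shape are not shown in this log:

```python
def fetch_all(fetch_page, per_page=100):
    """Collect every item by paging until a page comes back short."""
    items = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        items.extend(batch)
        # A page with fewer than per_page items is the last one. This does
        # not depend on a 'total' field, which some responses omit.
        if len(batch) < per_page:
            return items
        page += 1
```

The cost of this rule is one extra (empty) request when the item count is an exact multiple of per_page.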
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Found during a full command-surface recheck: every privileged SSH recipe
(shares/users/groups/acl) was broken — sudo secure_path drops /usr/syno/{bin,sbin}
so synoshare/synouser/synogroup/synoacltool were "command not found" (non-sudo
plain recipes worked because the admin login PATH has them).
- Inject SYNO_PATH into priv()/plain(); run priv via `sh -c` so operators work.
- synouser/synogroup use `--enum local` (not the invalid `--list`).
- acl quotes the share path (handles spaces, e.g. "Sandra Fish").
- services repointed to Web API (no synoservice on DSM 7.2; synosystemctl has no list-all).
Verified live: all Web API reads, all SSH reads (acl returns real Windows ACEs),
write path (share create/delete), and every destructive command correctly gated.
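
A sketch of the PATH injection, assuming priv()/plain() build a command string that is later sent over SSH. The exact strings here are illustrative and the real helpers may differ; only SYNO_PATH and the two directories come from this commit:

```python
import shlex

SYNO_PATH = "/usr/syno/bin:/usr/syno/sbin"


def plain(cmd):
    """Prefix a command so the Synology tool directories are on PATH."""
    return f'export PATH="{SYNO_PATH}:$PATH"; {cmd}'


def priv(cmd):
    """Run under sudo via `sh -c`, so pipes and other operators still work.

    sudo's secure_path resets PATH, so it is re-exported inside the inner
    shell rather than relying on the caller's environment.
    """
    return "sudo sh -c " + shlex.quote(plain(cmd))
```

For a share path containing spaces, quote the path before composing, e.g. `priv("synoacltool -get " + shlex.quote("/volume1/Sandra Fish"))` (the /volume1 prefix is an assumed example).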
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit fix M3: _write_cache did a non-atomic CACHE_FILE.write_text and the
write-through helpers did unlocked read-modify-write, so a crash mid-write could
truncate inventory.json and two concurrent gz.py runs could lose an update.
- _write_cache now writes a temp file (fsync) then os.replace() - atomic on the
same filesystem; a reader/crash can never see a partial file, and a failed
write leaves the prior cache intact and no .tmp residue.
- Added a best-effort cross-platform advisory lock (_cache_lock) around the
read-modify-write in _cache_add_group/_cache_add_package; steals a stale lock,
proceeds unlocked on timeout (a lost update is tolerable, a hang is not).
- Dropped the dead cache.setdefault('companies', ...) line in _cache_add_group.
- Verified: compile clean; unit tests for round-trip, lock acquire/release/steal,
write-through, temp cleanup on failure, and prior-cache survival.
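
The temp-file + fsync + os.replace() pattern looks roughly like this. It is a generic sketch with a hypothetical function name and JSON payload, not the actual _write_cache:

```python
import json
import os
import tempfile
from pathlib import Path


def write_cache_atomic(path, data):
    """Write JSON so readers see either the old file or the new one, never a partial."""
    path = Path(path)
    # The temp file lives in the target's directory so os.replace() stays on
    # one filesystem, which is what makes the final rename atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh)
            fh.flush()
            os.fsync(fh.fileno())  # force the bytes to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # A failed write leaves the prior cache intact and no .tmp residue.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```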
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit fix H2 (+ M2): the live GravityZone tenant is rate-limited and sweeps fan
out one getManagedEndpointDetails per endpoint across every company, which hit a
real HTTP 429 (errorlog 2026-06-21). _post had zero retry and opened a fresh
httpx.Client (new TLS handshake) per request.
- _post now retries 429/500/502/503/504/timeout up to RETRY_MAX_ATTEMPTS with
bounded exponential backoff + jitter, honoring Retry-After (numeric or HTTP-date).
Retry notices go to stderr (don't pollute --json). Terminal errors still raise.
- M2: a single httpx.Client is created lazily and reused (connection pooling),
closed via client.close() in main()'s finally. Makes the docstring's pooling
claim true and cuts handshake overhead + 429 pressure during sweeps.
- Verified: compile clean; offline unit tests (persistent 429 -> 4 attempts then
raise, flaky 503 -> recovers, Retry-After honored); live status read OK.
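
A sketch of a retry-delay calculation in the spirit of this fix: Retry-After (numeric or HTTP-date) wins when present, otherwise capped exponential backoff with jitter. The constant values for base and cap, the full-jitter choice, and the function signature are assumptions; the clamp to zero and the 120s Retry-After ceiling mirror other entries in this log:

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

BACKOFF_BASE_SECONDS = 1.0      # assumed base
BACKOFF_CAP_SECONDS = 30.0      # exponential cap
RETRY_AFTER_MAX_SECONDS = 120.0


def retry_delay(attempt, retry_after=None, now=None):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after:
        seconds = _parse_retry_after(retry_after.strip(), now)
        if seconds is not None:
            # Never negative (time.sleep rejects it), never beyond the ceiling.
            return min(RETRY_AFTER_MAX_SECONDS, max(0.0, seconds))
    ceiling = min(BACKOFF_CAP_SECONDS, BACKOFF_BASE_SECONDS * 2 ** attempt)
    return random.uniform(0, ceiling)  # full jitter


def _parse_retry_after(value, now=None):
    try:
        return float(value)  # delta-seconds form
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (when - now).total_seconds()
```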
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit cluster C1/C2/H1/H3/M1 on the live GravityZone tenant:
- C1/H1/M1: move, scan, create-package, make-group called the live API with
no --confirm; added _gated() + a --confirm flag to each (move can change an
endpoint's inherited policy posture).
- C2: extend raw's destructive-method denylist with moveEndpoints/moveCustomGroup/
createScanTask/createPackage/createCustomGroup so 'raw' can't bypass the gates.
- H3: add _require_oid() 24-char-hex validation to endpoint/policy/endpoints +
the gated handlers, so malformed ids no longer hit the tenant or get mislogged
as functional errors (source of the 2026-06-21 errorlog noise).
- Gate refusals now print to stderr (don't pollute --json). SKILL.md gating list
updated. Verified: compile clean; gates exit 3, bad ids exit 2, raw denylist hits.
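
The 24-char-hex check in H3 amounts to something like the following. This is a sketch: the real _require_oid presumably exits rc2 itself, whereas this version raises ValueError and leaves the exit code to the caller:

```python
import re

_OID_RE = re.compile(r"[0-9a-fA-F]{24}")


def require_oid(value, label="id"):
    """Return `value` if it is a 24-character hex id, else raise ValueError."""
    if not isinstance(value, str) or not _OID_RE.fullmatch(value):
        raise ValueError(f"{label} must be a 24-character hex string, got {value!r}")
    return value
```

Validating client-side keeps malformed ids off the tenant and out of the error log.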
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New /datto-edr skill — standalone CLI for the Datto EDR REST API (azcomp4587,
rebranded Infocyte HUNT, raw-token LoopBack). Live-verified reads across the whole
MSP fleet: orgs/sites/agents/detections/sweep + deploy-cmd. Scan/isolate gated
behind --confirm (shape-verified, not run against prod). Token vaulted at
msp-tools/datto-edr.sops.yaml.
Also: reference_syncro_rmm_api_gui_only memory (Syncro RMM policy mgmt is
GUI-only) and the guru-rmm submodule pointer bump (Feature 6 EDR research).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses 5 verified findings from /code-review high:
- SynoError carries DSM code + handled flag; call() no longer logs eagerly.
Top-level handler logs only genuine unhandled failures, so the handled
FileStation denial + VPN-down connect errors stop polluting errorlog.md
(was a CLAUDE.md rule violation: don't log handled conditions).
- FileStation-denial detection is numeric (code in 400/407), not substring.
- SSH hint now also fires on the generic `call` path, not just `ls`.
- `services` falls back get->list on 103 for older DSM builds (multi-device).
- BrokenPipe flush moved inside try so small piped output can't leak a traceback.
- Trim SKILL description 755->515 chars (was the longest of 32 skills; self-check
registry-budget WARN).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Fix argparse: --confirm/--vault were only accepted BEFORE the subcommand, so
every documented gated-write (e.g. `call X set k=v --confirm`) failed. Moved to
a shared parent parser (SUPPRESS defaults) -> both flags work in either position.
- Verified the CSRF write path live on cascadesDS: Share create -> verify ->
delete -> verify gone. Both mutating calls succeeded; device left pristine.
- SKILL.md: write/setter path marked VERIFIED; confirmed share-create signature.
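
The shared-parent-parser fix can be sketched as below. The prog name and the `call` positionals are assumptions; the point is `parents=[common]` on both parsers with argparse.SUPPRESS defaults, so a subparser default cannot overwrite a flag already parsed before the subcommand:

```python
import argparse


def build_parser():
    # Flags shared by the top-level parser and every subcommand.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--confirm", action="store_true", default=argparse.SUPPRESS)
    common.add_argument("--vault", default=argparse.SUPPRESS)

    parser = argparse.ArgumentParser(prog="syno", parents=[common])
    sub = parser.add_subparsers(dest="command", required=True)

    call = sub.add_parser("call", parents=[common])
    call.add_argument("api")
    call.add_argument("method")
    call.add_argument("params", nargs="*")
    return parser


def is_confirmed(args):
    # With SUPPRESS the attribute only exists when the flag was given.
    return getattr(args, "confirm", False)
```

Both `--confirm call X set k=v` and `call X set k=v --confirm` then parse to the same result.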
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Exercised every Web API + SSH read command live against cascadesDS.
- All reads OK; `ls <folder>` (FileStation list) is 407-denied for the admin
account on this box (confirmed on-box as SYSTEM_ADMIN) -> now catches the
400/407 and prints an SSH file-browse hint. `ls` share-roots still works.
- SSH backend (info/df/run + privileged synowebapi) verified.
- Documented MSYS path-mangling of bare `ls /path` arg on Windows.
- SKILL.md: per-command results; flagged write/setter path as not-yet-live.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First live exercise of the skill against the Cascades DS718+ (DSM 7.2.1, VPN up).
- Fix `services`: SYNO.Core.Service.list 103s on DSM 7.2.1 -> method is `get`.
- Fix `apis | head` BrokenPipeError traceback -> caught, clean exit.
- SKILL.md status PLUMBED -> VERIFIED 2026-06-25 with live device facts.
- wiki/cascades-tucson: add NAS specs, resolve model/RAM/DSM TODO.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the thin "Future: GuruRMM integration" stub with a "Why this skill exists"
section: ScreenConnect surfaces as a per-partner Integrations Center / addons-page
entry, positioned as the bring-your-own alternative to GuruConnect (a partner already
paying for ScreenConnect uses their licensed instance as the remote-access backend).
Points at the mapped plan: SPEC-024, RMM_THOUGHTS Feature 7 + Refinement 7a, the
Integrations Center roadmap item.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase-3 post-rebase reconcile ran 'git submodule update --init --recursive'
unconditionally, force-detaching every submodule to the parent's pinned gitlink
and discarding any feature branch, commits, or uncommitted edits inside it. The
Phase-1a init guard did not cover this path. New submodule_update_safe() advances
ONLY submodules in the pristine pinned state (clean, detached HEAD) and skips any
on a branch or with uncommitted changes, so in-progress submodule work survives a
parent sync.
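
The "pristine pinned state" test can be sketched in Python as below (the sync script itself may well be shell). Function names are hypothetical, and treating an uninitialised submodule as safe to update is an assumption of this sketch:

```python
import subprocess
from pathlib import Path


def is_pristine(on_branch, porcelain):
    """Pristine means detached HEAD and a clean working tree."""
    return (not on_branch) and porcelain.strip() == ""


def safe_to_update(path):
    """True if updating this submodule cannot discard in-progress work."""
    path = Path(path)
    if not (path / ".git").exists():
        return True  # assumed: not initialised yet, so nothing to lose
    # `git symbolic-ref -q HEAD` exits 0 on a branch, non-zero when detached.
    head = subprocess.run(
        ["git", "-C", str(path), "symbolic-ref", "-q", "HEAD"],
        capture_output=True, text=True,
    )
    # `git status --porcelain` prints nothing for a clean tree.
    status = subprocess.run(
        ["git", "-C", str(path), "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return is_pristine(head.returncode == 0, status.stdout)
```

Only paths for which safe_to_update() is true would then be passed to `git submodule update --init -- <path>`; the rest are skipped.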
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New skill to manage ACG's Yealink phone fleets via Yealink Management Cloud Service v2
(us-api.ymcs.yealink.com). RTFM'd the API (token auth via POST /v2/token Basic+bearer, NOT the
legacy RPS HMAC; legacy-TLS renegotiation required) + endpoint map from the
dszp/n8n-nodes-yealinkymcs community node. Live-verified: token auth, sites (one ACG AccessKey sees ALL client
sites — VWP/GuruHQ/Ace Pick Up Parks as children of the ACG parent), devices, accounts,
rps-servers (RPS = "WL - ACG" ftp://p.packetdials.net). Gated writes (--confirm): add-devices-by-mac,
add-sipaccount (push a NetSapiens SIP cred onto a phone = the PBX glue), reboot, reset, rps add/del;
+ raw passthrough (auto-recovers the MSYS /v2 path-mangling). Creds vaulted at
services/yealink-ymcs.sops.yaml. Pairs with packetdial onboard-domain for new-client phone
provisioning; VWP is the live pilot. Honest [V]/[P] verification status in SKILL.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
onboard-domain runs POST /domains -> addresses/validate (gen E911 pidflo) -> addresses/create
from one JSON body (domain fields + optional `emergency` block), gated --confirm.
Reverse-engineered from the OITVOIP wizard screenshots; live-created the real client domain
vwp.91912.service (Valley Wide Plastering) + E911 address, and proved the wrapper with a
throwaway create->delete (no leftovers, vwp intact). Documented GUI->API mapping + the two
manual gaps (voicemail user-defaults, email-send-from-address pending the packetdial.com mailbox)
+ the domain-type "no"-on-create quirk.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Added read wrappers: addresses (E911), smsnumbers, blocked-numbers, moh, dialrules,
recording, transcriptions. Added gated write wrappers: DID update/delete, per-user device
CRUD, E911 address CRUD, contact CRUD, site create/update, auto-attendant create, SMS
number CRUD, block/unblock numbers, MoH TTS create/delete.
Verification: contact create→delete lifecycle verified on arizonacomputerguru (id field is
`unique-id`); reads for addresses/blocked-numbers/moh verified. Remaining writes are plumbed
per the OpenAPI spec [P] but not lifecycle-verified (test domain lacks the feature or needs a
special body) — SKILL.md marks each [V]/[P] and documents the gotchas (E911 pidflo via
addresses/validate; SMS not provisioned on test domain; number-filters add 202'd but didn't
persist; MoH file upload is multipart -> raw). Capability map + api.md history updated.
All writes --confirm-gated; anything unwrapped still reachable via `raw`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Added write wrappers, each tested create→update→delete on the arizonacomputerguru test
domain (sanctioned, non-production):
- call queues: create-callqueue, update-callqueue, delete-callqueue + add-agent /
update-agent / remove-agent
- time frames: create-timeframe, update-timeframe, delete-timeframe (body-discriminated —
same path, server selects the op from the body; wrappers pass --body verbatim)
All behind --confirm (gate verified: DRY RUN refuses without it). SKILL.md documents the
bodies + the days-of-week-needs-array gotcha + names ACG as the test bed; capability map
+ api.md history updated. No production objects touched; no test leftovers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Key is now provisioned + live-verified, so grounded the skill in the real spec (RTFM):
- Mapped the OpenAPI surface (v44.4.10, 239 paths / 354 ops) — capability map added to
SKILL.md (what the platform exposes vs what's wrapped vs raw-only).
- Added 6 live-verified read wrappers (ns.py + ns_client.py): callqueues, timeframes,
sites, contacts, autoattendants, billing (domain limits/usage).
- Replaced the stale "not yet provisioned" credentials section with the live status
(vaulted nsr_ reseller key, key-id nsr_hSGUB5Wo, scope Reseller 91912.service, RW) +
the pbx.packetdial.com vs api.ucaasnetwork.com hostname note + override.
- api.md history updated. Writes remain gated behind --confirm; everything unwrapped
reachable via `raw`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>