guru-connect

Author	SHA1	Message	Date
Mike Swanson	367906bd54	fix(agent): SPEC-016 Phase B review fixes (re-image-stable machine_uid, ACL TOCTOU, load_cak error classes, PS timeout, fail-fast guard) H1: derive machine_uid from the durable hardware salt ALONE (SMBIOS UUID, or board+disk serial) plus a fixed namespace, so it survives an OS re-image (which regenerates MachineGuid). MachineGuid is demoted to a last-resort signal used only when no hardware salt is readable (volatile, reboot-only floor). Re-image stability proven by salted_uid_is_reimage_stable_independent_of_machine_guid. H2: in store_cak, lock the directory ACL BEFORE any secret bytes are written; the temp file is created inside the already-locked dir, then renamed. No ciphertext ever exists at an inherited/world-readable path. Ordering made an explicit precondition, not an unstated inheritance assumption. M1: load_cak now returns a LoadCakError enum distinguishing Io (incl. PermissionDenied — operational) from Decrypt (the real tamper/wrong-machine signal). Only a successful READ whose DPAPI decrypt fails hard-stops. M2: the PowerShell SMBIOS/board/disk shell-out is spawned and waited on with a 10s wall-clock bound; on timeout the child is killed and the signal is treated as missing (falls back through the chain), never panics. Keeps CREATE_NO_WINDOW -NonInteractive -NoProfile. L1: warn! breadcrumb when the salted derivation degrades to MachineGuid-only, so the server-side collision-gate operator has a clue. No secret values logged. C1: keep the SYSTEM+Administrators ACL (Option A target). store_cak now does a read-back verification immediately after writing and fails at ENROLL time if this context cannot read its own store; resolve_agent_credential fails fast with an actionable SPEC-017 message on an access-denied store instead of silently re-enrolling/bricking. Guarded comment notes this is satisfied once the SYSTEM service host lands. Deferred items (clear_cak placeholder, legacy api_key path) left as-is. 
Verification on x86_64-pc-windows-msvc: cargo fmt --check clean, clippy -D warnings clean, release build OK, 52 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 12:54:18 -07:00
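The M1 split between operational read failures and a real decrypt failure can be sketched roughly as follows. This is an illustration only: the variant payloads, the helper methods `is_hard_stop` / `is_access_denied`, and the closure-based `load_cak_with` are hypothetical stand-ins for the file read and DPAPI decrypt, not the agent's actual signatures.

```rust
use std::fmt;
use std::io;

/// Hypothetical shape of the error split described in M1.
#[derive(Debug)]
pub enum LoadCakError {
    /// The store could not be read (missing file, PermissionDenied, ...): operational.
    Io(io::Error),
    /// The blob was read successfully but decryption failed: tamper or wrong machine.
    Decrypt(String),
}

impl fmt::Display for LoadCakError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadCakError::Io(e) => write!(f, "credential store unreadable: {e}"),
            LoadCakError::Decrypt(msg) => write!(f, "credential blob failed to decrypt: {msg}"),
        }
    }
}

impl LoadCakError {
    /// Only a successful read followed by a failed decrypt is a hard stop.
    pub fn is_hard_stop(&self) -> bool {
        matches!(self, LoadCakError::Decrypt(_))
    }

    /// Access-denied is the case the fail-fast guard would report with an actionable message.
    pub fn is_access_denied(&self) -> bool {
        matches!(self, LoadCakError::Io(e) if e.kind() == io::ErrorKind::PermissionDenied)
    }
}

/// `read` and `decrypt` are injected stand-ins so the classification can be exercised
/// without touching the real store or DPAPI.
pub fn load_cak_with<R, D>(read: R, decrypt: D) -> Result<Vec<u8>, LoadCakError>
where
    R: FnOnce() -> io::Result<Vec<u8>>,
    D: FnOnce(&[u8]) -> Result<Vec<u8>, String>,
{
    let blob = read().map_err(LoadCakError::Io)?;
    decrypt(&blob).map_err(LoadCakError::Decrypt)
}
```

Keeping the two failure classes in separate variants lets the caller decide between "report and stop" and "treat as missing" without string-matching on error text.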
Mike Swanson	52477e4c4a	feat(agent): first-run enrollment client + run-mode wiring (SPEC-016 Phase B items 3,5) New enroll module: on a managed agent with no stored cak_ but with enrollment_key + site_code, POST machine_uid + hostname + labels to <https-base>/api/enroll and persist the minted cak_. Handles every Phase A status code distinctly: - 201 new / 200 reuse -> persist cak_ (DPAPI store) and connect - 202 collision_pending -> log "pending operator confirmation", slow re-check loop (no key issued; cannot connect until confirmed) - 401 ENROLL_REJECTED / 409 ENROLL_SITE_CONFLICT -> distinct actionable errors, long backoff (won't fix without operator action, but recovers automatically once it does) — no tight loop - 429 -> honor Retry-After, short backoff - network / 5xx / decode -> short backoff The enrollment_key and cak_ are never logged. Uses the existing reqwest client and the update path's TLS posture (rustls; dev-insecure only in debug + opt-in). Wire-contract unit tests pin the request shape against the server's EnrollRequest/EnrollLabels and decode active + pending bodies. main.rs run-mode wiring: before a managed agent connects, resolve the operating credential by precedence — stored cak_ (steady state, no network) -> first-run enrollment -> DEPRECATED legacy api_key (transition only, logged at WARNING) -> error. The relay already accepts the cak_ as the api_key query param, so the persistent transport authenticates with it unchanged. Attended/support-code and viewer paths are untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 11:44:40 -07:00
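The per-status handling listed in the enrollment client commit above could be modelled as a small classifier. A minimal sketch, assuming the status codes from the message; the `EnrollAction` enum, the `classify` function, and all three backoff durations are hypothetical values chosen for illustration.

```rust
use std::time::Duration;

/// What the first-run loop should do next (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum EnrollAction {
    /// 201 new / 200 reuse: persist the minted key and connect.
    PersistAndConnect,
    /// 202 collision_pending: no key issued, re-check slowly.
    PendingRecheck(Duration),
    /// 401 / 409: needs operator action, retry on a long backoff.
    OperatorActionNeeded { code: &'static str, retry_in: Duration },
    /// 429, 5xx, or anything unexpected: retry soon.
    RetryAfter(Duration),
}

// Assumed durations; the commit only says "short", "slow", and "long".
pub const SHORT_BACKOFF: Duration = Duration::from_secs(30);
pub const SLOW_RECHECK: Duration = Duration::from_secs(300);
pub const LONG_BACKOFF: Duration = Duration::from_secs(3600);

/// Maps an HTTP status (plus an optional parsed Retry-After, in seconds) to an action.
/// Network and decode failures carry no status; a caller would treat them as SHORT_BACKOFF.
pub fn classify(status: u16, retry_after_secs: Option<u64>) -> EnrollAction {
    match status {
        200 | 201 => EnrollAction::PersistAndConnect,
        202 => EnrollAction::PendingRecheck(SLOW_RECHECK),
        401 => EnrollAction::OperatorActionNeeded {
            code: "ENROLL_REJECTED",
            retry_in: LONG_BACKOFF,
        },
        409 => EnrollAction::OperatorActionNeeded {
            code: "ENROLL_SITE_CONFLICT",
            retry_in: LONG_BACKOFF,
        },
        429 => EnrollAction::RetryAfter(
            retry_after_secs
                .map(Duration::from_secs)
                .unwrap_or(SHORT_BACKOFF),
        ),
        _ => EnrollAction::RetryAfter(SHORT_BACKOFF),
    }
}
```

Returning a value instead of sleeping inside the match keeps the loop testable and makes the "no tight loop" property easy to check: every arm other than success carries a non-zero delay.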
Mike Swanson	87c6e17d4a	feat(agent): cak_ at-rest credential store (SPEC-016 Phase B item 4) Store the per-machine cak_ with BOTH layers Mike locked: DPAPI-machine encryption (CryptProtectData with CRYPTPROTECT_LOCAL_MACHINE — a copied blob is inert off the box) inside a SYSTEM/Administrators-only ACL'd file at %ProgramData%\GuruConnect\credentials\agent.cak. The directory + file ACL is hardened via icacls (/inheritance:r + grant to the well-known SIDs S-1-5-18 and S-1-5-32-544, locale-independent) — auditable, with far less unsafe FFI than building a registry-key security descriptor by hand. Co-locates with the existing %ProgramData%\GuruConnect config/seed dir. Provides store_cak / load_cak / clear_cak. store_cak writes atomically (temp file + rename in the locked dir). load_cak treats a present-but- undecryptable blob as a hard error (tamper / cross-machine copy) rather than silently re-enrolling over it. The plaintext is never logged; the transient plaintext copy is scrubbed after encryption. DPAPI output blobs are LocalFree'd. Enables the Win32_Security_Cryptography windows feature. Round-trip unit tests cover encrypt/decrypt recovery across lengths and that a tampered blob fails to decrypt (DPAPI authenticates its blobs). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 11:44:23 -07:00
Mike Swanson	6a000d012f	feat(agent): extend config contract for enrollment (SPEC-016 Phase B item 2) Add enrollment_key + site_code to EmbeddedConfig and the resolved Config alongside the existing labels, and add department/device_type label fields (SPEC-007 AgentStatus parity). The legacy api_key is retained but made optional/defaulted so a SPEC-016 site installer can carry only the enrollment credentials; existing pre-enrollment installers still parse. The enrollment fields are #[serde(skip)] on Config so they are never written to the on-disk TOML (install-time material only); apply_enrollment_env layers them from GURUCONNECT_ENROLLMENT_KEY / GURUCONNECT_SITE_CODE on the file and env load paths. The embedded path carries them from the install blob. Config delivery itself (signed wrapper) is Phase C and unchanged here. Add Config::https_base() deriving the REST API base (https://host[:port]) from the wss:// server_url so the enroll client and the persistent transport share one authority. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 11:44:09 -07:00
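The `https_base()` derivation described above (REST base taken from the `wss://` server URL) might look roughly like this. It is a sketch written as a free function over a string, under the assumption that only the scheme and authority matter; the real method lives on `Config` and may parse the URL differently.

```rust
/// Derives `https://host[:port]` from a `wss://host[:port]/path` URL.
/// Returns None for any other scheme or an empty authority (assumed behaviour).
pub fn https_base(server_url: &str) -> Option<String> {
    let rest = server_url.trim().strip_prefix("wss://")?;
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()?;
    if authority.is_empty() {
        return None;
    }
    Some(format!("https://{authority}"))
}
```

For example, a hypothetical `wss://connect.example.com:8443/ws/agent` yields `https://connect.example.com:8443`, so the enroll client and the persistent transport point at the same host and port.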
Mike Swanson	d0b8db070f	feat(agent): hardware-salt machine_uid (SPEC-016 Phase B item 1) Extend the SPEC-004 machine_uid derivation with the locked SPEC-016 hardware salt: combine the Windows MachineGuid with the SMBIOS system UUID (Win32_ComputerSystemProduct.UUID), falling back to motherboard serial (Win32_BaseBoard.SerialNumber) + primary disk serial when the SMBIOS UUID is absent or a degenerate placeholder (all-zeros / all-FFs, emitted by some OEMs and hypervisor templates). Signals are read via narrow PowerShell CIM queries (hidden window, no profile) rather than adding a WMI crate or hand-rolling COM IWbemServices for two scalar reads. Values are normalized (trim + upper-case) so vendor case/space drift never perturbs the digest. The combined string is SHA-256'd into the existing opaque muid_<hex> shape, preserving the wire identity the relay connect path already reports while making it survive an OS re-image on the same hardware. Which signal set fed the result is logged (source label only, never the secret values). Adds unit tests for derivation determinism + signal-sensitivity, degenerate-SMBIOS rejection, and signal normalization. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 11:43:56 -07:00
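The normalization and degenerate-SMBIOS rejection from the hardware-salt commit above can be illustrated with plain string handling. A sketch under stated assumptions: the function names, the source labels, the `|` separator, and the rule that the fallback needs both the board and the disk serial are all hypothetical; hashing the selected material into `muid_<hex>` is left out.

```rust
/// Trim + upper-case, so vendor case or spacing drift does not change the digest input.
pub fn normalize_signal(raw: &str) -> String {
    raw.trim().to_uppercase()
}

/// True for an empty, all-zeros, or all-FFs UUID (placeholders emitted by some OEMs
/// and hypervisor templates).
pub fn is_degenerate_uuid(raw: &str) -> bool {
    let hex: String = normalize_signal(raw)
        .chars()
        .filter(|c| *c != '-')
        .collect();
    hex.is_empty() || hex.chars().all(|c| c == '0') || hex.chars().all(|c| c == 'F')
}

/// Picks the salt material and a source label that is safe to log.
/// Returns None when no hardware signal is usable, so the caller can fall back further.
pub fn select_hardware_salt(
    smbios_uuid: Option<&str>,
    board_serial: Option<&str>,
    disk_serial: Option<&str>,
) -> Option<(&'static str, String)> {
    if let Some(uuid) = smbios_uuid {
        if !is_degenerate_uuid(uuid) {
            return Some(("smbios_uuid", normalize_signal(uuid)));
        }
    }
    match (
        board_serial.map(normalize_signal),
        disk_serial.map(normalize_signal),
    ) {
        (Some(board), Some(disk)) if !board.is_empty() && !disk.is_empty() => {
            Some(("board+disk", format!("{board}|{disk}")))
        }
        _ => None,
    }
}
```

Only the label (first tuple element) would be logged; the material itself stays out of the logs, matching the "source label only" rule in the commit message.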
azcomputerguru	89c3718266	Merge pull request 'SPEC-016 Phase A: zero-touch enrollment backend + migration' (#5) from feat/spec-016-enrollment into main All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 10m37s Details Build and Test / Build Server (Linux) (push) Successful in 15m25s Details Build and Test / Security Audit (push) Successful in 5m28s Details Build and Test / Build Summary (push) Successful in 23s Details	2026-06-02 11:19:37 -07:00
Mike Swanson	4106fc4bc4	style(enroll): cargo fmt --all (satisfy CI fmt gate) All checks were successful Build and Test / Build Agent (Windows) (pull_request) Successful in 16m35s Details Build and Test / Build Server (Linux) (pull_request) Successful in 19m7s Details Build and Test / Security Audit (pull_request) Successful in 5m27s Details Build and Test / Build Summary (pull_request) Successful in 26s Details The Phase A work passed cargo check + clippy + tests locally but missed `cargo fmt --all -- --check` (the first step of the Linux CI job): module ordering in db/mod.rs and two trailing-comment alignments in rate_limit.rs. No logic change. Agent build failure on the prior run was transient infra (verified: agent crate compiles clean locally). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 10:48:51 -07:00
Mike Swanson	0f02f23765	fix(enroll): SPEC-016 Phase A review fixes (cross-site guard, timing oracle, TOCTOU) Some checks failed Build and Test / Build Agent (Windows) (pull_request) Failing after 10m11s Details Build and Test / Build Server (Linux) (pull_request) Failing after 10m5s Details Build and Test / Security Audit (pull_request) Successful in 8m5s Details Build and Test / Build Summary (pull_request) Has been skipped Details Applies the four review fixes to POST /api/enroll, all in server/src/api/enroll.rs (+ a new ENROLL_SITE_CONFLICT event type in server/src/db/events.rs): 1. HIGH — close the within-tenant cross-site silent-move hijack. A valid key for site B presented for a machine_uid already bound to a DIFFERENT site is now REFUSED (409 ENROLL_SITE_CONFLICT) instead of silently repointing the row and minting a fresh cak_. No move, no key. Emits an ENROLL_SITE_CONFLICT audit event + alert TODO. Same-site match still resolves to reuse; a NULL prior site_id is a first relational bind, not a move. The unauthenticated site_move mint path is removed; deliberate moves are deferred to the Phase-B --reassign flow + dashboard. 2. MEDIUM — kill the timing/enumeration oracle. Unknown site_code and no-active-key early rejects now pay a dummy Argon2id verify against a fixed, valid throwaway PHC constant (TIMING_EQUALIZER_PHC) before returning the identical 401, so every rejection path pays one KDF. The constant is asserted valid + verifying in tests. 3. LOW — fix the new-enroll TOCTOU. The dedup lookup + INSERT is wrapped in a bounded retry loop: a concurrent first-enroll of the same machine_uid whose INSERT loses the unique-index race (classified by is_machine_uid_conflict on SQLSTATE 23505 + machine_uid constraint) now re-looks-up and converges to reuse instead of 500ing. A non-machine_uid unique violation still surfaces as 500. 4. LOW — make the collision-gate doc honest + leave an enforcement TODO. 
The module doc now states the gate withholds only a NEWLY minted cak_ (a prior clean cak_ survives) and that nothing consults enrollment_state at control time yet, with a TODO(SPEC-016 Phase B/D) marker for relay/control-plane enforcement + revocation. Verify: cargo check, cargo clippy --all-targets, and cargo test all clean on this Windows host (104 tests pass). Two DB-gated tests (cross-site bound-site_id exposure, machine_uid-vs-agent_id conflict classification) no-op without TEST_DATABASE_URL and run against real Postgres in CI; the Linux target / real-Postgres handler path is validated there, not on this host. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 10:28:31 -07:00
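Review fix 3 (the bounded retry around dedup lookup + INSERT) can be sketched without a database by injecting the two steps as closures. Everything here is illustrative: `DbError`, `Outcome`, `EnrollError`, and `enroll_with_retry` are hypothetical types, and matching the constraint by the substring `machine_uid` is an assumption about how `is_machine_uid_conflict` classifies the violation.

```rust
#[derive(Debug)]
pub struct DbError {
    pub sqlstate: String,
    pub constraint: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Reused(u64),
    Created(u64),
}

#[derive(Debug)]
pub enum EnrollError {
    /// Any database error that is not the machine_uid race (would surface as a 500).
    Db(DbError),
    /// The race kept recurring for every allowed attempt.
    RetriesExhausted,
}

/// SQLSTATE 23505 is PostgreSQL's unique_violation; it only counts as the enroll race
/// when the violated constraint is the machine_uid one.
pub fn is_machine_uid_conflict(err: &DbError) -> bool {
    err.sqlstate == "23505"
        && err
            .constraint
            .as_deref()
            .map_or(false, |c| c.contains("machine_uid"))
}

/// Look up, then insert; if the insert loses the unique-index race, look up again
/// so the loser converges to reuse instead of failing.
pub fn enroll_with_retry<L, I>(
    mut lookup: L,
    mut insert: I,
    max_attempts: u32,
) -> Result<Outcome, EnrollError>
where
    L: FnMut() -> Result<Option<u64>, DbError>,
    I: FnMut() -> Result<u64, DbError>,
{
    for _ in 0..max_attempts {
        if let Some(id) = lookup().map_err(EnrollError::Db)? {
            return Ok(Outcome::Reused(id));
        }
        match insert() {
            Ok(id) => return Ok(Outcome::Created(id)),
            Err(e) if is_machine_uid_conflict(&e) => continue,
            Err(e) => return Err(EnrollError::Db(e)),
        }
    }
    Err(EnrollError::RetriesExhausted)
}
```

The bound matters: an unbounded loop would spin if the lookup could never see the winning row, whereas a small cap turns that case into an explicit error.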
Mike Swanson	59e40c8019	feat(enroll): SPEC-016 Phase A — enrollment backend + migration Server-side zero-touch per-site enrollment (Phase A: backend + DB only; agent-side machine_uid derivation is Phase B, server treats it as opaque). Migration 010_spec016_enrollment.sql: - connect_sites: relational site anchor (site_code natural key, per-tenant unique). The spec assumed a sites table existed; it did not (site/company were free-text columns on connect_machines), so this creates a minimal one. - site_enrollment_keys: rotatable, Argon2id-hashed cek_ secret + monotonic version + hex fingerprint + active flag; one-active-per-site partial unique. - connect_machines: + site_id (FK), + enrollment_state ('active'\|'pending') collision gate, + per-tenant (tenant_id, machine_uid) unique index added ALONGSIDE the 008 global index (the connect-path upsert_machine ON CONFLICT arbiter binds to 008 — dropping it would break live reconnect). - connect_sites.enrollment_policy: reserved (default auto-approve), not enforced. auth/enrollment_keys.rs: cek_ mint (256-bit, OS CSPRNG), Argon2id hash/verify (reuses auth::password), and hex fingerprint vN (XXXX) per resolved-decision #3. db/sites.rs + db/enrollment_keys.rs: runtime sqlx persistence; rotate_key deactivates+inserts in one tx to hold the one-active-key invariant. POST /api/enroll (public, api/enroll.rs): site_code+cek_ verify against active key -> dedup on (tenant, machine_uid) -> new / reuse / site-move / collision. Collision gate (PROVISIONAL heuristic: online existing row + different hostname) -> pending, no usable cak_, alert. Mints cak_ via existing agent_keys path in the exact form relay::validate_agent_api_key expects. Per-(site_code,IP) rate-limit + lockout (EnrollLimiter). Audit events + [ENROLL] alert markers with TODO(SPEC-016) #dev-alerts notes. Admin (JWT) api/sites.rs: POST /api/sites/:id/enrollment-key/rotate (plaintext + fingerprint once) and GET .../enrollment-key (fingerprint/version, no secret). 
Routes wired in main.rs (enroll public, rotation admin). 13 new unit tests; full server suite 99 passing. cargo check + clippy clean on the host (Windows) target — Linux cross-target not installed here; server crate is platform-neutral Rust. No sqlx offline cache needed (codebase uses runtime queries, no query!). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 10:12:35 -07:00
Mike Swanson	c286a29b9d	spec: SPEC-016 resolve all 5 open questions (enrollment design decisions) All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 14m25s Details Build and Test / Build Server (Linux) (push) Successful in 20m31s Details Build and Test / Security Audit (push) Successful in 8m28s Details Build and Test / Build Summary (push) Successful in 30s Details Fold the 2026-06-02 interview decisions into SPEC-016: - Installer wrapper: ship BOTH signed .exe and signed MSI per site - cak_ at-rest storage: DPAPI-machine-encrypted blob in a SYSTEM-ACL'd location - Fingerprint: hex (7F2A), deliberately unlike RMM word-codes - machine_uid: per-tenant scope + hardware-derived salt (survives re-image, separates distinct boxes) + collision-gated activation (template-cloned VMs sharing a hardware UUID drop to pending + alert, need dashboard confirm) - Attended support-code path: unchanged (filename-based, already signing-safe) Open Questions section -> Resolved decisions + a short Remaining-for-planning list (exact hardware salt signal set, WiX/MSI authoring approach). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 09:54:19 -07:00
Mike Swanson	18429f6fe3	spec: add SPEC-016 zero-touch per-site agent enrollment All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 10m46s Details Build and Test / Build Server (Linux) (push) Successful in 15m33s Details Build and Test / Security Audit (push) Successful in 6m3s Details Build and Test / Build Summary (push) Successful in 25s Details ScreenConnect-class managed enrollment: one signed installer per site, machines self-register on first run and the server mints a per-machine cak_ key bound to a deterministic machine_uid (dedups re-installs). Per-site rotatable enrollment key (long secret + vN (XXXX) fingerprint); rotating blocks new enrollments from old installers, leaves enrolled agents untouched. Auto-approve + new-enrollment/site-move alert. Resolves SPEC-007's signature-vs-appended-config open question: sign the base agent once in CI + per-site signed wrapper that writes site config around the signed bytes (never appended into the PE). Deferred (room reserved): enrollment policy + per-seat licensing, --enroll-key/--site-code/--reassign flag overrides, technician-assisted interactive install. Tracking todo dbfe6a56. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 09:13:59 -07:00
Mike Swanson	3b9e4068c9	docs(roadmap): mark release signing shipped; add signed beta channel as P1-NOW All checks were successful Build and Test / Build Server (Linux) (push) Successful in 14m11s Details Build and Test / Build Agent (Windows) (push) Successful in 8m3s Details Build and Test / Security Audit (push) Successful in 5m38s Details Build and Test / Build Summary (push) Successful in 17s Details Release-path Azure Trusted Signing and auto-versioning were already shipped with v0.3.0 (stale [ ] -> [x]). Add a new P1/NOW item for a signed beta/test release channel: the auto build-and-test.yml agent artifact is unsigned, so testers can receive unsigned binaries. The beta channel (now implemented in release.yml) closes that gap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 07:57:04 -07:00
Mike Swanson	87f229509b	ci(release): add signed beta/test release channel Some checks failed Build and Test / Build Server (Linux) (push) Has started running Details Build and Test / Build Agent (Windows) (push) Has started running Details Build and Test / Security Audit (push) Has been cancelled Details Build and Test / Build Summary (push) Has been cancelled Details Add a `channel: stable | beta` workflow_dispatch input to release.yml. `stable` is unchanged (byte-for-byte). `beta` produces a Windows agent binary signed by the identical fail-closed Azure Trusted Signing path, but skips the semver bump, changelog, and release commit, and publishes a prerelease-tagged Gitea release (vX.Y.Z-beta.<run_number>) at HEAD. So every binary handed to a tester is signed, not just formal releases. - prerelease tags excluded from stable LAST_TAG detection (both lookups) so a beta tag can't corrupt the next stable version computation - beta tag force-created/pushed -> idempotent on failed-run re-runs - changelog download gated to stable; release prerelease flag plumbed through to the Gitea REST payload Reviewed-by: Code Review Agent (APPROVE WITH NITS; N1 hardened) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 07:56:17 -07:00
Mike Swanson	40c7d860cc	spec(v2-session-core): add Task 9 — cak_ auto-enroll provisioning (TOFU) + shared-key retirement All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 7m10s Details Build and Test / Build Server (Linux) (push) Successful in 10m31s Details Build and Test / Security Audit (push) Successful in 4m1s Details Build and Test / Build Summary (push) Successful in 9s Details	2026-06-01 14:40:14 -07:00
Mike Swanson	0059b21db6	fix(server): revert migration 008 comment edit — modifying an applied sqlx migration breaks its checksum and crash-loops the server on startup; machines.rs ON CONFLICT fix retained All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 7m33s Details Build and Test / Build Server (Linux) (push) Successful in 11m57s Details Build and Test / Security Audit (push) Successful in 4m33s Details Build and Test / Build Summary (push) Successful in 11s Details	2026-06-01 10:05:38 -07:00
Mike Swanson	f950511e3e	fix(server): bind machine_uid upsert ON CONFLICT to the partial index (WHERE machine_uid IS NOT NULL) Some checks failed Build and Test / Build Agent (Windows) (push) Successful in 8m16s Details Build and Test / Build Server (Linux) (push) Successful in 11m58s Details Build and Test / Security Audit (push) Has started running Details Build and Test / Build Summary (push) Has been cancelled Details Bare ON CONFLICT (machine_uid) could not bind to migration 008's partial unique index, so no connect_machines row was persisted for any agent reporting a machine_uid. Confirmed live on 172.16.3.30 with a signed 0.3.0 test agent.	2026-06-01 09:50:34 -07:00
Mike Swanson	16017456aa	docs: 2026-05-31 security re-audit (Phase-1 EXIT) + roadmap reconcile All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 6m59s Details Build and Test / Build Server (Linux) (push) Successful in 10m35s Details Build and Test / Security Audit (push) Successful in 4m3s Details Build and Test / Build Summary (push) Successful in 7s Details /gc-audit --pass=security re-pass over the deployed v0.3.0 code: PASS, 0 CRITICAL/HIGH/MEDIUM/LOW. The 3 relay CRITICALs stay closed (verified in code AND live against the deployed binary), the prior agent-update-TLS HIGH and chat-logging LOW are fixed, and the net-new SPEC-004 surface (machine_uid dedup gate, session reaper/supersede, operator removal API) audits clean — no non-admin removal path, no uid-spoof hijack, no auth-plane crossover. Marks v2 Phase 1 formally exited (secure-session-core Task 8 complete). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 18:19:09 -07:00
guruconnect-ci	e967cce1a1	chore: release v0.3.0 [skip ci] v0.3.0	2026-06-01 00:10:58 +00:00
Mike Swanson	16586c4a1b	chore: reconcile manifest versions to v0.2.2 baseline All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 7m14s Details Build and Test / Build Server (Linux) (push) Successful in 11m25s Details Build and Test / Security Audit (push) Successful in 7m13s Details Build and Test / Build Summary (push) Successful in 1m2s Details agent + server Cargo.toml hardcoded 0.2.0 (below the workspace.package 0.2.2 and the last release tag v0.2.2); dashboard was on a divergent 2.0.0 scheme. Align all component manifests + the dashboard lockfile to the v0.2.2 baseline so the next release bumps them coherently to 0.3.0 rather than decreasing the dashboard. No code change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 16:50:59 -07:00
Mike Swanson	96f9c0ab45	feat(dashboard): operator removal UI for stale machines/sessions (SPEC-004 Task 5) All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 7m13s Details Build and Test / Build Server (Linux) (push) Successful in 11m21s Details Build and Test / Security Audit (push) Successful in 4m12s Details Build and Test / Build Summary (push) Successful in 11s Details Admin-only per-row Remove + multi-select bulk removal on the machines view, plus per-row purge Remove on the sessions view, wired to the Task-5 admin API (DELETE /api/machines|sessions/:id?purge=true, POST /api/machines/bulk-remove). Confirm modals (danger-styled, focus-trapped), TanStack refetch so purged rows leave the console, structured ApiError surfacing, honest partial-bulk summary, and admin-gating via useAuth().isAdmin as defense-in-depth over the server 403. Replaces the legacy all-user delete trigger. typecheck/lint/build clean. Implements specs/v2-stable-identity/plan.md Task 5 (dashboard portion). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 14:14:49 -07:00
Mike Swanson	5ee6675337	feat(server): operator removal of stale sessions/machines (SPEC-004 Task 5, server) All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 7m29s Details Build and Test / Build Server (Linux) (push) Successful in 10m58s Details Build and Test / Security Audit (push) Successful in 4m4s Details Build and Test / Build Summary (push) Successful in 8s Details Admin-gated soft-delete + purge so operators can clear ghost machines/sessions (the ~15-rows-for-one-host accumulation) from the console. - migration 009: deleted_at on connect_sessions + connect_machines, with partial indexes WHERE deleted_at IS NULL. - DELETE /api/machines/:agent_id?purge=true and DELETE /api/sessions/:id?purge=true soft-delete the row and purge the in-memory session (remove_session); the non-purge path keeps the legacy hard-delete / live-only disconnect. POST /api/machines/bulk-remove handles multi-select (batch cap 500). All admin-gated (AdminUser -> 403; tightens the prior any-user delete) and audited to connect_session_events (actor + target + trusted client IP). - list/get queries filter deleted_at IS NULL so removed units leave the console; upsert revives (deleted_at = NULL) a genuinely-reconnecting machine. The keyed-reattach identity resolver (get_machine_by_id) is intentionally unfiltered. Dashboard removal UI is the A3b follow-up. 86 server tests pass; fmt/clippy/test clean. Implements specs/v2-stable-identity/plan.md Task 5 (server portion). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 13:52:36 -07:00
Mike Swanson	cef1928379	style(server): cargo fmt for SPEC-004 Task 2 + Task 4 All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 6m40s Details Build and Test / Build Server (Linux) (push) Successful in 10m18s Details Build and Test / Security Audit (push) Successful in 4m12s Details Build and Test / Build Summary (push) Successful in 12s Details Pure rustfmt reflow of the Task 2 (machine_uid dedup) and Task 4 (session reaping) code; no logic change. The CI Build-Server-Linux job gates on cargo fmt --check, which the two feature commits failed because local validation ran check/clippy/test but not fmt --check. fmt --check, check, and clippy -D warnings all clean now. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 12:27:01 -07:00
Mike Swanson	4e80573cbd	feat(server): reap stale persistent sessions + same-machine supersede (SPEC-004 Task 4) Some checks failed Build and Test / Build Server (Linux) (push) Failing after 3m32s Details Build and Test / Build Agent (Windows) (push) Has started running Details Build and Test / Security Audit (push) Has started running Details Build and Test / Build Summary (push) Has been cancelled Details A periodic reaper removes persistent, offline, viewerless sessions whose last heartbeat is older than a 10-minute TTL (60s sweep spawned at startup), and a same-machine supersede on the new-session path drops a stranded prior session when a legacy no-uid agent upgrades to a fresh agent_id + machine_uid. Both removals re-assert the predicate under the write lock (remove_session_if) to close a snapshot->remove TOCTOU. Security: keyed (cak_) agents pass machine_uid=None, so they never trigger supersede and are never reaped as a uid victim; online, viewer-attached, and support sessions are never reaped. 82 server tests pass; clippy clean. Implements specs/v2-stable-identity/plan.md Task 4. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 12:21:15 -07:00
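The reaper rule and the "re-assert under the write lock" step from the commit above can be sketched with a plain `RwLock<HashMap>`. This is a simplified model: the `SessionSnapshot` fields and the map layout are hypothetical, and support sessions are assumed to be covered by `persistent == false`.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// 10-minute TTL, as stated in the commit message.
pub const REAP_TTL: Duration = Duration::from_secs(10 * 60);

/// Hypothetical view of the fields the reaper needs.
pub struct SessionSnapshot {
    pub persistent: bool,
    pub online: bool,
    pub viewer_count: usize,
    pub last_heartbeat: Instant,
}

/// Persistent, offline, viewerless, and silent for longer than the TTL.
pub fn is_reapable(s: &SessionSnapshot, now: Instant) -> bool {
    s.persistent
        && !s.online
        && s.viewer_count == 0
        && now.saturating_duration_since(s.last_heartbeat) > REAP_TTL
}

/// Removes the session only if the predicate still holds while the write lock is held,
/// which closes the gap between taking a snapshot and acting on it.
pub fn remove_session_if<F>(
    sessions: &RwLock<HashMap<u64, SessionSnapshot>>,
    id: u64,
    pred: F,
) -> bool
where
    F: Fn(&SessionSnapshot) -> bool,
{
    let mut guard = sessions.write().unwrap();
    let should_remove = guard.get(&id).map_or(false, |s| pred(s));
    if should_remove {
        guard.remove(&id);
    }
    should_remove
}
```

A sweep would first collect candidate ids under a read lock, then call `remove_session_if` per id; a session that came back online between the two steps fails the re-check and is left alone.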
Mike Swanson	ffca7f0cee	feat(server): dedup machines on machine_uid (SPEC-004 Task 2) Some checks failed Build and Test / Build Server (Linux) (push) Failing after 3m48s Details Build and Test / Build Agent (Windows) (push) Successful in 7m34s Details Build and Test / Security Audit (push) Successful in 4m44s Details Build and Test / Build Summary (push) Has been skipped Details Persist the agent-reported machine_uid and dedup connect_machines on it so a single physical machine can't register duplicate rows when its config-file agent_id regenerates (the ghost-session root cause). - migration 008: nullable connect_machines.machine_uid + partial unique index (WHERE machine_uid IS NOT NULL); idempotent, startup-applied. - upsert_machine: two-path dedup (ON CONFLICT machine_uid when present, else the legacy ON CONFLICT agent_id path, unchanged). - session reattach: a machine_uid index consulted before agent_id, with all removal paths purging it. - security: keyed (cak_) agents stay authoritative — their claimed machine_uid is dropped (effective_machine_uid=None); uid is dedup-only for un-keyed / support-code agents. Startup restore skips uid-indexing keyed machines and fails closed if the keyed-set query errors. 74 server tests pass; clippy clean. Implements specs/v2-stable-identity/plan.md Task 2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 12:06:50 -07:00
Mike Swanson	97780304e7	fix(agent): make native H.264 viewer render live frames All checks were successful Build and Test / Build Agent (Windows) (push) Successful in 7m2s Details Build and Test / Build Server (Linux) (push) Successful in 10m24s Details Build and Test / Security Audit (push) Successful in 4m15s Details Build and Test / Build Summary (push) Successful in 9s Details The native viewer's H.264 path (Task 7 first-cut, compile-verified only) never rendered a frame. Three stacked bugs, all confirmed via live loopback: 1. decoder: MF_E_NOTACCEPTING (0xC00D36B5) was treated as fatal and only one output was drained per call, so once the MFT filled it rejected every subsequent frame. decode() now returns Vec<DecodedFrame>, drains on back-pressure and retries the unconsumed sample, then drains all ready outputs. 2. decoder: the NV12 output type was hand-built and rejected by the MS H.264 decoder MFT (MF_E_TRANSFORM_TYPE_NOT_SET, 0xC00D6D60). It is now negotiated by enumerating GetOutputAvailableType on STREAM_CHANGE / TYPE_NOT_SET. 3. render: a manual pump_messages() in about_to_wait stole winit's own thread messages and froze the event loop after one iteration, so frames were never drained from the channel. Removed; winit's run_app pump already services the WH_KEYBOARD_LL hook. Validated on a 5070 loopback: 0 decode errors, frames decode/paint/present (present count 0 -> 1740). Reviewed (APPROVE-WITH-NITS); diagnostics stripped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:25:05 -07:00
azcomputerguru	afbf0d81b8	spec: add SPEC-015 Configurable Notification Overlay Comprehensive specification for on-screen notification when technician connects. - Semi-transparent topmost window with configurable message, position, duration - Dashboard admin settings page (enable/disable, message template, position, duration) - Template variables: {{technician_name}}, {{company}}, {{time}} - Agent displays overlay on StartStream, auto-hides after duration or manual dismiss - Database: notification_config singleton table - Protobuf: NotificationConfig message in StartStream - Priority: P2, Effort: Medium (3-4 weeks) - Added to roadmap under Core Remote Control Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-05-31 08:40:53 -07:00
azcomputerguru	b45c683a51	spec: add SPEC-014 Branding and White-Label Configuration Comprehensive specification for branding/whitelabel configuration. - Dashboard admin settings page (logo, brand hue, product name, company name, favicon) - OKLCH color system with CSS variables for dynamic theming - Agent tray tooltip customization via registry key - Singleton database table with public GET endpoint - Priority: P2, Effort: Medium (4-6 weeks) - Added to roadmap under Server/API (v2 Phase 2) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-05-31 08:12:37 -07:00
azcomputerguru	5637e4c1f9	spec: add SPEC-013 Windows Session Selection and Backstage Mode	2026-05-31 07:54:25 -07:00
Mike Swanson	b3e8f32734	feat(agent): derive + report deterministic machine_uid (SPEC-004 Task 1) Agent now derives a recomputable, opaque machine_uid (Windows: SHA-256 of the OS MachineGuid at HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid -> muid_<hex>; non-Windows / registry-failure: persisted random UUID, warn-logged). Raw GUID never exposed; OnceLock-cached. Reported ALONGSIDE agent_id (unchanged) on AgentStatus (new additive proto field 12) and in the connect handshake query. This is the stable identity that fixes config-loss duplicate registrations (DESKTOP-I66IM5Q x9); server-side dedup keying that consumes it is SPEC-004 Task 2. Non-breaking, isolated. 5 unit tests; cargo fmt/clippy(-D warnings)/test green on GURU-5070. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 21:23:11 -07:00
Mike Swanson	92bc522c3a	spec: add v2-stable-identity implementation plan (SPEC-004 breakdown) Ordered, execution-ready plan for SPEC-004 (stable machine identity + session reaping + operator removal). Works out the core integration: machine_uid = deterministic MachineGuid-based hardware identity (recomputable, so config loss can't duplicate); per-agent cak_ key stays the credential/trust boundary; they compose so one cak_ key per machine_uid = one key per real machine (the prerequisite the fleet key-migration #7 needs). Root cause grounded in code: agent_id is a random UUID (config.rs:90), connect_machines dedups on ON CONFLICT (agent_id), so config loss -> duplicate rows (DESKTOP-I66IM5Q x9 live). 5 ordered tasks (agent uid -> server dedup -> reconcile/age-out -> reaping -> operator removal). Unblocks #7 -> #5. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 21:17:49 -07:00
Mike Swanson	df51d40094	feat(server): per-agent H.264 test override (h264-test tag) [Task 8 prep] Lets the HW-H.264 path be live-validated on tagged test agents without affecting the live client fleet. Adds H264_TEST_TAG="h264-test" + a pure prefer_h264_for(tags) helper (DEFAULT_PREFER_H264 || tags contains the tag, case-insensitive); StartStream codec negotiation now computes prefer_h264 from the agent's reported tags instead of the bare const, and logs the computed value. SAFETY: untagged sessions are byte-for-byte unchanged (prefer_h264 == DEFAULT_PREFER_H264 == false -> raw); the supports_h264 guard still forces raw for a no-HW agent even when tagged. DEFAULT_PREFER_H264 stays false (flipping the global default is a separate future step). 3 unit tests added. cargo fmt/clippy(-D warnings)/test green on GURU-5070 (37 agent + 64 server). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 20:17:38 -07:00
azcomputerguru	7be8f454e0	Merge remote security fixes with local specs	2026-05-30 19:21:42 -07:00
Mike Swanson	c98692e424	fix(server): revoke viewer tokens on logout + stop logging chat content Security follow-ups (audit 2026-05-30, both reviewed APPROVE): - MEDIUM: viewer tokens were never blacklisted on logout, so a minted session-scoped viewer token stayed valid up to its 5-min TTL after the user logged out. Add a per-user ViewerTokenRegistry (Arc<Mutex<HashMap<sub, Vec<(token, expires_at)>>>>, prune-on-insert) on AppState; mint_viewer_token registers each token under the user sub; logout drains take_for_user(sub) and blacklists each via the existing token_blacklist. The viewer WS already calls is_revoked, so no WS change. Key chain user.user_id == ViewerClaims.sub == registry key verified consistent. 8 new tests. - LOW: relay chat logs now emit content length, not the chat body (support-chat can carry secrets/PII). cargo fmt/clippy(-D warnings)/test green on GURU-5070 (37 agent + 61 server). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 19:20:15 -07:00
azcomputerguru	761bae5d01	spec: update SPEC-012 to include both Serial Console + PTY Shell modes Major update to SPEC-012 adding dual-mode terminal access: Mode 1: Serial Console Mode (True Remote Console) - Direct access to system serial console (/dev/ttyS0 or /dev/console) - Sees GRUB bootloader, kernel boot messages, login prompts, kernel panics - Boot-time interaction: select GRUB entries, edit kernel parameters, single-user mode - Requires root privileges or CAP_SYS_TTY_CONFIG capability - Setup: GRUB + kernel parameters configured for serial console output - Like KVM-over-IP or IPMI Serial-over-LAN (text-mode equivalent) Mode 2: PTY Shell Mode (Interactive Shell) - Spawn pseudo-TTY with bash/zsh shell session - Normal server management (package updates, log review, etc.) - Runs as unprivileged agent service user - Standard interactive shell with full ANSI/VT100 support Architecture: - Agent mode selection based on viewer request (console vs. shell) - Dashboard shows two buttons: "Console" and "Shell" for headless agents - Same xterm.js viewer handles both modes transparently - Protobuf extensions: TerminalModeRequest enum, console_mode flag Security: - Console mode requires root (boot-level control risk) - Recommend RBAC: separate console_access and shell_access permissions - Console sessions should require MFA (Phase 2) - Audit logging for both modes Setup Requirements: - One-time GRUB configuration for serial console - systemd service with CAP_SYS_TTY_CONFIG for console mode - serial-getty@ttyS0.service enabled for login prompt Updated effort: Medium (5-7 weeks, up from 4-6) Priority remains P2 Addresses user request for "remote console" (as if at the machine) not just shell access. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-05-30 19:02:27 -07:00
Mike Swanson	8119292bcd	fix(agent): close auto-update TLS bypass (MITM -> RCE) [HIGH] The auto-update path built both reqwest clients with an unconditional danger_accept_invalid_certs(true), so a network MITM could serve an arbitrary update .exe (checksum is no defense — same unverified channel) and gain RCE on every managed endpoint. Replace with dev_insecure_tls() = cfg!(debug_assertions) && env GURUCONNECT_DEV_INSECURE_TLS: the cfg gate compiles out of release builds, so a shipped agent ALWAYS verifies certs; dev keeps a self-signed escape hatch. Loud warn when the insecure path is taken; verify_checksum kept + documented as transport-integrity (not tamper) defense; TODO + follow-up for embedded-key update signing (defense-in-depth). Release-invariant unit test added. cargo fmt/clippy(-D warnings)/test green on GURU-5070 (90 tests). Closes the 2026-05-30 security-audit HIGH (reports/2026-05-30-gc-audit.md). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 19:02:23 -07:00
Mike Swanson	9f44807230	audit: security pass re-audit (2026-05-30) — 3 CRITICALs verified CLOSED Independent /gc-audit --pass=security re-derivation of the v2 secure-session-core rebuild: all three 2026-05-29 relay CRITICALs confirmed closed with no bypass (any-JWT-joins-session, viewer-WS blacklist, JWT-as-agent-key). Relay plane clean; consent/code paths fail closed; abuse surface bounded; rate limiting proxy-aware. Net-new: 1 HIGH (agent auto-update disables TLS cert verification -> MITM-RCE, agent/src/update.rs:45,111 — outside the relay plane), 1 LOW (chat content logged), 2 INFO. Report: reports/2026-05-30-gc-audit.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 18:48:48 -07:00
azcomputerguru	a062a825ea	spec: add SPEC-012 Headless Linux Mode (Direct TTY Access) Comprehensive specification for terminal-based remote access to headless Linux servers (no X11/Wayland GUI): Core Capabilities: - PTY spawn via openpty() + fork/exec shell (/bin/bash or $SHELL) - Terminal I/O: PTY output → TerminalData protobuf → WebSocket relay - Input: keyboard → TerminalInput protobuf → PTY master write - Resize: SIGWINCH on terminal window resize, TIOCSWINSZ ioctl - Auto-detection: agent detects headless environment (no DISPLAY) at runtime Viewer: - xterm.js-based web terminal (80x24 default, resizable) - Full ANSI/VT100 support (colors, cursor control, vim/nano/htop) - Same protobuf-over-WSS protocol, support-code/agent-key auth - Dashboard shows "Terminal" badge, routes to terminal viewer Use Cases: - Server management (headless Ubuntu Server, VMs, containers) - Emergency recovery (systemd rescue mode, single-user mode) - Container debugging (exec into running containers) - SSH replacement with centralized audit logging Protobuf Extensions: - TerminalData, TerminalInput, TerminalResize messages - AgentStatus.terminal_mode flag Security: - Run agent as unprivileged user + sudo for privileged commands - Session recording to terminal_recordings table (asciicast format) - Same auth model as GUI agents (support-code / per-agent key) Estimated effort: Medium (4-6 weeks) Priority: P2 (server management is market-critical) Extends SPEC-010 Linux agent with PTY alternative to screen capture. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-05-30 18:28:34 -07:00
azcomputerguru	b1862800a1	spec: add SPEC-011 Mobile Agent Support (iOS and Android) Comprehensive specification for iOS/Android devices as remote control targets: iOS Agent (View-Only): - ReplayKit 2 screen capture (user consent required) - VideoToolbox H.264 encoding - NO input injection (iOS sandboxing limitation) - APNs push notifications for session requests - Foreground-only operation (OS requirement) Android Agent (View + Control): - MediaProjection API screen capture (user consent) - MediaCodec H.264 encoding - Accessibility Service for input injection (tap/swipe/type) - FCM push notifications - Foreground service with persistent notification Architecture: - Native Swift/SwiftUI (iOS) and Kotlin/Jetpack Compose (Android) apps - Same protobuf-over-WSS protocol as desktop agents - Support-code authentication (persistent mode deferred to Phase 2) - Minor protobuf additions: MobileCapabilities, TouchEvent - Server push module: APNs (a2 crate) + FCM HTTP v1 Key constraints: - Attended-only sessions (user must grant permission) - Foreground-only (cannot capture in background on either platform) - iOS view-only (platform sandbox prevents input injection) - Consent-first model (MediaProjection/ReplayKit user prompts) Estimated effort: X-Large (16-20 weeks, requires mobile expertise) Priority: P3 Distinct from GuruRMM SPEC-017 (MDM/inventory) — this is remote control, not device management. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-05-30 18:24:16 -07:00
Mike Swanson	442eecefc0	fix(server,agent): apply Tasks 3-5 review fixes (non-blocking) From the secure-session-core Tasks 3-5 code review (APPROVE-WITH-FIXES): - MEDIUM-2: delete the dead `validate_agent_key` "accept-any-key" placeholder + its AuthenticatedAgent/AuthState scaffolding (zero callers; the real agent auth is validate_agent_api_key + per-agent cak_ keys). Removes an auth landmine. - LOW-3: stop interpolating support-code values into 3 relay log lines (bearer credentials). - LOW-1: document the X-Real-IP trust requirement in ip_extract.rs (NPM must set it from $remote_addr); behavior unchanged. - LOW-2: correct the consent/heartbeat comment in agent session loop (the loop awaits the dialog; safe because CONSENT_TIMEOUT 60s < HEARTBEAT_TIMEOUT 90s). cargo fmt/clippy(-D warnings)/test all green on GURU-5070 (89 tests, 0 warnings). MEDIUM-1 (viewer-token logout revocation) remains a tracked follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 18:23:03 -07:00
azcomputerguru	5e2325507f	spec: add SPEC-010 Cross-Platform Agent Support (macOS and Linux) Comprehensive specification for expanding agent support beyond Windows: macOS Agent (Priority 1): - ScreenCaptureKit API (macOS 13+) with AVFoundation fallback - CGEvent input injection - VideoToolbox H.264 encoding - NSStatusItem menu bar icon - Universal binary (x86_64 + arm64) - Code signing and notarization Linux Agent (Priority 2): - X11 XShm screen capture with Wayland detection - XTest input injection - VA-API hardware H.264 encoding with software fallback - StatusNotifier system tray - .deb and .rpm packaging Architecture: - Platform abstraction layer (traits for capture/input/encoder/tray) - Refactor existing Windows code behind PlatformCapture/Input/Encoder - No protobuf protocol changes - Same authentication (support codes and agent keys) Estimated effort: X-Large (12-16 weeks) Priority: P2 (market-critical for multi-platform MSP adoption) Updated roadmap: promoted from P3 to P2 with full spec link. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-05-30 18:15:16 -07:00
Mike Swanson	c736a710a1	docs: record Tasks 3-5 code review (APPROVE-WITH-FIXES) in plan status Formal review on GURU-5070: cargo fmt/clippy/test green (89 tests, 0 warnings); the 3 audit CRITICALs verified closed with no bypass; all security paths fail closed. Non-blocking follow-ups tracked (viewer-token logout revocation, delete dead validate_agent_key placeholder, X-Real-IP/log hygiene). Remaining for Phase-1 exit: Task 8 e2e verification + /gc-audit security re-audit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 18:14:02 -07:00
Mike Swanson	786d3e47af	docs: correct roadmap — v2 Phase 1 already landed, not a future sprint Re-baseline against actual git/deploy state: secure-session-core Tasks 1-7 are committed and DEPLOYED; the 3 audit CRITICALs are closed and live in prod (verified: deployed checkout `abc55ab` descends from the CRITICAL#1 fix + Task 7; guruconnect.service running on :3002). The prior "Sprint 0: bypasses are live" banner was wrong (stale 2026-05-29 audit narrative) and is removed. Remaining to exit Phase 1 = secure-session-core Task 8 (e2e verification + security re-audit) + Code-Review sign-off on Tasks 3-5. Schema note corrected (connect_agent_keys + tenancy already exist via migration 004). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 17:36:18 -07:00
Mike Swanson	03f62d413f	docs: annotate roadmap with v2-first direction + phase mapping Mark SPEC-003..009 as work-items inside the SPEC-002 v2 phases (not standalone v1 backlog): banner records the v2-reset decision + the Sprint-0 relay-auth CRITICAL hotfix, a phase-mapping table (004->P1, 008->P0/1, 003/005/006/007->P2, 009->P3), inline [-> v2 Phase N] tags per spec, and a note to bake SPEC-003 inventory cols + SPEC-004 machine_uid + connect_agent_keys into the Phase-0 fresh schema. Sprint planning 2026-05-30 (Mike: v2 reset first). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 17:26:47 -07:00
Mike Swanson	7ab87384a7	spec: add SPEC-009 feature-rich documented API Everything the console does should be callable by API, documented and discoverable. Adds: OpenAPI 3.x generated from code (utoipa) + Swagger/Redoc at /api/docs (drift-proof, route<->spec parity test); long-lived revocable scoped API tokens (connect_api_tokens, hashed like agent keys) distinct from the 24h dashboard JWT and agent keys; an API-completeness gap audit (folds in SPEC-004/006/007 endpoints); consistent pagination/filtering + versioning policy. Today there is zero API doc tooling and no programmatic token. Depends on SPEC-008 for the documented error envelope; distinct from the ADR-001 integration contract. Large. Parallel guru-rmm SPEC-019. Requested by Mike 2026-05-30. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 16:35:57 -07:00
Mike Swanson	65eff5cf50	spec: add SPEC-008 valuable error messages Cross-cutting error-quality initiative: one structured AppError envelope (stable error_code + message + correlation_id) replacing the current ad-hoc mix (bare (StatusCode,&str) tuples, per-file ErrorResponse, two JSON envelopes the dashboard already unions); correlation-id middleware tied to tracing spans + response header so a reported id greps the log; contextual error logging with identifiers + error chain; sweep the 37 server `let _ =` swallows (the pattern that silently hid migration-005's missing columns); dashboard renders the real cause + correlation id (drop the hardcoded generic at MachinesPage.tsx:202); agent logs why/where auth/connection failed (the auth-loop incident gave no local signal). Phaseable; Large. Parallel RMM request keeps conventions aligned. Requested by Mike 2026-05-30. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 16:30:07 -07:00
Mike Swanson	008d2bf30b	spec: add SPEC-007 managed-agent installer builder Dashboard "Build Installer" wizard for pre-labeled managed/persistent agents (Name/Company/Site/Department/Device Type/Tag/Type) with Download / Copy URL / Send Link, ScreenConnect-style. The embed-config build path already exists (downloads.rs appends EmbeddedConfig GURUCONFIG blob; AgentDownloadParams takes company/site/tags/api_key; agent reads it at config.rs:223) - missing is the UI, department + device_type fields (EmbeddedConfig/AgentStatus/connect_machines), name strategy, and Copy-URL/Send-Link actions. Labels persist at install time, feeding SPEC-003/005/006. Embedded key should be revocable per-machine/site (pairs with SPEC-004). Biggest open question: appending config after Authenticode signing invalidates the signature. Requested by Mike 2026-05-30. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 16:24:56 -07:00
Mike Swanson	0eb38520ed	spec: add SPEC-006 universal machine search Single search box matching case-insensitive substring across ALL machine attributes (OS, logged-on user, external/private IP, company, site, tag, serial, MAC, client version, ...) server-side, ScreenConnect-style. Replaces the dashboard's hostname/agent_id-only client filter (inadequate at ~900+ machines). pg_trgm GIN index over a concatenated searchable-text expression (INET cast to text, tags via array_to_string); multi-term AND; optional field-scoped syntax (os:/user:/ip:). Parameterized + fixed column allowlist (no injection), admin-guarded, DoS-capped. Depends on SPEC-003 (attrs must be persisted to be searchable); reuses SPEC-005 enriched payload. Requested by Mike 2026-05-30. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 16:21:10 -07:00
Mike Swanson	cdc182f0fb	spec: add SPEC-005 machines list view (dual indicators + rich rows) ScreenConnect "Access"-list parity for the Operator Console machines list: per-row dual Host/Guest connection indicators (Guest=agent is_online, Host=viewer_count>0 with viewer names + durations) and rich inline metadata (company, site, device type, tags, logged-on user + idle, client version in red when outdated). Live Host/Guest state already exists on SessionInfo (is_online, viewer_count, viewers); main work is enriching /api/machines with that + SPEC-003 inventory and redesigning MachinesPage rows. Depends on SPEC-003 (data), reads cleanest after SPEC-004 (dedup), dovetails SPEC-002 Phase 2. Company-tree nav split out as a P3 follow-up. Requested by Mike 2026-05-30. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 16:17:48 -07:00
Mike Swanson	f8bd4d1dab	spec: SPEC-004 add stable machine-derived identity as the primary fix Address duplicate registration at the source, not just via cleanup. Root cause now grounded: agent_id is a random UUID (config.rs:90 generate_agent_id) persisted only in the config file, so a portable/misconfigured execution (the Pavon desktop launcher) regenerates a fresh id each launch, defeating both the DB upsert (ON CONFLICT agent_id) and session-reuse dedupe. Add a deterministic machine_uid (Windows MachineGuid-based, recomputable) keyed by registration; reaping/supersede become defense-in-depth. Security: machine_uid is identity not authorization and must be bound to the per-machine agent key to prevent session/record hijack. Requested by Mike 2026-05-30. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 16:11:38 -07:00
Mike Swanson	ee900c6395	spec: add SPEC-004 session lifecycle reaping + operator removal Stop orphaned managed sessions accumulating in the Operator Console and let admins remove stale sessions/units individually and in bulk. Root cause confirmed in code: the Sessions list is the in-memory SessionManager; register_agent reconnect-reuse keys on a stable agent_id (session/mod.rs:169) and persistent sessions are never reaped on disconnect (session/mod.rs:519-542), so an agent reconnecting with a fresh agent_id leaves a new retained ghost session each time (observed: 15 sessions/0 live, ~10 orphans for one machine after a GuruConnect-client reconnect storm). Adds TTL sweep + same-machine supersede, admin-gated audited purge + bulk endpoints, and dashboard multi-select removal. Requested by Mike 2026-05-30. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 16:05:32 -07:00
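
The per-agent H.264 override described in commit df51d40094 can be illustrated with a minimal Rust sketch. The names `H264_TEST_TAG`, `DEFAULT_PREFER_H264`, `prefer_h264_for`, and `supports_h264` come from the commit message; the `Codec` enum, the `negotiate` function, and the `&[String]` tag type are assumptions made for illustration and may not match the server's actual code.

```rust
/// Tag that opts an agent into the H.264 test path (value from the commit message).
const H264_TEST_TAG: &str = "h264-test";

/// Global default; the commit states this stays false for now.
const DEFAULT_PREFER_H264: bool = false;

/// Pure helper: prefer H.264 when the global default is on, or when any
/// reported tag equals the test tag, compared case-insensitively.
fn prefer_h264_for(tags: &[String]) -> bool {
    DEFAULT_PREFER_H264 || tags.iter().any(|t| t.eq_ignore_ascii_case(H264_TEST_TAG))
}

/// Hypothetical codec choice, used here only to show the guard ordering.
#[derive(Debug, PartialEq)]
enum Codec {
    Raw,
    H264,
}

/// Hypothetical negotiation step: the hardware-support guard wins, so an
/// agent without H.264 support gets raw frames even when it is tagged.
fn negotiate(tags: &[String], supports_h264: bool) -> Codec {
    if supports_h264 && prefer_h264_for(tags) {
        Codec::H264
    } else {
        Codec::Raw
    }
}
```

Under these assumptions an untagged agent always negotiates `Codec::Raw` while `DEFAULT_PREFER_H264` is false, which is the "untagged sessions unchanged" property the commit describes.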


145 Commits