Commit Graph

145 Commits

Author SHA1 Message Date
367906bd54 fix(agent): SPEC-016 Phase B review fixes (re-image-stable machine_uid, ACL TOCTOU, load_cak error classes, PS timeout, fail-fast guard)
H1: derive machine_uid from the durable hardware salt ALONE (SMBIOS UUID, or
board+disk serial) plus a fixed namespace, so it survives an OS re-image (which
regenerates MachineGuid). MachineGuid is demoted to a last-resort signal used
only when no hardware salt is readable (volatile, reboot-only floor). Re-image
stability proven by salted_uid_is_reimage_stable_independent_of_machine_guid.

H2: in store_cak, lock the directory ACL BEFORE any secret bytes are written;
the temp file is created inside the already-locked dir, then renamed. No
ciphertext ever exists at an inherited/world-readable path. Ordering made an
explicit precondition, not an unstated inheritance assumption.

M1: load_cak now returns a LoadCakError enum distinguishing Io (incl.
PermissionDenied — operational) from Decrypt (the real tamper/wrong-machine
signal). Only a successful READ whose DPAPI decrypt fails hard-stops.

M2: the PowerShell SMBIOS/board/disk shell-out is spawned and waited on with a
10s wall-clock bound; on timeout the child is killed and the signal is treated
as missing (falls back through the chain), never panics. Keeps
CREATE_NO_WINDOW -NonInteractive -NoProfile.

L1: warn! breadcrumb when the salted derivation degrades to MachineGuid-only,
so the server-side collision-gate operator has a clue. No secret values logged.

C1: keep the SYSTEM+Administrators ACL (Option A target). store_cak now does a
read-back verification immediately after writing and fails at ENROLL time if
this context cannot read its own store; resolve_agent_credential fails fast with
an actionable SPEC-017 message on an access-denied store instead of silently
re-enrolling/bricking. Guarded comment notes this is satisfied once the SYSTEM
service host lands.

Deferred items (clear_cak placeholder, legacy api_key path) left as-is.

Verification on x86_64-pc-windows-msvc: cargo fmt --check clean, clippy
-D warnings clean, release build OK, 52 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 12:54:18 -07:00
52477e4c4a feat(agent): first-run enrollment client + run-mode wiring (SPEC-016 Phase B items 3,5)
New enroll module: on a managed agent with no stored cak_ but with
enrollment_key + site_code, POST machine_uid + hostname + labels to
<https-base>/api/enroll and persist the minted cak_. Handles every Phase A
status code distinctly:
  - 201 new / 200 reuse -> persist cak_ (DPAPI store) and connect
  - 202 collision_pending -> log "pending operator confirmation", slow
    re-check loop (no key issued; cannot connect until confirmed)
  - 401 ENROLL_REJECTED / 409 ENROLL_SITE_CONFLICT -> distinct actionable
    errors, long backoff (won't fix without operator action, but recovers
    automatically once it does) — no tight loop
  - 429 -> honor Retry-After, short backoff
  - network / 5xx / decode -> short backoff
The enrollment_key and cak_ are never logged. Uses the existing reqwest
client and the update path's TLS posture (rustls; dev-insecure only in
debug + opt-in). Wire-contract unit tests pin the request shape against
the server's EnrollRequest/EnrollLabels and decode active + pending bodies.

main.rs run-mode wiring: before a managed agent connects, resolve the
operating credential by precedence — stored cak_ (steady state, no
network) -> first-run enrollment -> DEPRECATED legacy api_key (transition
only, logged at WARNING) -> error. The relay already accepts the cak_ as
the api_key query param, so the persistent transport authenticates with it
unchanged. Attended/support-code and viewer paths are untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:44:40 -07:00
87c6e17d4a feat(agent): cak_ at-rest credential store (SPEC-016 Phase B item 4)
Store the per-machine cak_ with BOTH layers Mike locked: DPAPI-machine
encryption (CryptProtectData with CRYPTPROTECT_LOCAL_MACHINE — a copied
blob is inert off the box) inside a SYSTEM/Administrators-only ACL'd file
at %ProgramData%\GuruConnect\credentials\agent.cak. The directory + file
ACL is hardened via icacls (/inheritance:r + grant to the well-known SIDs
*S-1-5-18 and *S-1-5-32-544, locale-independent) — auditable, with far
less unsafe FFI than building a registry-key security descriptor by hand.
Co-locates with the existing %ProgramData%\GuruConnect config/seed dir.

Provides store_cak / load_cak / clear_cak. store_cak writes atomically
(temp file + rename in the locked dir). load_cak treats a present-but-
undecryptable blob as a hard error (tamper / cross-machine copy) rather
than silently re-enrolling over it. The plaintext is never logged; the
transient plaintext copy is scrubbed after encryption. DPAPI output blobs
are LocalFree'd. Enables the Win32_Security_Cryptography windows feature.

Round-trip unit tests cover encrypt/decrypt recovery across lengths and
that a tampered blob fails to decrypt (DPAPI authenticates its blobs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:44:23 -07:00
6a000d012f feat(agent): extend config contract for enrollment (SPEC-016 Phase B item 2)
Add enrollment_key + site_code to EmbeddedConfig and the resolved Config
alongside the existing labels, and add department/device_type label fields
(SPEC-007 AgentStatus parity). The legacy api_key is retained but made
optional/defaulted so a SPEC-016 site installer can carry only the
enrollment credentials; existing pre-enrollment installers still parse.

The enrollment fields are #[serde(skip)] on Config so they are never
written to the on-disk TOML (install-time material only); apply_enrollment_env
layers them from GURUCONNECT_ENROLLMENT_KEY / GURUCONNECT_SITE_CODE on the
file and env load paths. The embedded path carries them from the install
blob. Config delivery itself (signed wrapper) is Phase C and unchanged here.

Add Config::https_base() deriving the REST API base (https://host[:port])
from the wss:// server_url so the enroll client and the persistent
transport share one authority.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:44:09 -07:00
d0b8db070f feat(agent): hardware-salt machine_uid (SPEC-016 Phase B item 1)
Extend the SPEC-004 machine_uid derivation with the locked SPEC-016
hardware salt: combine the Windows MachineGuid with the SMBIOS system
UUID (Win32_ComputerSystemProduct.UUID), falling back to motherboard
serial (Win32_BaseBoard.SerialNumber) + primary disk serial when the
SMBIOS UUID is absent or a degenerate placeholder (all-zeros / all-FFs,
emitted by some OEMs and hypervisor templates).

Signals are read via narrow PowerShell CIM queries (hidden window, no
profile) rather than adding a WMI crate or hand-rolling COM IWbemServices
for two scalar reads. Values are normalized (trim + upper-case) so vendor
case/space drift never perturbs the digest. The combined string is
SHA-256'd into the existing opaque muid_<hex> shape, preserving the wire
identity the relay connect path already reports while making it survive an
OS re-image on the same hardware. Which signal set fed the result is
logged (source label only, never the secret values).

Adds unit tests for derivation determinism + signal-sensitivity,
degenerate-SMBIOS rejection, and signal normalization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:43:56 -07:00
89c3718266 Merge pull request 'SPEC-016 Phase A: zero-touch enrollment backend + migration' (#5) from feat/spec-016-enrollment into main
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 10m37s
Build and Test / Build Server (Linux) (push) Successful in 15m25s
Build and Test / Security Audit (push) Successful in 5m28s
Build and Test / Build Summary (push) Successful in 23s
2026-06-02 11:19:37 -07:00
4106fc4bc4 style(enroll): cargo fmt --all (satisfy CI fmt gate)
All checks were successful
Build and Test / Build Agent (Windows) (pull_request) Successful in 16m35s
Build and Test / Build Server (Linux) (pull_request) Successful in 19m7s
Build and Test / Security Audit (pull_request) Successful in 5m27s
Build and Test / Build Summary (pull_request) Successful in 26s
The Phase A work passed cargo check + clippy + tests locally but missed
`cargo fmt --all -- --check` (the first step of the Linux CI job): module
ordering in db/mod.rs and two trailing-comment alignments in rate_limit.rs.
No logic change. Agent build failure on the prior run was transient infra
(verified: agent crate compiles clean locally).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 10:48:51 -07:00
0f02f23765 fix(enroll): SPEC-016 Phase A review fixes (cross-site guard, timing oracle, TOCTOU)
Some checks failed
Build and Test / Build Agent (Windows) (pull_request) Failing after 10m11s
Build and Test / Build Server (Linux) (pull_request) Failing after 10m5s
Build and Test / Security Audit (pull_request) Successful in 8m5s
Build and Test / Build Summary (pull_request) Has been skipped
Applies the four review fixes to POST /api/enroll, all in server/src/api/enroll.rs
(+ a new ENROLL_SITE_CONFLICT event type in server/src/db/events.rs):

1. HIGH — close the within-tenant cross-site silent-move hijack. A valid key for
   site B presented for a machine_uid already bound to a DIFFERENT site is now
   REFUSED (409 ENROLL_SITE_CONFLICT) instead of silently repointing the row and
   minting a fresh cak_. No move, no key. Emits an ENROLL_SITE_CONFLICT audit event
   + alert TODO. Same-site match still resolves to reuse; a NULL prior site_id is a
   first relational bind, not a move. The unauthenticated site_move mint path is
   removed; deliberate moves are deferred to the Phase-B --reassign flow + dashboard.

2. MEDIUM — kill the timing/enumeration oracle. Unknown site_code and no-active-key
   early rejects now pay a dummy Argon2id verify against a fixed, valid throwaway PHC
   constant (TIMING_EQUALIZER_PHC) before returning the identical 401, so every
   rejection path pays one KDF. The constant is asserted valid + verifying in tests.

3. LOW — fix the new-enroll TOCTOU. The dedup lookup + INSERT is wrapped in a bounded
   retry loop: a concurrent first-enroll of the same machine_uid whose INSERT loses
   the unique-index race (classified by is_machine_uid_conflict on SQLSTATE 23505 +
   machine_uid constraint) now re-looks-up and converges to reuse instead of 500ing.
   A non-machine_uid unique violation still surfaces as 500.

4. LOW — make the collision-gate doc honest + leave an enforcement TODO. The module
   doc now states the gate withholds only a NEWLY minted cak_ (a prior clean cak_
   survives) and that nothing consults enrollment_state at control time yet, with a
   TODO(SPEC-016 Phase B/D) marker for relay/control-plane enforcement + revocation.

Verify: cargo check, cargo clippy --all-targets, and cargo test all clean on this
Windows host (104 tests pass). Two DB-gated tests (cross-site bound-site_id exposure,
machine_uid-vs-agent_id conflict classification) no-op without TEST_DATABASE_URL and
run against real Postgres in CI; the Linux target / real-Postgres handler path is
validated there, not on this host.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 10:28:31 -07:00
59e40c8019 feat(enroll): SPEC-016 Phase A — enrollment backend + migration
Server-side zero-touch per-site enrollment (Phase A: backend + DB only;
agent-side machine_uid derivation is Phase B, server treats it as opaque).

Migration 010_spec016_enrollment.sql:
- connect_sites: relational site anchor (site_code natural key, per-tenant
  unique). The spec assumed a sites table existed; it did not (site/company
  were free-text columns on connect_machines), so this creates a minimal one.
- site_enrollment_keys: rotatable, Argon2id-hashed cek_ secret + monotonic
  version + hex fingerprint + active flag; one-active-per-site partial unique.
- connect_machines: + site_id (FK), + enrollment_state ('active'|'pending')
  collision gate, + per-tenant (tenant_id, machine_uid) unique index added
  ALONGSIDE the 008 global index (the connect-path upsert_machine ON CONFLICT
  arbiter binds to 008 — dropping it would break live reconnect).
- connect_sites.enrollment_policy: reserved (default auto-approve), not enforced.

auth/enrollment_keys.rs: cek_ mint (256-bit, OS CSPRNG), Argon2id hash/verify
(reuses auth::password), and hex fingerprint vN (XXXX) per resolved-decision #3.

db/sites.rs + db/enrollment_keys.rs: runtime sqlx persistence; rotate_key
deactivates+inserts in one tx to hold the one-active-key invariant.

POST /api/enroll (public, api/enroll.rs): site_code+cek_ verify against active
key -> dedup on (tenant, machine_uid) -> new / reuse / site-move / collision.
Collision gate (PROVISIONAL heuristic: online existing row + different hostname)
-> pending, no usable cak_, alert. Mints cak_ via existing agent_keys path in the
exact form relay::validate_agent_api_key expects. Per-(site_code,IP) rate-limit +
lockout (EnrollLimiter). Audit events + [ENROLL] alert markers with
TODO(SPEC-016) #dev-alerts notes.

Admin (JWT) api/sites.rs: POST /api/sites/:id/enrollment-key/rotate (plaintext +
fingerprint once) and GET .../enrollment-key (fingerprint/version, no secret).

Routes wired in main.rs (enroll public, rotation admin). 13 new unit tests;
full server suite 99 passing. cargo check + clippy clean on the host (Windows)
target — Linux cross-target not installed here; server crate is platform-neutral
Rust. No sqlx offline cache needed (codebase uses runtime queries, no query!).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 10:12:35 -07:00
c286a29b9d spec: SPEC-016 resolve all 5 open questions (enrollment design decisions)
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 14m25s
Build and Test / Build Server (Linux) (push) Successful in 20m31s
Build and Test / Security Audit (push) Successful in 8m28s
Build and Test / Build Summary (push) Successful in 30s
Fold the 2026-06-02 interview decisions into SPEC-016:
- Installer wrapper: ship BOTH signed .exe and signed MSI per site
- cak_ at-rest storage: DPAPI-machine-encrypted blob in a SYSTEM-ACL'd location
- Fingerprint: hex (7F2A), deliberately unlike RMM word-codes
- machine_uid: per-tenant scope + hardware-derived salt (survives re-image,
  separates distinct boxes) + collision-gated activation (template-cloned VMs
  sharing a hardware UUID drop to pending + alert, need dashboard confirm)
- Attended support-code path: unchanged (filename-based, already signing-safe)

Open Questions section -> Resolved decisions + a short Remaining-for-planning
list (exact hardware salt signal set, WiX/MSI authoring approach).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 09:54:19 -07:00
18429f6fe3 spec: add SPEC-016 zero-touch per-site agent enrollment
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 10m46s
Build and Test / Build Server (Linux) (push) Successful in 15m33s
Build and Test / Security Audit (push) Successful in 6m3s
Build and Test / Build Summary (push) Successful in 25s
ScreenConnect-class managed enrollment: one signed installer per site,
machines self-register on first run and the server mints a per-machine
cak_ key bound to a deterministic machine_uid (dedups re-installs).
Per-site rotatable enrollment key (long secret + vN (XXXX) fingerprint);
rotating blocks new enrollments from old installers, leaves enrolled
agents untouched. Auto-approve + new-enrollment/site-move alert.

Resolves SPEC-007's signature-vs-appended-config open question:
sign the base agent once in CI + per-site signed wrapper that writes
site config around the signed bytes (never appended into the PE).

Deferred (room reserved): enrollment policy + per-seat licensing,
--enroll-key/--site-code/--reassign flag overrides, technician-assisted
interactive install. Tracking todo dbfe6a56.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 09:13:59 -07:00
3b9e4068c9 docs(roadmap): mark release signing shipped; add signed beta channel as P1-NOW
All checks were successful
Build and Test / Build Server (Linux) (push) Successful in 14m11s
Build and Test / Build Agent (Windows) (push) Successful in 8m3s
Build and Test / Security Audit (push) Successful in 5m38s
Build and Test / Build Summary (push) Successful in 17s
Release-path Azure Trusted Signing and auto-versioning were already
shipped with v0.3.0 (stale [ ] -> [x]). Add a new P1/NOW item for a
signed beta/test release channel: the auto build-and-test.yml agent
artifact is unsigned, so testers can receive unsigned binaries. The
beta channel (now implemented in release.yml) closes that gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 07:57:04 -07:00
87f229509b ci(release): add signed beta/test release channel
Some checks failed
Build and Test / Build Server (Linux) (push) Has started running
Build and Test / Build Agent (Windows) (push) Has started running
Build and Test / Security Audit (push) Has been cancelled
Build and Test / Build Summary (push) Has been cancelled
Add a `channel: stable | beta` workflow_dispatch input to release.yml.
`stable` is unchanged (byte-for-byte). `beta` produces a Windows agent
binary signed by the identical fail-closed Azure Trusted Signing path,
but skips the semver bump, changelog, and release commit, and publishes
a prerelease-tagged Gitea release (vX.Y.Z-beta.<run_number>) at HEAD.

So every binary handed to a tester is signed, not just formal releases.

- prerelease tags excluded from stable LAST_TAG detection (both lookups)
  so a beta tag can't corrupt the next stable version computation
- beta tag force-created/pushed -> idempotent on failed-run re-runs
- changelog download gated to stable; release prerelease flag plumbed
  through to the Gitea REST payload

Reviewed-by: Code Review Agent (APPROVE WITH NITS; N1 hardened)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 07:56:17 -07:00
40c7d860cc spec(v2-session-core): add Task 9 — cak_ auto-enroll provisioning (TOFU) + shared-key retirement
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m10s
Build and Test / Build Server (Linux) (push) Successful in 10m31s
Build and Test / Security Audit (push) Successful in 4m1s
Build and Test / Build Summary (push) Successful in 9s
2026-06-01 14:40:14 -07:00
0059b21db6 fix(server): revert migration 008 comment edit — modifying an applied sqlx migration breaks its checksum and crash-loops the server on startup; machines.rs ON CONFLICT fix retained
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m33s
Build and Test / Build Server (Linux) (push) Successful in 11m57s
Build and Test / Security Audit (push) Successful in 4m33s
Build and Test / Build Summary (push) Successful in 11s
2026-06-01 10:05:38 -07:00
f950511e3e fix(server): bind machine_uid upsert ON CONFLICT to the partial index (WHERE machine_uid IS NOT NULL)
Some checks failed
Build and Test / Build Agent (Windows) (push) Successful in 8m16s
Build and Test / Build Server (Linux) (push) Successful in 11m58s
Build and Test / Security Audit (push) Has started running
Build and Test / Build Summary (push) Has been cancelled
Bare ON CONFLICT (machine_uid) could not bind to migration 008's partial unique index, so no connect_machines row was persisted for any agent reporting a machine_uid. Confirmed live on 172.16.3.30 with a signed 0.3.0 test agent.
2026-06-01 09:50:34 -07:00
16017456aa docs: 2026-05-31 security re-audit (Phase-1 EXIT) + roadmap reconcile
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 6m59s
Build and Test / Build Server (Linux) (push) Successful in 10m35s
Build and Test / Security Audit (push) Successful in 4m3s
Build and Test / Build Summary (push) Successful in 7s
/gc-audit --pass=security re-pass over the deployed v0.3.0 code: PASS,
0 CRITICAL/HIGH/MEDIUM/LOW. The 3 relay CRITICALs stay closed (verified in
code AND live against the deployed binary), the prior agent-update-TLS HIGH
and chat-logging LOW are fixed, and the net-new SPEC-004 surface (machine_uid
dedup gate, session reaper/supersede, operator removal API) audits clean —
no non-admin removal path, no uid-spoof hijack, no auth-plane crossover.

Marks v2 Phase 1 formally exited (secure-session-core Task 8 complete).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 18:19:09 -07:00
guruconnect-ci
e967cce1a1 chore: release v0.3.0 [skip ci] v0.3.0 2026-06-01 00:10:58 +00:00
16586c4a1b chore: reconcile manifest versions to v0.2.2 baseline
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m14s
Build and Test / Build Server (Linux) (push) Successful in 11m25s
Build and Test / Security Audit (push) Successful in 7m13s
Build and Test / Build Summary (push) Successful in 1m2s
agent + server Cargo.toml hardcoded 0.2.0 (below the workspace.package
0.2.2 and the last release tag v0.2.2); dashboard was on a divergent
2.0.0 scheme. Align all component manifests + the dashboard lockfile to
the v0.2.2 baseline so the next release bumps them coherently to 0.3.0
rather than decreasing the dashboard. No code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 16:50:59 -07:00
96f9c0ab45 feat(dashboard): operator removal UI for stale machines/sessions (SPEC-004 Task 5)
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m13s
Build and Test / Build Server (Linux) (push) Successful in 11m21s
Build and Test / Security Audit (push) Successful in 4m12s
Build and Test / Build Summary (push) Successful in 11s
Admin-only per-row Remove + multi-select bulk removal on the machines view, plus
per-row purge Remove on the sessions view, wired to the Task-5 admin API
(DELETE /api/machines|sessions/:id?purge=true, POST /api/machines/bulk-remove).
Confirm modals (danger-styled, focus-trapped), TanStack refetch so purged rows
leave the console, structured ApiError surfacing, honest partial-bulk summary,
and admin-gating via useAuth().isAdmin as defense-in-depth over the server 403.
Replaces the legacy all-user delete trigger. typecheck/lint/build clean.

Implements specs/v2-stable-identity/plan.md Task 5 (dashboard portion).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 14:14:49 -07:00
5ee6675337 feat(server): operator removal of stale sessions/machines (SPEC-004 Task 5, server)
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m29s
Build and Test / Build Server (Linux) (push) Successful in 10m58s
Build and Test / Security Audit (push) Successful in 4m4s
Build and Test / Build Summary (push) Successful in 8s
Admin-gated soft-delete + purge so operators can clear ghost machines/sessions
(the ~15-rows-for-one-host accumulation) from the console.

- migration 009: deleted_at on connect_sessions + connect_machines, with partial
  indexes WHERE deleted_at IS NULL.
- DELETE /api/machines/:agent_id?purge=true and DELETE /api/sessions/:id?purge=true
  soft-delete the row and purge the in-memory session (remove_session); the
  non-purge path keeps the legacy hard-delete / live-only disconnect. POST
  /api/machines/bulk-remove handles multi-select (batch cap 500). All admin-gated
  (AdminUser -> 403; tightens the prior any-user delete) and audited to
  connect_session_events (actor + target + trusted client IP).
- list/get queries filter deleted_at IS NULL so removed units leave the console;
  upsert revives (deleted_at = NULL) a genuinely-reconnecting machine. The
  keyed-reattach identity resolver (get_machine_by_id) is intentionally unfiltered.

Dashboard removal UI is the A3b follow-up. 86 server tests pass; fmt/clippy/test
clean. Implements specs/v2-stable-identity/plan.md Task 5 (server portion).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 13:52:36 -07:00
cef1928379 style(server): cargo fmt for SPEC-004 Task 2 + Task 4
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 6m40s
Build and Test / Build Server (Linux) (push) Successful in 10m18s
Build and Test / Security Audit (push) Successful in 4m12s
Build and Test / Build Summary (push) Successful in 12s
Pure rustfmt reflow of the Task 2 (machine_uid dedup) and Task 4 (session
reaping) code; no logic change. The CI Build-Server-Linux job gates on
cargo fmt --check, which the two feature commits failed because local
validation ran check/clippy/test but not fmt --check. fmt --check, check,
and clippy -D warnings all clean now.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 12:27:01 -07:00
4e80573cbd feat(server): reap stale persistent sessions + same-machine supersede (SPEC-004 Task 4)
Some checks failed
Build and Test / Build Server (Linux) (push) Failing after 3m32s
Build and Test / Build Agent (Windows) (push) Has started running
Build and Test / Security Audit (push) Has started running
Build and Test / Build Summary (push) Has been cancelled
A periodic reaper removes persistent, offline, viewerless sessions whose last
heartbeat is older than a 10-minute TTL (60s sweep spawned at startup), and a
same-machine supersede on the new-session path drops a stranded prior session
when a legacy no-uid agent upgrades to a fresh agent_id + machine_uid. Both
removals re-assert the predicate under the write lock (remove_session_if) to
close a snapshot->remove TOCTOU.

Security: keyed (cak_) agents pass machine_uid=None, so they never trigger
supersede and are never reaped as a uid victim; online, viewer-attached, and
support sessions are never reaped. 82 server tests pass; clippy clean.

Implements specs/v2-stable-identity/plan.md Task 4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 12:21:15 -07:00
ffca7f0cee feat(server): dedup machines on machine_uid (SPEC-004 Task 2)
Some checks failed
Build and Test / Build Server (Linux) (push) Failing after 3m48s
Build and Test / Build Agent (Windows) (push) Successful in 7m34s
Build and Test / Security Audit (push) Successful in 4m44s
Build and Test / Build Summary (push) Has been skipped
Persist the agent-reported machine_uid and dedup connect_machines on it so a
single physical machine can't register duplicate rows when its config-file
agent_id regenerates (the ghost-session root cause).

- migration 008: nullable connect_machines.machine_uid + partial unique index
  (WHERE machine_uid IS NOT NULL); idempotent, startup-applied.
- upsert_machine: two-path dedup (ON CONFLICT machine_uid when present, else
  the legacy ON CONFLICT agent_id path, unchanged).
- session reattach: a machine_uid index consulted before agent_id, with all
  removal paths purging it.
- security: keyed (cak_) agents stay authoritative — their claimed machine_uid
  is dropped (effective_machine_uid=None); uid is dedup-only for un-keyed /
  support-code agents. Startup restore skips uid-indexing keyed machines and
  fails closed if the keyed-set query errors.

74 server tests pass; clippy clean. Implements specs/v2-stable-identity/plan.md Task 2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 12:06:50 -07:00
97780304e7 fix(agent): make native H.264 viewer render live frames
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m2s
Build and Test / Build Server (Linux) (push) Successful in 10m24s
Build and Test / Security Audit (push) Successful in 4m15s
Build and Test / Build Summary (push) Successful in 9s
The native viewer's H.264 path (Task 7 first-cut, compile-verified only)
never rendered a frame. Three stacked bugs, all confirmed via live loopback:

1. decoder: MF_E_NOTACCEPTING (0xC00D36B5) was treated as fatal and only
   one output was drained per call, so once the MFT filled it rejected
   every subsequent frame. decode() now returns Vec<DecodedFrame>, drains
   on back-pressure and retries the unconsumed sample, then drains all
   ready outputs.
2. decoder: the NV12 output type was hand-built and rejected by the MS
   H.264 decoder MFT (MF_E_TRANSFORM_TYPE_NOT_SET, 0xC00D6D60). It is now
   negotiated by enumerating GetOutputAvailableType on STREAM_CHANGE /
   TYPE_NOT_SET.
3. render: a manual pump_messages() in about_to_wait stole winit's own
   thread messages and froze the event loop after one iteration, so frames
   were never drained from the channel. Removed; winit's run_app pump
   already services the WH_KEYBOARD_LL hook.

Validated on a 5070 loopback: 0 decode errors, frames decode/paint/present
(present count 0 -> 1740). Reviewed (APPROVE-WITH-NITS); diagnostics stripped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 11:25:05 -07:00
afbf0d81b8 spec: add SPEC-015 Configurable Notification Overlay
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 8m0s
Build and Test / Build Server (Linux) (push) Successful in 11m26s
Build and Test / Security Audit (push) Successful in 4m37s
Build and Test / Build Summary (push) Successful in 12s
Comprehensive specification for on-screen notification when technician connects.

- Semi-transparent topmost window with configurable message, position, duration
- Dashboard admin settings page (enable/disable, message template, position, duration)
- Template variables: {{technician_name}}, {{company}}, {{time}}
- Agent displays overlay on StartStream, auto-hides after duration or manual dismiss
- Database: notification_config singleton table
- Protobuf: NotificationConfig message in StartStream
- Priority: P2, Effort: Medium (3-4 weeks)
- Added to roadmap under Core Remote Control

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-31 08:40:53 -07:00
b45c683a51 spec: add SPEC-014 Branding and White-Label Configuration
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 8m16s
Build and Test / Build Server (Linux) (push) Successful in 11m48s
Build and Test / Security Audit (push) Successful in 4m35s
Build and Test / Build Summary (push) Successful in 13s
Comprehensive specification for branding/whitelabel configuration.

- Dashboard admin settings page (logo, brand hue, product name, company name, favicon)
- OKLCH color system with CSS variables for dynamic theming
- Agent tray tooltip customization via registry key
- Singleton database table with public GET endpoint
- Priority: P2, Effort: Medium (4-6 weeks)
- Added to roadmap under Server/API (v2 Phase 2)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-31 08:12:37 -07:00
5637e4c1f9 spec: add SPEC-013 Windows Session Selection and Backstage Mode
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 8m5s
Build and Test / Build Server (Linux) (push) Successful in 11m24s
Build and Test / Security Audit (push) Successful in 4m30s
Build and Test / Build Summary (push) Successful in 12s
2026-05-31 07:54:25 -07:00
b3e8f32734 feat(agent): derive + report deterministic machine_uid (SPEC-004 Task 1)
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m4s
Build and Test / Build Server (Linux) (push) Successful in 9m41s
Build and Test / Security Audit (push) Successful in 4m11s
Build and Test / Build Summary (push) Successful in 10s
Agent now derives a recomputable, opaque machine_uid (Windows: SHA-256 of the OS
MachineGuid at HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid -> muid_<hex>;
non-Windows / registry-failure: persisted random UUID, warn-logged). Raw GUID
never exposed; OnceLock-cached. Reported ALONGSIDE agent_id (unchanged) on
AgentStatus (new additive proto field 12) and in the connect handshake query.
This is the stable identity that fixes config-loss duplicate registrations
(DESKTOP-I66IM5Q x9); server-side dedup keying that consumes it is SPEC-004
Task 2. Non-breaking, isolated. 5 unit tests; cargo fmt/clippy(-D warnings)/test
green on GURU-5070.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:23:11 -07:00
92bc522c3a spec: add v2-stable-identity implementation plan (SPEC-004 breakdown)
Some checks failed
Build and Test / Build Server (Linux) (push) Has started running
Build and Test / Build Agent (Windows) (push) Has started running
Build and Test / Security Audit (push) Has been cancelled
Build and Test / Build Summary (push) Has been cancelled
Ordered, execution-ready plan for SPEC-004 (stable machine identity + session
reaping + operator removal). Works out the core integration: machine_uid =
deterministic MachineGuid-based hardware identity (recomputable, so config loss
can't duplicate); per-agent cak_ key stays the credential/trust boundary; they
compose so one cak_ key per machine_uid = one key per real machine (the
prerequisite the fleet key-migration #7 needs). Root cause grounded in code:
agent_id is a random UUID (config.rs:90), connect_machines dedups on ON CONFLICT
(agent_id), so config loss -> duplicate rows (DESKTOP-I66IM5Q x9 live). 5 ordered
tasks (agent uid -> server dedup -> reconcile/age-out -> reaping -> operator
removal). Unblocks #7 -> #5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:17:49 -07:00
df51d40094 feat(server): per-agent H.264 test override (h264-test tag) [Task 8 prep]
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m32s
Build and Test / Build Server (Linux) (push) Successful in 10m55s
Build and Test / Security Audit (push) Successful in 4m14s
Build and Test / Build Summary (push) Successful in 11s
Lets the HW-H.264 path be live-validated on tagged test agents without affecting
the live client fleet. Adds H264_TEST_TAG="h264-test" + a pure prefer_h264_for(tags)
helper (DEFAULT_PREFER_H264 || tags contains the tag, case-insensitive); StartStream
codec negotiation now computes prefer_h264 from the agent's reported tags instead of
the bare const, and logs the computed value. SAFETY: untagged sessions are byte-for-
byte unchanged (prefer_h264 == DEFAULT_PREFER_H264 == false -> raw); the supports_h264
guard still forces raw for a no-HW agent even when tagged. DEFAULT_PREFER_H264 stays
false (flipping the global default is a separate future step). 3 unit tests added.
cargo fmt/clippy(-D warnings)/test green on GURU-5070 (37 agent + 64 server).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 20:17:38 -07:00
7be8f454e0 Merge remote security fixes with local specs
All checks were successful
Build and Test / Build Server (Linux) (push) Successful in 9m56s
Build and Test / Build Agent (Windows) (push) Successful in 6m21s
Build and Test / Security Audit (push) Successful in 4m21s
Build and Test / Build Summary (push) Successful in 9s
2026-05-30 19:21:42 -07:00
c98692e424 fix(server): revoke viewer tokens on logout + stop logging chat content
Some checks failed
Build and Test / Build Server (Linux) (push) Has started running
Build and Test / Build Agent (Windows) (push) Has started running
Build and Test / Security Audit (push) Has been cancelled
Build and Test / Build Summary (push) Has been cancelled
Security follow-ups (audit 2026-05-30, both reviewed APPROVE):
- MEDIUM: viewer tokens were never blacklisted on logout, so a minted
  session-scoped viewer token stayed valid up to its 5-min TTL after the user
  logged out. Add a per-user ViewerTokenRegistry (Arc<Mutex<HashMap<sub,
  Vec<(token, expires_at)>>>>, prune-on-insert) on AppState; mint_viewer_token
  registers each token under the user sub; logout drains take_for_user(sub) and
  blacklists each via the existing token_blacklist. The viewer WS already calls
  is_revoked, so no WS change. Key chain user.user_id == ViewerClaims.sub ==
  registry key verified consistent. 8 new tests.
- LOW: relay chat logs now emit content length, not the chat body (support-chat
  can carry secrets/PII).
cargo fmt/clippy(-D warnings)/test green on GURU-5070 (37 agent + 61 server).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:20:15 -07:00
761bae5d01 spec: update SPEC-012 to include both Serial Console + PTY Shell modes
Major update to SPEC-012 adding dual-mode terminal access:

Mode 1: Serial Console Mode (True Remote Console)
- Direct access to system serial console (/dev/ttyS0 or /dev/console)
- Sees GRUB bootloader, kernel boot messages, login prompts, kernel panics
- Boot-time interaction: select GRUB entries, edit kernel parameters, single-user mode
- Requires root privileges or CAP_SYS_TTY_CONFIG capability
- Setup: GRUB + kernel parameters configured for serial console output
- Like KVM-over-IP or IPMI Serial-over-LAN (text-mode equivalent)

Mode 2: PTY Shell Mode (Interactive Shell)
- Spawn pseudo-TTY with bash/zsh shell session
- Normal server management (package updates, log review, etc.)
- Runs as unprivileged agent service user
- Standard interactive shell with full ANSI/VT100 support

Architecture:
- Agent mode selection based on viewer request (console vs. shell)
- Dashboard shows two buttons: "Console" and "Shell" for headless agents
- Same xterm.js viewer handles both modes transparently
- Protobuf extensions: TerminalModeRequest enum, console_mode flag

Security:
- Console mode requires root (boot-level control risk)
- Recommend RBAC: separate console_access and shell_access permissions
- Console sessions should require MFA (Phase 2)
- Audit logging for both modes

Setup Requirements:
- One-time GRUB configuration for serial console
- systemd service with CAP_SYS_TTY_CONFIG for console mode
- serial-getty@ttyS0.service enabled for login prompt

Updated effort: Medium (5-7 weeks, up from 4-6)
Priority remains P2

Addresses user request for "remote console" (as if at the machine)
not just shell access.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-30 19:02:27 -07:00
8119292bcd fix(agent): close auto-update TLS bypass (MITM -> RCE) [HIGH]
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m24s
Build and Test / Build Server (Linux) (push) Successful in 10m41s
Build and Test / Security Audit (push) Successful in 4m19s
Build and Test / Build Summary (push) Successful in 9s
The auto-update path built both reqwest clients with an unconditional
danger_accept_invalid_certs(true), so a network MITM could serve an arbitrary
update .exe (checksum is no defense — same unverified channel) and gain RCE on
every managed endpoint. Replace with dev_insecure_tls() = cfg!(debug_assertions)
&& env GURUCONNECT_DEV_INSECURE_TLS: the cfg gate compiles out of release builds,
so a shipped agent ALWAYS verifies certs; dev keeps a self-signed escape hatch.
Loud warn when the insecure path is taken; verify_checksum kept + documented as
transport-integrity (not tamper) defense; TODO + follow-up for embedded-key
update signing (defense-in-depth). Release-invariant unit test added.
cargo fmt/clippy(-D warnings)/test green on GURU-5070 (90 tests). Closes the
2026-05-30 security-audit HIGH (reports/2026-05-30-gc-audit.md).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:02:23 -07:00
9f44807230 audit: security pass re-audit (2026-05-30) — 3 CRITICALs verified CLOSED
Some checks failed
Build and Test / Build Agent (Windows) (push) Successful in 7m1s
Build and Test / Build Server (Linux) (push) Successful in 10m17s
Build and Test / Security Audit (push) Has started running
Build and Test / Build Summary (push) Has been cancelled
Independent /gc-audit --pass=security re-derivation of the v2 secure-session-core
rebuild: all three 2026-05-29 relay CRITICALs confirmed closed with no bypass
(any-JWT-joins-session, viewer-WS blacklist, JWT-as-agent-key). Relay plane clean;
consent/code paths fail closed; abuse surface bounded; rate limiting proxy-aware.
Net-new: 1 HIGH (agent auto-update disables TLS cert verification -> MITM-RCE,
agent/src/update.rs:45,111 — outside the relay plane), 1 LOW (chat content logged),
2 INFO. Report: reports/2026-05-30-gc-audit.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 18:48:48 -07:00
a062a825ea spec: add SPEC-012 Headless Linux Mode (Direct TTY Access)
Comprehensive specification for terminal-based remote access to headless
Linux servers (no X11/Wayland GUI):

Core Capabilities:
- PTY spawn via openpty() + fork/exec shell (/bin/bash or $SHELL)
- Terminal I/O: PTY output → TerminalData protobuf → WebSocket relay
- Input: keyboard → TerminalInput protobuf → PTY master write
- Resize: SIGWINCH on terminal window resize, TIOCSWINSZ ioctl
- Auto-detection: agent detects headless environment (no DISPLAY) at runtime

Viewer:
- xterm.js-based web terminal (80x24 default, resizable)
- Full ANSI/VT100 support (colors, cursor control, vim/nano/htop)
- Same protobuf-over-WSS protocol, support-code/agent-key auth
- Dashboard shows "Terminal" badge, routes to terminal viewer

Use Cases:
- Server management (headless Ubuntu Server, VMs, containers)
- Emergency recovery (systemd rescue mode, single-user mode)
- Container debugging (exec into running containers)
- SSH replacement with centralized audit logging

Protobuf Extensions:
- TerminalData, TerminalInput, TerminalResize messages
- AgentStatus.terminal_mode flag

Security:
- Run agent as unprivileged user + sudo for privileged commands
- Session recording to terminal_recordings table (asciicast format)
- Same auth model as GUI agents (support-code / per-agent key)

Estimated effort: Medium (4-6 weeks)
Priority: P2 (server management is market-critical)

Extends SPEC-010 Linux agent with PTY alternative to screen capture.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-30 18:28:34 -07:00
b1862800a1 spec: add SPEC-011 Mobile Agent Support (iOS and Android)
Comprehensive specification for iOS/Android devices as remote control targets:

iOS Agent (View-Only):
- ReplayKit 2 screen capture (user consent required)
- VideoToolbox H.264 encoding
- NO input injection (iOS sandboxing limitation)
- APNs push notifications for session requests
- Foreground-only operation (OS requirement)

Android Agent (View + Control):
- MediaProjection API screen capture (user consent)
- MediaCodec H.264 encoding
- Accessibility Service for input injection (tap/swipe/type)
- FCM push notifications
- Foreground service with persistent notification

Architecture:
- Native Swift/SwiftUI (iOS) and Kotlin/Jetpack Compose (Android) apps
- Same protobuf-over-WSS protocol as desktop agents
- Support-code authentication (persistent mode deferred to Phase 2)
- Minor protobuf additions: MobileCapabilities, TouchEvent
- Server push module: APNs (a2 crate) + FCM HTTP v1

Key constraints:
- Attended-only sessions (user must grant permission)
- Foreground-only (cannot capture in background on either platform)
- iOS view-only (platform sandbox prevents input injection)
- Consent-first model (MediaProjection/ReplayKit user prompts)

Estimated effort: X-Large (16-20 weeks, requires mobile expertise)
Priority: P3

Distinct from GuruRMM SPEC-017 (MDM/inventory) — this is remote
control, not device management.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-30 18:24:16 -07:00
442eecefc0 fix(server,agent): apply Tasks 3-5 review fixes (non-blocking)
All checks were successful
Build and Test / Build Agent (Windows) (push) Successful in 7m6s
Build and Test / Build Server (Linux) (push) Successful in 10m39s
Build and Test / Security Audit (push) Successful in 4m14s
Build and Test / Build Summary (push) Successful in 8s
From the secure-session-core Tasks 3-5 code review (APPROVE-WITH-FIXES):
- MEDIUM-2: delete the dead `validate_agent_key` "accept-any-key" placeholder +
  its AuthenticatedAgent/AuthState scaffolding (zero callers; the real agent
  auth is validate_agent_api_key + per-agent cak_ keys). Removes an auth landmine.
- LOW-3: stop interpolating support-code values into 3 relay log lines (bearer
  credentials).
- LOW-1: document the X-Real-IP trust requirement in ip_extract.rs (NPM must set
  it from $remote_addr); behavior unchanged.
- LOW-2: correct the consent/heartbeat comment in agent session loop (the loop
  awaits the dialog; safe because CONSENT_TIMEOUT 60s < HEARTBEAT_TIMEOUT 90s).
cargo fmt/clippy(-D warnings)/test all green on GURU-5070 (89 tests, 0 warnings).
MEDIUM-1 (viewer-token logout revocation) remains a tracked follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 18:23:03 -07:00
5e2325507f spec: add SPEC-010 Cross-Platform Agent Support (macOS and Linux)
Comprehensive specification for expanding agent support beyond Windows:

macOS Agent (Priority 1):
- ScreenCaptureKit API (macOS 13+) with AVFoundation fallback
- CGEvent input injection
- VideoToolbox H.264 encoding
- NSStatusItem menu bar icon
- Universal binary (x86_64 + arm64)
- Code signing and notarization

Linux Agent (Priority 2):
- X11 XShm screen capture with Wayland detection
- XTest input injection
- VA-API hardware H.264 encoding with software fallback
- StatusNotifier system tray
- .deb and .rpm packaging

Architecture:
- Platform abstraction layer (traits for capture/input/encoder/tray)
- Refactor existing Windows code behind PlatformCapture/Input/Encoder
- No protobuf protocol changes
- Same authentication (support codes and agent keys)

Estimated effort: X-Large (12-16 weeks)
Priority: P2 (market-critical for multi-platform MSP adoption)

Updated roadmap: promoted from P3 to P2 with full spec link.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-30 18:15:16 -07:00
c736a710a1 docs: record Tasks 3-5 code review (APPROVE-WITH-FIXES) in plan status
Some checks failed
Build and Test / Build Server (Linux) (push) Failing after 3m43s
Build and Test / Build Agent (Windows) (push) Successful in 7m43s
Build and Test / Security Audit (push) Successful in 4m57s
Build and Test / Build Summary (push) Has been skipped
Formal review on GURU-5070: cargo fmt/clippy/test green (89 tests, 0 warnings);
the 3 audit CRITICALs verified closed with no bypass; all security paths fail
closed. Non-blocking follow-ups tracked (viewer-token logout revocation, delete
dead validate_agent_key placeholder, X-Real-IP/log hygiene). Remaining for
Phase-1 exit: Task 8 e2e verification + /gc-audit security re-audit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 18:14:02 -07:00
786d3e47af docs: correct roadmap — v2 Phase 1 already landed, not a future sprint
Some checks failed
Build and Test / Build Server (Linux) (push) Failing after 3m12s
Build and Test / Security Audit (push) Successful in 4m53s
Build and Test / Build Agent (Windows) (push) Successful in 7m14s
Build and Test / Build Summary (push) Has been skipped
Re-baseline against actual git/deploy state: secure-session-core Tasks 1-7 are
committed and DEPLOYED; the 3 audit CRITICALs are closed and live in prod
(verified: deployed checkout abc55ab descends from the CRITICAL#1 fix + Task 7;
guruconnect.service running on :3002). The prior "Sprint 0: bypasses are live"
banner was wrong (stale 2026-05-29 audit narrative) and is removed. Remaining
to exit Phase 1 = secure-session-core Task 8 (e2e verification + security
re-audit) + Code-Review sign-off on Tasks 3-5. Schema note corrected
(connect_agent_keys + tenancy already exist via migration 004).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 17:36:18 -07:00
03f62d413f docs: annotate roadmap with v2-first direction + phase mapping
Some checks failed
Build and Test / Build Server (Linux) (push) Failing after 4m54s
Build and Test / Build Agent (Windows) (push) Has started running
Build and Test / Security Audit (push) Has started running
Build and Test / Build Summary (push) Has been cancelled
Mark SPEC-003..009 as work-items inside the SPEC-002 v2 phases (not standalone
v1 backlog): banner records the v2-reset decision + the Sprint-0 relay-auth
CRITICAL hotfix, a phase-mapping table (004->P1, 008->P0/1, 003/005/006/007->P2,
009->P3), inline [-> v2 Phase N] tags per spec, and a note to bake SPEC-003
inventory cols + SPEC-004 machine_uid + connect_agent_keys into the Phase-0
fresh schema. Sprint planning 2026-05-30 (Mike: v2 reset first).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 17:26:47 -07:00
7ab87384a7 spec: add SPEC-009 feature-rich documented API
Some checks failed
Build and Test / Build Server (Linux) (push) Failing after 3m42s
Build and Test / Build Agent (Windows) (push) Successful in 7m39s
Build and Test / Security Audit (push) Successful in 4m34s
Build and Test / Build Summary (push) Has been skipped
Everything the console does should be callable by API, documented and
discoverable. Adds: OpenAPI 3.x generated from code (utoipa) + Swagger/Redoc at
/api/docs (drift-proof, route<->spec parity test); long-lived revocable scoped
API tokens (connect_api_tokens, hashed like agent keys) distinct from the 24h
dashboard JWT and agent keys; an API-completeness gap audit (folds in SPEC-004/
006/007 endpoints); consistent pagination/filtering + versioning policy. Today
there is zero API doc tooling and no programmatic token. Depends on SPEC-008 for
the documented error envelope; distinct from the ADR-001 integration contract.
Large. Parallel guru-rmm SPEC-019. Requested by Mike 2026-05-30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:35:57 -07:00
65eff5cf50 spec: add SPEC-008 valuable error messages
Cross-cutting error-quality initiative: one structured AppError envelope
(stable error_code + message + correlation_id) replacing the current ad-hoc
mix (bare (StatusCode,&str) tuples, per-file ErrorResponse, two JSON envelopes
the dashboard already unions); correlation-id middleware tied to tracing spans
+ response header so a reported id greps the log; contextual error logging with
identifiers + error chain; sweep the 37 server `let _ =` swallows (the pattern
that silently hid migration-005's missing columns); dashboard renders the real
cause + correlation id (drop the hardcoded generic at MachinesPage.tsx:202);
agent logs why/where auth/connection failed (the auth-loop incident gave no
local signal). Phaseable; Large. Parallel RMM request keeps conventions aligned.
Requested by Mike 2026-05-30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:30:07 -07:00
008d2bf30b spec: add SPEC-007 managed-agent installer builder
Dashboard "Build Installer" wizard for pre-labeled managed/persistent agents
(Name/Company/Site/Department/Device Type/Tag/Type) with Download / Copy URL /
Send Link, ScreenConnect-style. The embed-config build path already exists
(downloads.rs appends EmbeddedConfig GURUCONFIG blob; AgentDownloadParams takes
company/site/tags/api_key; agent reads it at config.rs:223) - missing is the UI,
department + device_type fields (EmbeddedConfig/AgentStatus/connect_machines),
name strategy, and Copy-URL/Send-Link actions. Labels persist at install time,
feeding SPEC-003/005/006. Embedded key should be revocable per-machine/site
(pairs with SPEC-004). Biggest open question: appending config after Authenticode
signing invalidates the signature. Requested by Mike 2026-05-30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:24:56 -07:00
0eb38520ed spec: add SPEC-006 universal machine search
Single search box matching case-insensitive substring across ALL machine
attributes (OS, logged-on user, external/private IP, company, site, tag,
serial, MAC, client version, ...) server-side, ScreenConnect-style. Replaces
the dashboard's hostname/agent_id-only client filter (inadequate at ~900+
machines). pg_trgm GIN index over a concatenated searchable-text expression
(INET cast to text, tags via array_to_string); multi-term AND; optional
field-scoped syntax (os:/user:/ip:). Parameterized + fixed column allowlist
(no injection), admin-guarded, DoS-capped. Depends on SPEC-003 (attrs must be
persisted to be searchable); reuses SPEC-005 enriched payload. Requested by
Mike 2026-05-30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:21:10 -07:00
cdc182f0fb spec: add SPEC-005 machines list view (dual indicators + rich rows)
ScreenConnect "Access"-list parity for the Operator Console machines list:
per-row dual Host/Guest connection indicators (Guest=agent is_online,
Host=viewer_count>0 with viewer names + durations) and rich inline metadata
(company, site, device type, tags, logged-on user + idle, client version in
red when outdated). Live Host/Guest state already exists on SessionInfo
(is_online, viewer_count, viewers); main work is enriching /api/machines with
that + SPEC-003 inventory and redesigning MachinesPage rows. Depends on
SPEC-003 (data), reads cleanest after SPEC-004 (dedup), dovetails SPEC-002
Phase 2. Company-tree nav split out as a P3 follow-up. Requested by Mike
2026-05-30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:17:48 -07:00
f8bd4d1dab spec: SPEC-004 add stable machine-derived identity as the primary fix
Address duplicate registration at the source, not just via cleanup. Root
cause now grounded: agent_id is a random UUID (config.rs:90 generate_agent_id)
persisted only in the config file, so a portable/misconfigured execution
(the Pavon desktop launcher) regenerates a fresh id each launch, defeating
both the DB upsert (ON CONFLICT agent_id) and session-reuse dedupe. Add a
deterministic machine_uid (Windows MachineGuid-based, recomputable) keyed by
registration; reaping/supersede become defense-in-depth. Security: machine_uid
is identity not authorization and must be bound to the per-machine agent key
to prevent session/record hijack. Requested by Mike 2026-05-30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:11:38 -07:00
ee900c6395 spec: add SPEC-004 session lifecycle reaping + operator removal
Stop orphaned managed sessions accumulating in the Operator Console and let
admins remove stale sessions/units individually and in bulk. Root cause
confirmed in code: the Sessions list is the in-memory SessionManager;
register_agent reconnect-reuse keys on a stable agent_id (session/mod.rs:169)
and persistent sessions are never reaped on disconnect (session/mod.rs:519-542),
so an agent reconnecting with a fresh agent_id leaves a new retained ghost
session each time (observed: 15 sessions/0 live, ~10 orphans for one machine
after a GuruConnect-client reconnect storm). Adds TTL sweep + same-machine
supersede, admin-gated audited purge + bulk endpoints, and dashboard
multi-select removal. Requested by Mike 2026-05-30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:05:32 -07:00