Run the managed/persistent GuruConnect agent as a LocalSystem Windows
service so it is reachable at the login screen and across reboots, and
so the SPEC-016 per-machine cak_ store (ACL-restricted to SYSTEM +
Administrators) is finally readable in-context.
Phase 1 scope (host + lifecycle only):
- New agent/src/service/mod.rs: registers "GuruConnectAgent" with the
SCM via the windows-service dispatcher, reports a correct lifecycle
(StartPending -> Running -> StopPending -> Stopped), handles
Stop/Shutdown via an AtomicBool the agent loop polls (graceful WS
close), and provides install/uninstall/start (LocalSystem, AutoStart,
sc-failure crash recovery). Idempotent install/uninstall.
- main.rs: hidden `service-run` subcommand routes the SCM-launched
process into the dispatcher; new run_managed_agent_service() runs the
existing RunMode::PermanentAgent logic (resolve/enroll cak_, hold the
relay) as SYSTEM. run_agent() now takes an optional SCM shutdown flag,
skips the HKCU Run autostart and the tray when run as the service, and
interrupts the reconnect backoff promptly on stop. An interactive
launch of a managed binary now installs+starts the service and exits
instead of double-running.
- install.rs: a managed install (embedded config present) installs the
LocalSystem service as the single autostart and removes the legacy
HKCU Run entry; uninstall stops+deletes the service (idempotent).
Attended/viewer installs are untouched.
- Kept the SPEC-016 Phase B fail-fast guard as a harmless safety net for
any non-SYSTEM invocation; updated its comment to name this service as
the managed run context.
Phase 2 NOT built (seams documented): session broker, per-session
capture/input worker, CreateProcessAsUserW token handoff, service/worker
IPC, and SERVICE_CONTROL_SESSIONCHANGE. Phase 1 enrolls/connects as
SYSTEM but does not capture a desktop (a Session-0 process cannot).
No service is installed/started on the dev host; that is a VM/admin
integration step. fmt + clippy -D warnings + release build + 55 tests
all pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The SPEC-016 Phase B credential-store guard referenced "SPEC-017" for the
forthcoming SYSTEM service host, but 017 is now Mike's end-user-access
spec; the service host is SPEC-018. Comment + error-string text only, no
logic change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LocalSystem service that runs the persistent agent unattended and brokers
per-session capture/input workers (Session 0 can't capture directly).
Unblocks SPEC-016 Phase B end-to-end (SYSTEM-ACL'd cak_ store readable;
removes the Phase B fail-fast guard) and is the broker primitive SPEC-013
builds on. 017 was taken by Mike's end-user-access spec, so this is 018.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
H1: derive machine_uid from the durable hardware salt ALONE (SMBIOS UUID, or
board+disk serial) plus a fixed namespace, so it survives an OS re-image (which
regenerates MachineGuid). MachineGuid is demoted to a last-resort signal used
only when no hardware salt is readable (volatile, reboot-only floor). Re-image
stability proven by salted_uid_is_reimage_stable_independent_of_machine_guid.
H2: in store_cak, lock the directory ACL BEFORE any secret bytes are written;
the temp file is created inside the already-locked dir, then renamed. No
ciphertext ever exists at an inherited/world-readable path. Ordering made an
explicit precondition, not an unstated inheritance assumption.
M1: load_cak now returns a LoadCakError enum distinguishing Io (incl.
PermissionDenied — operational) from Decrypt (the real tamper/wrong-machine
signal). Only a successful READ whose DPAPI decrypt fails hard-stops.
M2: the PowerShell SMBIOS/board/disk shell-out is spawned and waited on with a
10s wall-clock bound; on timeout the child is killed and the signal is treated
as missing (falls back through the chain), never panics. Keeps
CREATE_NO_WINDOW -NonInteractive -NoProfile.
L1: warn! breadcrumb when the salted derivation degrades to MachineGuid-only,
so the server-side collision-gate operator has a clue. No secret values logged.
C1: keep the SYSTEM+Administrators ACL (Option A target). store_cak now does a
read-back verification immediately after writing and fails at ENROLL time if
this context cannot read its own store; resolve_agent_credential fails fast with
an actionable SPEC-017 message on an access-denied store instead of silently
re-enrolling/bricking. Guarded comment notes this is satisfied once the SYSTEM
service host lands.
Deferred items (clear_cak placeholder, legacy api_key path) left as-is.
Verification on x86_64-pc-windows-msvc: cargo fmt --check clean, clippy
-D warnings clean, release build OK, 52 tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New enroll module: on a managed agent with no stored cak_ but with
enrollment_key + site_code, POST machine_uid + hostname + labels to
<https-base>/api/enroll and persist the minted cak_. Handles every Phase A
status code distinctly:
- 201 new / 200 reuse -> persist cak_ (DPAPI store) and connect
- 202 collision_pending -> log "pending operator confirmation", slow
re-check loop (no key issued; cannot connect until confirmed)
- 401 ENROLL_REJECTED / 409 ENROLL_SITE_CONFLICT -> distinct actionable
errors, long backoff (won't fix without operator action, but recovers
automatically once it does) — no tight loop
- 429 -> honor Retry-After, short backoff
- network / 5xx / decode -> short backoff
The enrollment_key and cak_ are never logged. Uses the existing reqwest
client and the update path's TLS posture (rustls; dev-insecure only in
debug + opt-in). Wire-contract unit tests pin the request shape against
the server's EnrollRequest/EnrollLabels and decode active + pending bodies.
main.rs run-mode wiring: before a managed agent connects, resolve the
operating credential by precedence — stored cak_ (steady state, no
network) -> first-run enrollment -> DEPRECATED legacy api_key (transition
only, logged at WARNING) -> error. The relay already accepts the cak_ as
the api_key query param, so the persistent transport authenticates with it
unchanged. Attended/support-code and viewer paths are untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Store the per-machine cak_ with BOTH layers Mike locked: DPAPI-machine
encryption (CryptProtectData with CRYPTPROTECT_LOCAL_MACHINE — a copied
blob is inert off the box) inside a SYSTEM/Administrators-only ACL'd file
at %ProgramData%\GuruConnect\credentials\agent.cak. The directory + file
ACL is hardened via icacls (/inheritance:r + grant to the well-known SIDs
*S-1-5-18 and *S-1-5-32-544, locale-independent) — auditable, with far
less unsafe FFI than building a registry-key security descriptor by hand.
Co-locates with the existing %ProgramData%\GuruConnect config/seed dir.
Provides store_cak / load_cak / clear_cak. store_cak writes atomically
(temp file + rename in the locked dir). load_cak treats a present-but-
undecryptable blob as a hard error (tamper / cross-machine copy) rather
than silently re-enrolling over it. The plaintext is never logged; the
transient plaintext copy is scrubbed after encryption. DPAPI output blobs
are LocalFree'd. Enables the Win32_Security_Cryptography windows feature.
Round-trip unit tests cover encrypt/decrypt recovery across lengths and
that a tampered blob fails to decrypt (DPAPI authenticates its blobs).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add enrollment_key + site_code to EmbeddedConfig and the resolved Config
alongside the existing labels, and add department/device_type label fields
(SPEC-007 AgentStatus parity). The legacy api_key is retained but made
optional/defaulted so a SPEC-016 site installer can carry only the
enrollment credentials; existing pre-enrollment installers still parse.
The enrollment fields are #[serde(skip)] on Config so they are never
written to the on-disk TOML (install-time material only); apply_enrollment_env
layers them from GURUCONNECT_ENROLLMENT_KEY / GURUCONNECT_SITE_CODE on the
file and env load paths. The embedded path carries them from the install
blob. Config delivery itself (signed wrapper) is Phase C and unchanged here.
Add Config::https_base() deriving the REST API base (https://host[:port])
from the wss:// server_url so the enroll client and the persistent
transport share one authority.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extend the SPEC-004 machine_uid derivation with the locked SPEC-016
hardware salt: combine the Windows MachineGuid with the SMBIOS system
UUID (Win32_ComputerSystemProduct.UUID), falling back to motherboard
serial (Win32_BaseBoard.SerialNumber) + primary disk serial when the
SMBIOS UUID is absent or a degenerate placeholder (all-zeros / all-FFs,
emitted by some OEMs and hypervisor templates).
Signals are read via narrow PowerShell CIM queries (hidden window, no
profile) rather than adding a WMI crate or hand-rolling COM IWbemServices
for two scalar reads. Values are normalized (trim + upper-case) so vendor
case/space drift never perturbs the digest. The combined string is
SHA-256'd into the existing opaque muid_<hex> shape, preserving the wire
identity the relay connect path already reports while making it survive an
OS re-image on the same hardware. Which signal set fed the result is
logged (source label only, never the secret values).
Adds unit tests for derivation determinism + signal-sensitivity,
degenerate-SMBIOS rejection, and signal normalization.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Phase A work passed cargo check + clippy + tests locally but missed
`cargo fmt --all -- --check` (the first step of the Linux CI job): module
ordering in db/mod.rs and two trailing-comment alignments in rate_limit.rs.
No logic change. Agent build failure on the prior run was transient infra
(verified: agent crate compiles clean locally).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Applies the four review fixes to POST /api/enroll, all in server/src/api/enroll.rs
(+ a new ENROLL_SITE_CONFLICT event type in server/src/db/events.rs):
1. HIGH — close the within-tenant cross-site silent-move hijack. A valid key for
site B presented for a machine_uid already bound to a DIFFERENT site is now
REFUSED (409 ENROLL_SITE_CONFLICT) instead of silently repointing the row and
minting a fresh cak_. No move, no key. Emits an ENROLL_SITE_CONFLICT audit event
+ alert TODO. Same-site match still resolves to reuse; a NULL prior site_id is a
first relational bind, not a move. The unauthenticated site_move mint path is
removed; deliberate moves are deferred to the Phase-B --reassign flow + dashboard.
2. MEDIUM — kill the timing/enumeration oracle. Unknown site_code and no-active-key
early rejects now pay a dummy Argon2id verify against a fixed, valid throwaway PHC
constant (TIMING_EQUALIZER_PHC) before returning the identical 401, so every
rejection path pays one KDF. The constant is asserted valid + verifying in tests.
3. LOW — fix the new-enroll TOCTOU. The dedup lookup + INSERT is wrapped in a bounded
retry loop: a concurrent first-enroll of the same machine_uid whose INSERT loses
the unique-index race (classified by is_machine_uid_conflict on SQLSTATE 23505 +
machine_uid constraint) now re-looks-up and converges to reuse instead of 500ing.
A non-machine_uid unique violation still surfaces as 500.
4. LOW — make the collision-gate doc honest + leave an enforcement TODO. The module
doc now states the gate withholds only a NEWLY minted cak_ (a prior clean cak_
survives) and that nothing consults enrollment_state at control time yet, with a
TODO(SPEC-016 Phase B/D) marker for relay/control-plane enforcement + revocation.
Verify: cargo check, cargo clippy --all-targets, and cargo test all clean on this
Windows host (104 tests pass). Two DB-gated tests (cross-site bound-site_id exposure,
machine_uid-vs-agent_id conflict classification) no-op without TEST_DATABASE_URL and
run against real Postgres in CI; the Linux target / real-Postgres handler path is
validated there, not on this host.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Server-side zero-touch per-site enrollment (Phase A: backend + DB only;
agent-side machine_uid derivation is Phase B, server treats it as opaque).
Migration 010_spec016_enrollment.sql:
- connect_sites: relational site anchor (site_code natural key, per-tenant
unique). The spec assumed a sites table existed; it did not (site/company
were free-text columns on connect_machines), so this creates a minimal one.
- site_enrollment_keys: rotatable, Argon2id-hashed cek_ secret + monotonic
version + hex fingerprint + active flag; one-active-per-site partial unique.
- connect_machines: + site_id (FK), + enrollment_state ('active'|'pending')
collision gate, + per-tenant (tenant_id, machine_uid) unique index added
ALONGSIDE the 008 global index (the connect-path upsert_machine ON CONFLICT
arbiter binds to 008 — dropping it would break live reconnect).
- connect_sites.enrollment_policy: reserved (default auto-approve), not enforced.
auth/enrollment_keys.rs: cek_ mint (256-bit, OS CSPRNG), Argon2id hash/verify
(reuses auth::password), and hex fingerprint vN (XXXX) per resolved-decision #3.
db/sites.rs + db/enrollment_keys.rs: runtime sqlx persistence; rotate_key
deactivates+inserts in one tx to hold the one-active-key invariant.
POST /api/enroll (public, api/enroll.rs): site_code+cek_ verify against active
key -> dedup on (tenant, machine_uid) -> new / reuse / site-move / collision.
Collision gate (PROVISIONAL heuristic: online existing row + different hostname)
-> pending, no usable cak_, alert. Mints cak_ via existing agent_keys path in the
exact form relay::validate_agent_api_key expects. Per-(site_code,IP) rate-limit +
lockout (EnrollLimiter). Audit events + [ENROLL] alert markers with
TODO(SPEC-016) #dev-alerts notes.
Admin (JWT) api/sites.rs: POST /api/sites/:id/enrollment-key/rotate (plaintext +
fingerprint once) and GET .../enrollment-key (fingerprint/version, no secret).
Routes wired in main.rs (enroll public, rotation admin). 13 new unit tests;
full server suite 99 passing. cargo check + clippy clean on the host (Windows)
target — Linux cross-target not installed here; server crate is platform-neutral
Rust. No sqlx offline cache needed (codebase uses runtime queries, no query!).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fold the 2026-06-02 interview decisions into SPEC-016:
- Installer wrapper: ship BOTH signed .exe and signed MSI per site
- cak_ at-rest storage: DPAPI-machine-encrypted blob in a SYSTEM-ACL'd location
- Fingerprint: hex (7F2A), deliberately unlike RMM word-codes
- machine_uid: per-tenant scope + hardware-derived salt (survives re-image,
separates distinct boxes) + collision-gated activation (template-cloned VMs
sharing a hardware UUID drop to pending + alert, need dashboard confirm)
- Attended support-code path: unchanged (filename-based, already signing-safe)
Open Questions section -> Resolved decisions + a short Remaining-for-planning
list (exact hardware salt signal set, WiX/MSI authoring approach).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ScreenConnect-class managed enrollment: one signed installer per site,
machines self-register on first run and the server mints a per-machine
cak_ key bound to a deterministic machine_uid (dedups re-installs).
Per-site rotatable enrollment key (long secret + vN (XXXX) fingerprint);
rotating blocks new enrollments from old installers, leaves enrolled
agents untouched. Auto-approve + new-enrollment/site-move alert.
Resolves SPEC-007's signature-vs-appended-config open question:
sign the base agent once in CI + per-site signed wrapper that writes
site config around the signed bytes (never appended into the PE).
Deferred (room reserved): enrollment policy + per-seat licensing,
--enroll-key/--site-code/--reassign flag overrides, technician-assisted
interactive install. Tracking todo dbfe6a56.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Release-path Azure Trusted Signing and auto-versioning were already
shipped with v0.3.0 (stale [ ] -> [x]). Add a new P1/NOW item for a
signed beta/test release channel: the auto build-and-test.yml agent
artifact is unsigned, so testers can receive unsigned binaries. The
beta channel (now implemented in release.yml) closes that gap.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a `channel: stable | beta` workflow_dispatch input to release.yml.
`stable` is unchanged (byte-for-byte). `beta` produces a Windows agent
binary signed by the identical fail-closed Azure Trusted Signing path,
but skips the semver bump, changelog, and release commit, and publishes
a prerelease-tagged Gitea release (vX.Y.Z-beta.<run_number>) at HEAD.
So every binary handed to a tester is signed, not just formal releases.
- prerelease tags excluded from stable LAST_TAG detection (both lookups)
so a beta tag can't corrupt the next stable version computation
- beta tag force-created/pushed -> idempotent on failed-run re-runs
- changelog download gated to stable; release prerelease flag plumbed
through to the Gitea REST payload
Reviewed-by: Code Review Agent (APPROVE WITH NITS; N1 hardened)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bare ON CONFLICT (machine_uid) could not bind to migration 008's partial unique index, so no connect_machines row was persisted for any agent reporting a machine_uid. Confirmed live on 172.16.3.30 with a signed 0.3.0 test agent.
/gc-audit --pass=security re-pass over the deployed v0.3.0 code: PASS,
0 CRITICAL/HIGH/MEDIUM/LOW. The 3 relay CRITICALs stay closed (verified in
code AND live against the deployed binary), the prior agent-update-TLS HIGH
and chat-logging LOW are fixed, and the net-new SPEC-004 surface (machine_uid
dedup gate, session reaper/supersede, operator removal API) audits clean —
no non-admin removal path, no uid-spoof hijack, no auth-plane crossover.
Marks v2 Phase 1 formally exited (secure-session-core Task 8 complete).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agent + server Cargo.toml hardcoded 0.2.0 (below the workspace.package
0.2.2 and the last release tag v0.2.2); dashboard was on a divergent
2.0.0 scheme. Align all component manifests + the dashboard lockfile to
the v0.2.2 baseline so the next release bumps them coherently to 0.3.0
rather than decreasing the dashboard. No code change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Admin-only per-row Remove + multi-select bulk removal on the machines view, plus
per-row purge Remove on the sessions view, wired to the Task-5 admin API
(DELETE /api/machines|sessions/:id?purge=true, POST /api/machines/bulk-remove).
Confirm modals (danger-styled, focus-trapped), TanStack refetch so purged rows
leave the console, structured ApiError surfacing, honest partial-bulk summary,
and admin-gating via useAuth().isAdmin as defense-in-depth over the server 403.
Replaces the legacy all-user delete trigger. typecheck/lint/build clean.
Implements specs/v2-stable-identity/plan.md Task 5 (dashboard portion).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Admin-gated soft-delete + purge so operators can clear ghost machines/sessions
(the ~15-rows-for-one-host accumulation) from the console.
- migration 009: deleted_at on connect_sessions + connect_machines, with partial
indexes WHERE deleted_at IS NULL.
- DELETE /api/machines/:agent_id?purge=true and DELETE /api/sessions/:id?purge=true
soft-delete the row and purge the in-memory session (remove_session); the
non-purge path keeps the legacy hard-delete / live-only disconnect. POST
/api/machines/bulk-remove handles multi-select (batch cap 500). All admin-gated
(AdminUser -> 403; tightens the prior any-user delete) and audited to
connect_session_events (actor + target + trusted client IP).
- list/get queries filter deleted_at IS NULL so removed units leave the console;
upsert revives (deleted_at = NULL) a genuinely-reconnecting machine. The
keyed-reattach identity resolver (get_machine_by_id) is intentionally unfiltered.
Dashboard removal UI is the A3b follow-up. 86 server tests pass; fmt/clippy/test
clean. Implements specs/v2-stable-identity/plan.md Task 5 (server portion).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pure rustfmt reflow of the Task 2 (machine_uid dedup) and Task 4 (session
reaping) code; no logic change. The CI Build-Server-Linux job gates on
cargo fmt --check, which the two feature commits failed because local
validation ran check/clippy/test but not fmt --check. fmt --check, check,
and clippy -D warnings all clean now.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A periodic reaper removes persistent, offline, viewerless sessions whose last
heartbeat is older than a 10-minute TTL (60s sweep spawned at startup), and a
same-machine supersede on the new-session path drops a stranded prior session
when a legacy no-uid agent upgrades to a fresh agent_id + machine_uid. Both
removals re-assert the predicate under the write lock (remove_session_if) to
close a snapshot->remove TOCTOU.
Security: keyed (cak_) agents pass machine_uid=None, so they never trigger
supersede and are never reaped as a uid victim; online, viewer-attached, and
support sessions are never reaped. 82 server tests pass; clippy clean.
Implements specs/v2-stable-identity/plan.md Task 4.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Persist the agent-reported machine_uid and dedup connect_machines on it so a
single physical machine can't register duplicate rows when its config-file
agent_id regenerates (the ghost-session root cause).
- migration 008: nullable connect_machines.machine_uid + partial unique index
(WHERE machine_uid IS NOT NULL); idempotent, startup-applied.
- upsert_machine: two-path dedup (ON CONFLICT machine_uid when present, else
the legacy ON CONFLICT agent_id path, unchanged).
- session reattach: a machine_uid index consulted before agent_id, with all
removal paths purging it.
- security: keyed (cak_) agents stay authoritative — their claimed machine_uid
is dropped (effective_machine_uid=None); uid is dedup-only for un-keyed /
support-code agents. Startup restore skips uid-indexing keyed machines and
fails closed if the keyed-set query errors.
74 server tests pass; clippy clean. Implements specs/v2-stable-identity/plan.md Task 2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The native viewer's H.264 path (Task 7 first-cut, compile-verified only)
never rendered a frame. Three stacked bugs, all confirmed via live loopback:
1. decoder: MF_E_NOTACCEPTING (0xC00D36B5) was treated as fatal and only
one output was drained per call, so once the MFT filled it rejected
every subsequent frame. decode() now returns Vec<DecodedFrame>, drains
on back-pressure and retries the unconsumed sample, then drains all
ready outputs.
2. decoder: the NV12 output type was hand-built and rejected by the MS
H.264 decoder MFT (MF_E_TRANSFORM_TYPE_NOT_SET, 0xC00D6D60). It is now
negotiated by enumerating GetOutputAvailableType on STREAM_CHANGE /
TYPE_NOT_SET.
3. render: a manual pump_messages() in about_to_wait stole winit's own
thread messages and froze the event loop after one iteration, so frames
were never drained from the channel. Removed; winit's run_app pump
already services the WH_KEYBOARD_LL hook.
Validated on a 5070 loopback: 0 decode errors, frames decode/paint/present
(present count 0 -> 1740). Reviewed (APPROVE-WITH-NITS); diagnostics stripped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comprehensive specification for branding/whitelabel configuration.
- Dashboard admin settings page (logo, brand hue, product name, company name, favicon)
- OKLCH color system with CSS variables for dynamic theming
- Agent tray tooltip customization via registry key
- Singleton database table with public GET endpoint
- Priority: P2, Effort: Medium (4-6 weeks)
- Added to roadmap under Server/API (v2 Phase 2)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Agent now derives a recomputable, opaque machine_uid (Windows: SHA-256 of the OS
MachineGuid at HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid -> muid_<hex>;
non-Windows / registry-failure: persisted random UUID, warn-logged). Raw GUID
never exposed; OnceLock-cached. Reported ALONGSIDE agent_id (unchanged) on
AgentStatus (new additive proto field 12) and in the connect handshake query.
This is the stable identity that fixes config-loss duplicate registrations
(DESKTOP-I66IM5Q x9); server-side dedup keying that consumes it is SPEC-004
Task 2. Non-breaking, isolated. 5 unit tests; cargo fmt/clippy(-D warnings)/test
green on GURU-5070.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ordered, execution-ready plan for SPEC-004 (stable machine identity + session
reaping + operator removal). Works out the core integration: machine_uid =
deterministic MachineGuid-based hardware identity (recomputable, so config loss
can't duplicate); per-agent cak_ key stays the credential/trust boundary; they
compose so one cak_ key per machine_uid = one key per real machine (the
prerequisite the fleet key-migration #7 needs). Root cause grounded in code:
agent_id is a random UUID (config.rs:90), connect_machines dedups on ON CONFLICT
(agent_id), so config loss -> duplicate rows (DESKTOP-I66IM5Q x9 live). 5 ordered
tasks (agent uid -> server dedup -> reconcile/age-out -> reaping -> operator
removal). Unblocks #7 -> #5.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lets the HW-H.264 path be live-validated on tagged test agents without affecting
the live client fleet. Adds H264_TEST_TAG="h264-test" + a pure prefer_h264_for(tags)
helper (DEFAULT_PREFER_H264 || tags contains the tag, case-insensitive); StartStream
codec negotiation now computes prefer_h264 from the agent's reported tags instead of
the bare const, and logs the computed value. SAFETY: untagged sessions are byte-for-
byte unchanged (prefer_h264 == DEFAULT_PREFER_H264 == false -> raw); the supports_h264
guard still forces raw for a no-HW agent even when tagged. DEFAULT_PREFER_H264 stays
false (flipping the global default is a separate future step). 3 unit tests added.
cargo fmt/clippy(-D warnings)/test green on GURU-5070 (37 agent + 64 server).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Security follow-ups (audit 2026-05-30, both reviewed APPROVE):
- MEDIUM: viewer tokens were never blacklisted on logout, so a minted
session-scoped viewer token stayed valid up to its 5-min TTL after the user
logged out. Add a per-user ViewerTokenRegistry (Arc<Mutex<HashMap<sub,
Vec<(token, expires_at)>>>>, prune-on-insert) on AppState; mint_viewer_token
registers each token under the user sub; logout drains take_for_user(sub) and
blacklists each via the existing token_blacklist. The viewer WS already calls
is_revoked, so no WS change. Key chain user.user_id == ViewerClaims.sub ==
registry key verified consistent. 8 new tests.
- LOW: relay chat logs now emit content length, not the chat body (support-chat
can carry secrets/PII).
cargo fmt/clippy(-D warnings)/test green on GURU-5070 (37 agent + 61 server).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Major update to SPEC-012 adding dual-mode terminal access:
Mode 1: Serial Console Mode (True Remote Console)
- Direct access to system serial console (/dev/ttyS0 or /dev/console)
- Sees GRUB bootloader, kernel boot messages, login prompts, kernel panics
- Boot-time interaction: select GRUB entries, edit kernel parameters, single-user mode
- Requires root privileges or CAP_SYS_TTY_CONFIG capability
- Setup: GRUB + kernel parameters configured for serial console output
- Like KVM-over-IP or IPMI Serial-over-LAN (text-mode equivalent)
Mode 2: PTY Shell Mode (Interactive Shell)
- Spawn pseudo-TTY with bash/zsh shell session
- Normal server management (package updates, log review, etc.)
- Runs as unprivileged agent service user
- Standard interactive shell with full ANSI/VT100 support
Architecture:
- Agent mode selection based on viewer request (console vs. shell)
- Dashboard shows two buttons: "Console" and "Shell" for headless agents
- Same xterm.js viewer handles both modes transparently
- Protobuf extensions: TerminalModeRequest enum, console_mode flag
Security:
- Console mode requires root (boot-level control risk)
- Recommend RBAC: separate console_access and shell_access permissions
- Console sessions should require MFA (Phase 2)
- Audit logging for both modes
Setup Requirements:
- One-time GRUB configuration for serial console
- systemd service with CAP_SYS_TTY_CONFIG for console mode
- serial-getty@ttyS0.service enabled for login prompt
Updated effort: Medium (5-7 weeks, up from 4-6)
Priority remains P2
Addresses user request for "remote console" (as if at the machine)
not just shell access.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The auto-update path built both reqwest clients with an unconditional
danger_accept_invalid_certs(true), so a network MITM could serve an arbitrary
update .exe (checksum is no defense — same unverified channel) and gain RCE on
every managed endpoint. Replace with dev_insecure_tls() = cfg!(debug_assertions)
&& env GURUCONNECT_DEV_INSECURE_TLS: the cfg gate compiles out of release builds,
so a shipped agent ALWAYS verifies certs; dev keeps a self-signed escape hatch.
Loud warn when the insecure path is taken; verify_checksum kept + documented as
transport-integrity (not tamper) defense; TODO + follow-up for embedded-key
update signing (defense-in-depth). Release-invariant unit test added.
cargo fmt/clippy(-D warnings)/test green on GURU-5070 (90 tests). Closes the
2026-05-30 security-audit HIGH (reports/2026-05-30-gc-audit.md).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From the secure-session-core Tasks 3-5 code review (APPROVE-WITH-FIXES):
- MEDIUM-2: delete the dead `validate_agent_key` "accept-any-key" placeholder +
its AuthenticatedAgent/AuthState scaffolding (zero callers; the real agent
auth is validate_agent_api_key + per-agent cak_ keys). Removes an auth landmine.
- LOW-3: stop interpolating support-code values into 3 relay log lines (bearer
credentials).
- LOW-1: document the X-Real-IP trust requirement in ip_extract.rs (NPM must set
it from $remote_addr); behavior unchanged.
- LOW-2: correct the consent/heartbeat comment in agent session loop (the loop
awaits the dialog; safe because CONSENT_TIMEOUT 60s < HEARTBEAT_TIMEOUT 90s).
cargo fmt/clippy(-D warnings)/test all green on GURU-5070 (89 tests, 0 warnings).
MEDIUM-1 (viewer-token logout revocation) remains a tracked follow-up.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Re-baseline against actual git/deploy state: secure-session-core Tasks 1-7 are
committed and DEPLOYED; the 3 audit CRITICALs are closed and live in prod
(verified: deployed checkout abc55ab descends from the CRITICAL#1 fix + Task 7;
guruconnect.service running on :3002). The prior "Sprint 0: bypasses are live"
banner was wrong (stale 2026-05-29 audit narrative) and is removed. Remaining
to exit Phase 1 = secure-session-core Task 8 (e2e verification + security
re-audit) + Code-Review sign-off on Tasks 3-5. Schema note corrected
(connect_agent_keys + tenancy already exist via migration 004).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mark SPEC-003..009 as work-items inside the SPEC-002 v2 phases (not standalone
v1 backlog): banner records the v2-reset decision + the Sprint-0 relay-auth
CRITICAL hotfix, a phase-mapping table (004->P1, 008->P0/1, 003/005/006/007->P2,
009->P3), inline [-> v2 Phase N] tags per spec, and a note to bake SPEC-003
inventory cols + SPEC-004 machine_uid + connect_agent_keys into the Phase-0
fresh schema. Sprint planning 2026-05-30 (Mike: v2 reset first).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Everything the console does should be callable by API, documented and
discoverable. Adds: OpenAPI 3.x generated from code (utoipa) + Swagger/Redoc at
/api/docs (drift-proof, route<->spec parity test); long-lived revocable scoped
API tokens (connect_api_tokens, hashed like agent keys) distinct from the 24h
dashboard JWT and agent keys; an API-completeness gap audit (folds in SPEC-004/
006/007 endpoints); consistent pagination/filtering + versioning policy. Today
there is zero API doc tooling and no programmatic token. Depends on SPEC-008 for
the documented error envelope; distinct from the ADR-001 integration contract.
Large. Parallel guru-rmm SPEC-019. Requested by Mike 2026-05-30.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cross-cutting error-quality initiative: one structured AppError envelope
(stable error_code + message + correlation_id) replacing the current ad-hoc
mix (bare (StatusCode,&str) tuples, per-file ErrorResponse, two JSON envelopes
the dashboard already unions); correlation-id middleware tied to tracing spans
+ response header so a reported id greps the log; contextual error logging with
identifiers + error chain; sweep the 37 server `let _ =` swallows (the pattern
that silently hid migration-005's missing columns); dashboard renders the real
cause + correlation id (drop the hardcoded generic at MachinesPage.tsx:202);
agent logs why/where auth/connection failed (the auth-loop incident gave no
local signal). Phaseable; Large. Parallel RMM request keeps conventions aligned.
Requested by Mike 2026-05-30.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Dashboard "Build Installer" wizard for pre-labeled managed/persistent agents
(Name/Company/Site/Department/Device Type/Tag/Type) with Download / Copy URL /
Send Link, ScreenConnect-style. The embed-config build path already exists
(downloads.rs appends EmbeddedConfig GURUCONFIG blob; AgentDownloadParams takes
company/site/tags/api_key; agent reads it at config.rs:223) - missing is the UI,
department + device_type fields (EmbeddedConfig/AgentStatus/connect_machines),
name strategy, and Copy-URL/Send-Link actions. Labels persist at install time,
feeding SPEC-003/005/006. Embedded key should be revocable per-machine/site
(pairs with SPEC-004). Biggest open question: appending config after Authenticode
signing invalidates the signature. Requested by Mike 2026-05-30.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>