SPEC-016 Phase A: zero-touch enrollment backend + migration #5
Phase A of SPEC-016 (zero-touch per-site agent enrollment) — server backend + DB migration only. Agent-side machine_uid derivation, installers, and dashboard are later phases.
What's here
Review status
Code-reviewed (APPROVE WITH NITS) + a focused re-review confirming all fixes closed:
CI must validate (couldn't verify on the Windows dev host)
Inert until later phases
No installer issues enrollment keys yet and no agent calls /api/enroll, so the endpoint is unreachable in practice until Phase B/C. The deprecated shared AGENT_API_KEY relay fallback is untouched.
Spec: docs/specs/SPEC-016-zero-touch-enrollment.md. Tracking todo dbfe6a56.
Server-side zero-touch per-site enrollment (Phase A: backend + DB only; agent-side machine_uid derivation is Phase B, server treats it as opaque).

Migration 010_spec016_enrollment.sql:
- connect_sites: relational site anchor (site_code natural key, per-tenant unique). The spec assumed a sites table existed; it did not (site/company were free-text columns on connect_machines), so this creates a minimal one.
- site_enrollment_keys: rotatable, Argon2id-hashed cek_ secret + monotonic version + hex fingerprint + active flag; one-active-per-site partial unique.
- connect_machines: + site_id (FK), + enrollment_state ('active'|'pending') collision gate, + per-tenant (tenant_id, machine_uid) unique index added ALONGSIDE the 008 global index (the connect-path upsert_machine ON CONFLICT arbiter binds to 008; dropping it would break live reconnect).
- connect_sites.enrollment_policy: reserved (default auto-approve), not enforced.

auth/enrollment_keys.rs: cek_ mint (256-bit, OS CSPRNG), Argon2id hash/verify (reuses auth::password), and hex fingerprint vN (XXXX) per resolved-decision #3.

db/sites.rs + db/enrollment_keys.rs: runtime sqlx persistence; rotate_key deactivates+inserts in one tx to hold the one-active-key invariant.

POST /api/enroll (public, api/enroll.rs): site_code+cek_ verify against active key -> dedup on (tenant, machine_uid) -> new / reuse / site-move / collision. Collision gate (PROVISIONAL heuristic: online existing row + different hostname) -> pending, no usable cak_, alert. Mints cak_ via existing agent_keys path in the exact form relay::validate_agent_api_key expects. Per-(site_code,IP) rate-limit + lockout (EnrollLimiter). Audit events + [ENROLL] alert markers with TODO(SPEC-016) #dev-alerts notes.

Admin (JWT) api/sites.rs: POST /api/sites/:id/enrollment-key/rotate (plaintext + fingerprint once) and GET .../enrollment-key (fingerprint/version, no secret).

Routes wired in main.rs (enroll public, rotation admin).
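For reviewers, the dedup step of POST /api/enroll (new / reuse / site-move / collision) can be read as a small pure decision function. The following is only a sketch under stated assumptions, not the code in api/enroll.rs: the names EnrollOutcome, ExistingMachine and decide_enrollment are hypothetical, site_id is modelled as a plain i64, hostnames are compared exactly, and the collision check is assumed to run before the site-move check. Only the heuristic itself (existing row online + different hostname -> pending) comes from the commit message.

```rust
/// Hypothetical outcome type mirroring the four cases named in the commit
/// message: new / reuse / site-move / collision.
#[derive(Debug, PartialEq, Eq)]
pub enum EnrollOutcome {
    /// No row exists for (tenant, machine_uid): create one.
    New,
    /// Same machine, same site: reuse the existing row.
    Reuse,
    /// Same machine, different site: re-anchor the row to the new site.
    SiteMove { from_site: i64, to_site: i64 },
    /// Collision gate tripped: row goes to 'pending', no usable cak_ issued.
    CollisionPending,
}

/// Minimal view of an existing connect_machines row.
/// Field names and types are assumptions made for this sketch.
pub struct ExistingMachine {
    pub site_id: i64,
    pub hostname: String,
    pub online: bool,
}

/// Decide what to do with an enrollment request whose site_code + cek_ have
/// already been verified, given the row (if any) found for (tenant, machine_uid).
pub fn decide_enrollment(
    existing: Option<&ExistingMachine>,
    site_id: i64,
    hostname: &str,
) -> EnrollOutcome {
    match existing {
        None => EnrollOutcome::New,
        // PROVISIONAL heuristic from the commit message:
        // the existing row is online and reports a different hostname.
        // Checking this before site-move is an assumption of the sketch.
        Some(m) if m.online && m.hostname != hostname => EnrollOutcome::CollisionPending,
        Some(m) if m.site_id != site_id => EnrollOutcome::SiteMove {
            from_site: m.site_id,
            to_site: site_id,
        },
        Some(_) => EnrollOutcome::Reuse,
    }
}
```

Keeping the decision free of database and HTTP types would let each branch be unit-tested without a live connection; whether the real handler is factored this way is not stated in the PR.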
13 new unit tests; full server suite 99 passing. cargo check + clippy clean on the host (Windows) target; Linux cross-target not installed here, and the server crate is platform-neutral Rust. No sqlx offline cache needed (codebase uses runtime queries, no query!).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
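Review note on the per-(site_code, IP) rate-limit + lockout mentioned for POST /api/enroll: the PR does not describe how EnrollLimiter works internally, so the following is only an illustrative sketch of one way such a limiter could be built. The type EnrollLimiterSketch, its methods, and the thresholds are all hypothetical; the only detail taken from the commit message is that attempts are keyed by (site_code, IP) and that repeated failures lead to a lockout. Time is passed in explicitly so the behaviour can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Per-key failure bookkeeping (hypothetical layout).
struct Entry {
    failures: u32,
    window_start: Instant,
    locked_until: Option<Instant>,
}

/// Illustrative limiter keyed by (site_code, IP). Not the real EnrollLimiter.
pub struct EnrollLimiterSketch {
    max_failures: u32,
    window: Duration,
    lockout: Duration,
    entries: HashMap<(String, IpAddr), Entry>,
}

impl EnrollLimiterSketch {
    pub fn new(max_failures: u32, window: Duration, lockout: Duration) -> Self {
        Self { max_failures, window, lockout, entries: HashMap::new() }
    }

    /// True if an attempt from (site_code, ip) may proceed at time `now`.
    pub fn allow(&self, site_code: &str, ip: IpAddr, now: Instant) -> bool {
        match self.entries.get(&(site_code.to_string(), ip)) {
            // Locked out: allowed again only once the lockout has elapsed.
            Some(Entry { locked_until: Some(until), .. }) => now >= *until,
            _ => true,
        }
    }

    /// Record a failed verification; lock the key once the threshold is hit.
    pub fn record_failure(&mut self, site_code: &str, ip: IpAddr, now: Instant) {
        let (max_failures, window, lockout) = (self.max_failures, self.window, self.lockout);
        let e = self
            .entries
            .entry((site_code.to_string(), ip))
            .or_insert(Entry { failures: 0, window_start: now, locked_until: None });
        // Start a fresh counting window if the previous one has expired.
        if now.saturating_duration_since(e.window_start) >= window {
            e.failures = 0;
            e.window_start = now;
        }
        e.failures += 1;
        if e.failures >= max_failures {
            e.locked_until = Some(now + lockout);
        }
    }

    /// A successful enrollment clears the key's failure history.
    pub fn record_success(&mut self, site_code: &str, ip: IpAddr) {
        self.entries.remove(&(site_code.to_string(), ip));
    }
}
```

A production limiter would also need eviction of stale entries and shared-state synchronisation across request handlers; both are omitted here.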