fix(server): tolerate NULL connect_machines columns (tags decode bug)

connect_machines.tags is a nullable text[] column with no default; the derived
FromRow decoded it as a non-Option Vec<String>, so rows with NULL tags
failed to decode with "unexpected null", breaking managed-session reconcile
on startup and the authed Machines list. Hit in production on the v2 cutover.

- Replace the derived FromRow on Machine with a manual impl that decodes
  every nullable column backing a non-Option field as Option<T> with
  unwrap_or_default (tags, is_elevated, is_persistent, status, timestamps),
  fixing all six read sites at once. Public field types are unchanged.
- migrations/007: backfill NULL tags to empty array, set DEFAULT '{}',
  set NOT NULL (no writer inserts NULL: upsert omits tags, metadata
  update binds a non-null array). Idempotent with the prod hot-patch.
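
For illustration, the decode change in the first bullet can be modelled without sqlx. This is only a sketch under stated assumptions: `RawMachineRow`, `require`, `decode_strict`, `decode_tolerant` and the `hostname` field are hypothetical stand-ins, `status` is assumed to be a String, and the real impl reads columns through sqlx's row API rather than a plain struct.

```rust
// Hypothetical stand-in for one connect_machines row as the database hands it
// back: every nullable column arrives as an Option. Not the real sqlx row type.
#[derive(Debug, Clone, Default)]
struct RawMachineRow {
    hostname: String, // assumed non-null column, for illustration only
    tags: Option<Vec<String>>,
    is_elevated: Option<bool>,
    is_persistent: Option<bool>,
    status: Option<String>, // assumed to be text; the real type may differ
}

// Public shape keeps non-Option fields, matching "Public field types unchanged".
#[derive(Debug, PartialEq)]
struct Machine {
    hostname: String,
    tags: Vec<String>,
    is_elevated: bool,
    is_persistent: bool,
    status: String,
}

// Models the old behaviour: a NULL in a column decoded as non-Option is an error.
fn require<T: Clone>(col: &Option<T>, name: &str) -> Result<T, String> {
    col.clone()
        .ok_or_else(|| format!("column {name}: unexpected null; try decoding as an Option"))
}

fn decode_strict(row: &RawMachineRow) -> Result<Machine, String> {
    Ok(Machine {
        hostname: row.hostname.clone(),
        tags: require(&row.tags, "tags")?,
        is_elevated: require(&row.is_elevated, "is_elevated")?,
        is_persistent: require(&row.is_persistent, "is_persistent")?,
        status: require(&row.status, "status")?,
    })
}

// Models the new behaviour: decode as Option<T>, then fall back to T::default().
fn decode_tolerant(row: &RawMachineRow) -> Machine {
    Machine {
        hostname: row.hostname.clone(),
        tags: row.tags.clone().unwrap_or_default(),
        is_elevated: row.is_elevated.unwrap_or_default(),
        is_persistent: row.is_persistent.unwrap_or_default(),
        status: row.status.clone().unwrap_or_default(),
    }
}

fn main() {
    // A row whose nullable columns are all NULL, like the rows that broke startup.
    let row = RawMachineRow {
        hostname: "host-1".to_string(),
        ..Default::default()
    };

    match decode_strict(&row) {
        Ok(machine) => println!("strict decode: {machine:?}"),
        Err(err) => println!("strict decode failed: {err}"),
    }

    let machine = decode_tolerant(&row);
    println!("tolerant decode: {machine:?}");
}
```

The fallback lives in the decode step, so every caller that reads a Machine gets the same defaults without changing its own code.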

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 15:17:12 -07:00
parent 96b4fd7721
commit abc55abb0b
2 changed files with 103 additions and 3 deletions

@@ -0,0 +1,39 @@
-- Migration: 007_fix_machine_tags_null.sql
-- Purpose: Make connect_machines.tags self-protecting against NULL.
--
-- Migration 005 intended to create `tags TEXT[] NOT NULL DEFAULT '{}'`, but it
-- used `ADD COLUMN IF NOT EXISTS`. On the production instance the column already
-- existed (created without NOT NULL / without a default by an earlier path), so
-- 005's constraints were never applied — the column stayed nullable with no
-- default. Rows with `tags IS NULL` then failed to decode in the Rust `Machine`
-- FromRow (`unexpected null; try decoding as an Option`), which broke the startup
-- "reconcile managed sessions" task and would 500 the authenticated Machines list.
--
-- Production was hot-patched with `UPDATE connect_machines SET tags='{}' WHERE
-- tags IS NULL`. This migration makes the schema itself enforce the invariant so
-- it cannot recur: backfill any remaining NULLs, set a column DEFAULT, then add a
-- NOT NULL constraint. The Rust decode was also made NULL-tolerant (manual FromRow)
-- as belt-and-suspenders.
--
-- Idempotent: the backfill is a no-op once rows already hold '{}' (the prod hot-patch
-- already ran); SET DEFAULT is idempotent; SET NOT NULL is a no-op if the constraint
-- already exists. Applied on server startup by sqlx::migrate!(); never pre-applied
-- via psql. Ordered after 006.
-- See .claude/standards/gururmm/sqlx-migrations.md.

-- 1. Backfill any rows still holding NULL (must precede SET NOT NULL).
UPDATE connect_machines
   SET tags = '{}'
 WHERE tags IS NULL;

-- 2. Pin a column default so future INSERTs that omit `tags` get an empty array
--    rather than NULL.
ALTER TABLE connect_machines
    ALTER COLUMN tags SET DEFAULT '{}';

-- 3. Enforce the invariant at the schema level. Safe: the only writer of `tags`
--    (db::machines::update_machine_metadata) binds a non-null TEXT[]; no INSERT path
--    supplies an explicit NULL for tags (upsert_machine omits the column, so it now
--    takes the DEFAULT above). After the backfill no NULL rows remain.
ALTER TABLE connect_machines
    ALTER COLUMN tags SET NOT NULL;