sync: auto-sync from HOWARD-HOME at 2026-06-23 05:36:06

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-23 05:36:06
2026-06-23 05:36:38 -07:00
parent 56bf6f44ff
commit 7fb32ba349
3 changed files with 193 additions and 0 deletions
--- a/clients/cascades-tucson/session-logs/2026-06/2026-06-23-howard-cascades-planned-outage-shutdown-verify.md
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-23-howard-cascades-planned-outage-shutdown-verify.md
@@ -0,0 +1,105 @@
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Resumed the Cascades of Tucson planned-power-outage work at fire time on 2026-06-23.
+The building had a scheduled electrical cut from 05:30-09:00 MST. The shutdown sequence
+had been authored, automated, and armed the prior evening (2026-06-22 ~19:06) per the
+runbook `clients/cascades-tucson/docs/runbooks/2026-06-23-planned-power-outage.md`; this
+session was the fire-time verification that the arms executed cleanly.
+
+Pulled the runbook back up to re-establish context. The plan: graceful self-shutdowns
+on all three core devices (CS-SERVER, Synology, pfSense) so nothing takes a dirty
+power-loss — critical because CS-SERVER runs a DEGRADED RAID-1 OS mirror on a single old
+spindle, the worst-case box for an unclean cut. Each device was armed with a self-contained
+local schedule so it fires independently of the Howard-Home session and the OpenVPN tunnel.
+
+Attempted the "re-verify the arms" step but discovered via timezone math (UTC 12:30 =
+05:30 MST, Phoenix UTC-7 no DST) that we were already AT fire time — past the
+"verify before 05:28" window. The shutdowns had already executed. Confirmed clean
+execution through the one independent channel that survives the site going dark: GuruRMM
+cloud. CS-SERVER reported offline with last_seen 2026-06-23T12:29:49Z (05:29:49 MST) —
+about 1 min 49 s after its 05:28 scheduled task, consistent with the expected graceful-shutdown lag
+(stop CS-QB VM -> wait 25s -> Stop-Computer -Force). pfSense SSH was unreachable (exit 255),
+consistent with it being the gateway/VPN endpoint and powering off. Synology had no remote
+path (it sits behind the pfSense tunnel, now down) but was armed and verified the prior night.
+
+Net result: the worst-case data-integrity risk (CS-SERVER) is positively confirmed handled.
+We are now in the dark window with no remote visibility into the site, by design: pfSense is
+the OpenVPN endpoint, so once it powers off all in-site paths are gone, and the RMM cloud only
+retains CS-SERVER's last check-in. Nothing further to send until the ~09:00 bring-up.
+
+## Key Decisions
+
+- Did NOT attempt to re-arm or re-send shutdown commands once timezone math showed we were past
+  fire time — the arms had already fired and re-sending would be pointless/harmful.
+- Treated the pfSense SSH exit-255 and the Synology no-path as EXPECTED (not errors): with pfSense
+  down, the VPN endpoint is gone, so loss of in-site reachability is the designed outcome, not a fault.
+- Relied on GuruRMM cloud as the single authoritative confirmation channel for CS-SERVER (the only box
+  with an out-of-band path), since it is the highest-risk device (degraded RAID-1 mirror).
+- Read-only checks only -> no #dev-alerts bot post (per rmm skill rule: alerts on writes only).
+
+## Problems Encountered
+
+- `pfsense-ssh.sh cascades-tucson run ...` returned exit 255 (SSH connect fail). Initially ambiguous;
+  resolved by checking current time — it was 05:30:57 MST, i.e. pfSense had hit its 05:30 shutdown and/or
+  the building cut had landed. Expected, not a fault.
+- Git-Bash `TZ='America/Phoenix' date` printed GMT (no tzdata on this MSYS build). Worked around by
+  computing MST manually from UTC (Phoenix = UTC-7, no DST): 12:30 UTC = 05:30 MST.
+
+## Configuration Changes
+
+- None. Read-only verification session. No files modified on endpoints or in the repo (this log only).
+
+## Credentials & Secrets
+
+- None discovered, created, or rotated. RMM auth via existing vault path
+  `infrastructure/gururmm-server.sops.yaml` (admin-email / admin-password). pfSense via
+  `clients/cascades-tucson/pfsense-firewall`. No new secrets.
+
+## Infrastructure & Servers
+
+- CS-SERVER: 192.168.2.254 (Windows), GuruRMM agent c39f1de7-d5b6-45ae-b132-e06977ab1713, agent ver 0.6.66.
+  iDRAC 192.168.2.65. Confirmed offline last_seen 2026-06-23T12:29:49Z.
+- Synology (CascadesDS): 192.168.0.120, DSM http://192.168.0.120:5000, vault
+  clients/cascades-tucson/synology-cascadesds.
+- pfSense: 192.168.0.1 (gateway + OpenVPN endpoint), pfSense Plus 25.07.
+- UOS controller: https://172.16.3.29:11443, site va6iba3v / 685f39068e65331c46ef6dd2.
+- GuruRMM API: http://172.16.3.30:3001.
+- Outage window: 05:30-09:00 MST, 2026-06-23. Onsite: John Trozzi (physical power-on ~09:00).
+
+## Commands & Outputs
+
+- Time check: `date -u` -> 2026-06-23 12:30:57 UTC = 05:30:57 MST (Phoenix UTC-7).
+- pfSense arm re-verify: `pfsense-ssh.sh cascades-tucson run 'ps -axww | grep ... shutdown -p now'`
+  -> exit 255 (unreachable; expected — box down).
+- CS-SERVER status (RMM): `curl $RMM/api/agents | jq 'select(.hostname=="CS-SERVER")'`
+  -> `status=offline connected=null last_seen=2026-06-23T12:29:49.196769Z ver=0.6.66`.
+
+## Pending / Incomplete Tasks
+
+- BRING-UP at ~09:00 MST (John presses buttons, Howard monitors), bottom-up:
+  1. pfSense first — verify SINGLE dhcpd (`pgrep -f "dhcpd -user" | wc -l` -> 1), WAN up
+     (dpinger WAN_DHCP + WANCOAX_DHCP healthy). If WAN does NOT establish -> REBOOT THE COX MODEM
+     (the missed post-restore step from the 6/17 unplanned outage).
+  2. Switches/APs re-adopt — watch UOS controller until 12/12 switches + 77/77 APs connected.
+  3. CS-SERVER boots -> verify AD/DNS, DHCP role, Hyper-V VMs (CS-QB), file + print shares. Then Synology.
+  4. Straggler sweep: power-cycle printers/POS/IoT that cached a disconnected state. KNOWN: kitchen
+     thermal printer (iPad POS ticket printer) — invisible to DHCP scan, fix by power-cycle.
+- WATCH-LIST (6/17 casualties): Switch 2nd Floor #2 (USL24PB, 192.168.2.193) — one-way L2 break last time,
+  floors 2/3/4 hang off it (reset + re-adopt if those floors don't return); duplicate dhcpd on pfSense;
+  Cox modem / WAN.
+- Open PRE-FLIGHT TODO (unverified): confirm pfSense + core/PoE switches are on the BATTERY side of the UPS
+  (pfSense was surge-only on 6/17 until Mike moved it) — John/onsite to confirm.
+- Optional: stand up a poll/loop ~08:55 to watch for WAN + controller recovery (offered, not yet set).
+
+## Reference Information
+
+- Runbook: clients/cascades-tucson/docs/runbooks/2026-06-23-planned-power-outage.md
+- Incident basis (6/17 unplanned outage): clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md
+- pfSense access: `.claude/skills/unifi-wifi/scripts/pfsense-ssh.sh cascades-tucson run "<cmd>"`
+- RMM auth bootstrap: `eval "$(bash .claude/scripts/rmm-auth.sh)"` -> $TOKEN, $RMM, $REPO_ROOT
+- Standing rule: no Cascades prod change without explicit per-change go (memory feedback_cascades #4).
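Note on the timezone workaround in the Cascades log above: the manual UTC-to-MST arithmetic (Phoenix = UTC-7, no DST) can also be done by `date` itself without a tzdata database, because a POSIX-style `TZ` string such as `MST7` carries its own offset. The following is a minimal sketch, not part of this commit. It assumes GNU `date` (the `-d` option), and the helper name `to_phoenix` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): convert a UTC timestamp to Phoenix local time.
# 'MST7' is a POSIX TZ string: abbreviation MST, 7 hours west of UTC, no DST rule,
# so it does not depend on the tzdata database being installed.
# Assumes GNU date, which accepts -d with a date string or an @epoch value.
to_phoenix() {
  local epoch
  epoch=$(date -u -d "$1 UTC" +%s) || return 1
  TZ='MST7' date -d "@${epoch}" '+%Y-%m-%d %H:%M:%S %Z'
}

to_phoenix '2026-06-23 12:30:57'   # prints 2026-06-23 05:30:57 MST
```

The fixed offset is only safe because Arizona does not observe DST; a zone with DST would need the full POSIX rule string or a working tzdata install.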
--- a/clients/dataforth/session-logs/2026-06/2026-06-23-howard-dataforth-share-plan-recovery.md
+++ b/clients/dataforth/session-logs/2026-06/2026-06-23-howard-dataforth-share-plan-recovery.md
@@ -0,0 +1,84 @@
+# Dataforth — Share Plan Context Recovery
+
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Context-recovery session. The operator asked to bring back up the Dataforth shared-drives /
+permissions plan that had been worked in a prior window. Located the project at
+`clients/dataforth/docs/projects/shares-permissions/` and read the working state to brief
+the operator on where things stand.
+
+The two files touched most recently (2026-06-22 18:55, captured by the 18:54 auto-sync commit
+86c789a) are the active deliverables: `target-structure-draft-2026-06-22.md` (internal Phase 2
+strawman) and `Dataforth-Shared-Drives-Plan.html` (simplified client-facing render). Confirmed
+nothing is uncommitted in `clients/dataforth/`.
+
+Summarized the project for the operator: it moves Dataforth from "every share open to every
+employee" (Everyone/Domain Users, Full on 4 of 8 shares — payroll/OSHA/POs/financials exposed,
+post-2025 ransomware) to a least-privilege department-based AD security-group model with a
+restricted branch, ABE on, excluding the DOS/datasheet/Sage infra shares. Phase 0 (discovery)
+done; Phase 1 (client input) is the blocking gate; Phase 2 target design is drafted (today's
+strawman) pending the client matrix. No file changes were made this session — read/brief only.
+
+## Key Decisions
+
+- No edits made — session was scoped to recovering and reporting state, not advancing the plan.
+- Identified the three candidate next steps to offer the operator: polish the client-facing HTML,
+  finalize/send the discovery email to unblock Phase 1, or refine the internal strawman.
+
+## Problems Encountered
+
+- None.
+
+## Configuration Changes
+
+- None. (This session log is the only file written.)
+
+## Credentials & Secrets
+
+- None surfaced or created.
+
+## Infrastructure & Servers
+
+Referenced from the plan (not modified): AD1, AD2, FILES-D1, SAGE-SQL file servers. Eight
+business shares (c-drive/Q, sage/S, e-drive/T, sales/W, archive/Y, Engineering/B, plus itsvc,
+webshare/X, test). App/infra shares excluded from the dept model: `test` (DOS/SMB1 guest),
+`webshare` (preserve `svc_testdatadb`), `ITSvc`, Sage app paths, NETLOGON/SYSVOL.
+
+## Commands & Outputs
+
+- `git log --oneline -- clients/dataforth/docs/projects/shares-permissions/` → last commit
+  86c789a (auto-sync 2026-06-22 18:54:25); prior 72e0e0a (2026-06-10).
+- `git status --short clients/dataforth/` → clean.
+
+## Pending / Incomplete Tasks
+
+Phase 1 (client input) is BLOCKING. Still needed from Dataforth before Phase 2 sign-off:
+1. Confirm the inferred department list.
+2. Department -> share access matrix (RW/RO/none per area).
+3. Sensitive-data named access (Payroll, OSHA, Purchase Orders, Accounting/Sage).
+4. Department rosters (to populate AD groups).
+5. Legacy-cleanup approval (person-named / "Do not use" folders archive vs delete).
+6. Engineering destination volume — AD1 C: ~90% full, blocks any ENGR restructure.
+
+Email logistics not locked: `discovery-email-draft.md` exists but recipients/sender unset
+(Dan Center primary; CC Kevin Wackerly?; Mike or Howard sending?).
+
+Next-step options offered to operator: (a) polish client-facing HTML, (b) finalize + send
+discovery email to unblock Phase 1, (c) refine internal strawman.
+
+## Reference Information
+
+- Project dir: `clients/dataforth/docs/projects/shares-permissions/`
+- Active strawman: `target-structure-draft-2026-06-22.md`
+- Client deliverable (HTML): `Dataforth-Shared-Drives-Plan.html` (3 sections: folder layout,
+  who gets access, what we need from you)
+- Older docx deliverable: `Dataforth-Shared-Drives-Reorganization-Plan.docx` (2026-06-18)
+- Roadmap: `roadmap.md` (Phase 0 done; Phase 1 pending)
+- Baseline: `current-state-2026-06-10.md`, `acl-audit-detail-2026-06-10.md`
+- Client contact: Dan Center (primary IT). Owner: ACG (Howard).
+- Last commit before this session: 86c789a.
--- a/errorlog.md
+++ b/errorlog.md
@@ -17,6 +17,10 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·

 <!-- Append entries below this line -->

+2026-06-23 | Howard-Home | unifi-wifi/pfsense-ssh | SSH connect/auth failed (rc=255) [ctx: host=192.168.0.1:22 slug=cascades-tucson act=run]
+
+2026-06-23 | Howard-Home | bash/json-test-data | [friction] Git-Bash heredoc (even quoted <<'EOF') wrote C:\ paths with a single backslash -> invalid JSON -> PS engine threw 'Unrecognized escape sequence' exit 3; fix: build JSON test files via PowerShell ConvertTo-Json, not bash heredocs [ctx: ref=feedback_tmp_path_windows]
+
 2026-06-23 | Howard-Home | gururmm/uninstall-engine | [friction] live-tested with -List-shaped targets (which include install_location) -> masked a StrictMode crash that only occurs with the server's UninstallTarget shape (no install_location); always re-test the destructive path with the ACTUAL caller/serialized shape

 2026-06-22 | Howard-Home | gururmm/uninstall-engine | [correction] assumed AnyDesk needs remote removal; it has UninstallString '...AnyDesk.exe --uninstall' and supports --silent, so it is silently removable -- added vendor rule
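
Note on the bash/json-test-data entry above: a quoted heredoc writes its body literally, so a Windows path keeps its single backslashes, and a lone backslash followed by an arbitrary character is not a valid JSON escape. The fix recorded in the log is to build the files with PowerShell `ConvertTo-Json`. The sketch below only illustrates the failure mode from the Bash side. The helper name and the example path are hypothetical, and only backslashes are handled (quotes and control characters are out of scope).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): double every backslash so that a Windows
# path can sit inside a JSON string. Quotes and control characters are NOT escaped.
json_escape_backslashes() {
  printf '%s' "$1" | sed 's/\\/\\\\/g'
}

# Example path is an assumption, chosen for illustration only.
win_path='C:\Temp\targets.json'
printf '{"path":"%s"}\n' "$(json_escape_backslashes "$win_path")"
# prints {"path":"C:\\Temp\\targets.json"}
```

For anything beyond a throwaway test file, a real JSON serializer remains the safer choice, as the log entry already concludes.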