From 7fb32ba349c1c1b4760f888288568d9e12d68da4 Mon Sep 17 00:00:00 2001
From: Howard Enos
Date: Tue, 23 Jun 2026 05:36:38 -0700
Subject: [PATCH] sync: auto-sync from HOWARD-HOME at 2026-06-23 05:36:06

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-23 05:36:06
---
 ...cascades-planned-outage-shutdown-verify.md | 105 ++++++++++++++++++
 ...23-howard-dataforth-share-plan-recovery.md |  84 ++++++++++++++
 errorlog.md                                   |   4 +
 3 files changed, 193 insertions(+)
 create mode 100644 clients/cascades-tucson/session-logs/2026-06/2026-06-23-howard-cascades-planned-outage-shutdown-verify.md
 create mode 100644 clients/dataforth/session-logs/2026-06/2026-06-23-howard-dataforth-share-plan-recovery.md

diff --git a/clients/cascades-tucson/session-logs/2026-06/2026-06-23-howard-cascades-planned-outage-shutdown-verify.md b/clients/cascades-tucson/session-logs/2026-06/2026-06-23-howard-cascades-planned-outage-shutdown-verify.md
new file mode 100644
index 00000000..eeec96c0
--- /dev/null
+++ b/clients/cascades-tucson/session-logs/2026-06/2026-06-23-howard-cascades-planned-outage-shutdown-verify.md
@@ -0,0 +1,105 @@
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Resumed the Cascades of Tucson planned-power-outage work at fire time on 2026-06-23.
+The building had a scheduled electrical cut from 05:30-09:00 MST. The shutdown sequence
+had been authored, automated, and armed the prior evening (2026-06-22 ~19:06) per the
+runbook `clients/cascades-tucson/docs/runbooks/2026-06-23-planned-power-outage.md`; this
+session was the fire-time verification that the arms executed cleanly.
+
+Pulled the runbook back up to re-establish context. The plan: graceful self-shutdowns
+on all three core devices (CS-SERVER, Synology, pfSense) so nothing takes a dirty
+power-loss — critical because CS-SERVER runs a DEGRADED RAID-1 OS mirror on a single old
+spindle, the worst-case box for an unclean cut.
+Each device was armed with a self-contained
+local schedule so it fires independently of the Howard-Home session and the OpenVPN tunnel.
+
+Attempted the "re-verify the arms" step but discovered via timezone math (UTC 12:30 =
+05:30 MST, Phoenix UTC-7 no DST) that we were already AT fire time — past the
+"verify before 05:28" window. The shutdowns had already executed. Confirmed clean
+execution through the one independent channel that survives the site going dark: GuruRMM
+cloud. CS-SERVER reported offline with last_seen 2026-06-23T12:29:49Z (05:29:49 MST) —
+~1.8 min (109 s) after its 05:28 scheduled task, consistent with the expected graceful-shutdown lag
+(stop CS-QB VM -> wait 25s -> Stop-Computer -Force). pfSense SSH was unreachable (exit 255),
+consistent with it being the gateway/VPN endpoint and powering off. Synology had no remote
+path (it sits behind the pfSense tunnel, now down) but was armed and verified the prior night.
+
+Net result: the worst-case data-integrity risk (CS-SERVER) is positively confirmed handled.
+We are now in the dark window with no remote visibility into the site by design — pfSense is
+the OpenVPN endpoint, so once it powers off all in-site paths are gone; only RMM cloud reaches
+CS-SERVER. Nothing further to send until the ~09:00 bring-up.
+
+## Key Decisions
+
+- Did NOT attempt to re-arm or re-send shutdown commands once timezone math showed we were past
+  fire time — the arms had already fired and re-sending would be pointless/harmful.
+- Treated the pfSense SSH exit-255 and the Synology no-path as EXPECTED (not errors): with pfSense
+  down, the VPN endpoint is gone, so loss of in-site reachability is the designed outcome, not a fault.
+- Relied on GuruRMM cloud as the single authoritative confirmation channel for CS-SERVER (the only box
+  with an out-of-band path), since it is the highest-risk device (degraded RAID-1 mirror).
+- Read-only checks only -> no #dev-alerts bot post (per rmm skill rule: alerts on writes only).
+
+## Problems Encountered
+
+- `pfsense-ssh.sh cascades-tucson run ...` returned exit 255 (SSH connect fail). Initially ambiguous;
+  resolved by checking current time — it was 05:30:57 MST, i.e. pfSense had hit its 05:30 shutdown and/or
+  the building cut had landed. Expected, not a fault.
+- Git-Bash `TZ='America/Phoenix' date` printed GMT (no tzdata on this MSYS build). Worked around by
+  computing MST manually from UTC (Phoenix = UTC-7, no DST): 12:30 UTC = 05:30 MST.
+
+## Configuration Changes
+
+- None. Read-only verification session. No files modified on endpoints or in the repo (this log only).
+
+## Credentials & Secrets
+
+- None discovered, created, or rotated. RMM auth via existing vault path
+  `infrastructure/gururmm-server.sops.yaml` (admin-email / admin-password). pfSense via
+  `clients/cascades-tucson/pfsense-firewall`. No new secrets.
+
+## Infrastructure & Servers
+
+- CS-SERVER: 192.168.2.254 (Windows), GuruRMM agent c39f1de7-d5b6-45ae-b132-e06977ab1713, agent ver 0.6.66.
+  iDRAC 192.168.2.65. Confirmed offline last_seen 2026-06-23T12:29:49Z.
+- Synology (CascadesDS): 192.168.0.120, DSM http://192.168.0.120:5000, vault
+  clients/cascades-tucson/synology-cascadesds.
+- pfSense: 192.168.0.1 (gateway + OpenVPN endpoint), pfSense Plus 25.07.
+- UOS controller: https://172.16.3.29:11443, site va6iba3v / 685f39068e65331c46ef6dd2.
+- GuruRMM API: http://172.16.3.30:3001.
+- Outage window: 05:30-09:00 MST, 2026-06-23. Onsite: John Trozzi (physical power-on ~09:00).
+
+## Commands & Outputs
+
+- Time check: `date -u` -> 2026-06-23 12:30:57 UTC = 05:30:57 MST (Phoenix UTC-7).
+- pfSense arm re-verify: `pfsense-ssh.sh cascades-tucson run 'ps -axww | grep ... shutdown -p now'`
+  -> exit 255 (unreachable; expected — box down).
+- CS-SERVER status (RMM): `curl $RMM/api/agents | jq 'select(.hostname=="CS-SERVER")'`
+  -> `status=offline connected=null last_seen=2026-06-23T12:29:49.196769Z ver=0.6.66`.
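The manual UTC-to-MST conversion used in the time check can also be done without tzdata, which is what the Git-Bash `date` was missing. The following is a minimal sketch, assuming Python 3.7+ is available on the workstation; the names `MST` and `utc_to_mst` are illustrative and not part of any existing script in the repo.

```python
from datetime import datetime, timedelta, timezone

# Phoenix stays on MST year-round (UTC-7, no DST), so a fixed offset is
# sufficient and no tzdata lookup is required.
MST = timezone(timedelta(hours=-7), name="MST")


def utc_to_mst(utc_text: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp ending in 'Z' and convert it to MST."""
    parsed = datetime.fromisoformat(utc_text.replace("Z", "+00:00"))
    return parsed.astimezone(MST)


# Timestamps taken from the log entries above.
time_check = utc_to_mst("2026-06-23T12:30:57Z")
last_seen = utc_to_mst("2026-06-23T12:29:49Z")

print(time_check.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2026-06-23 05:30:57 MST
print(last_seen.strftime("%Y-%m-%d %H:%M:%S %Z"))   # 2026-06-23 05:29:49 MST
```

A fixed offset is only safe here because Arizona does not observe DST; a site in a DST-observing zone would need a real timezone database.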
+
+## Pending / Incomplete Tasks
+
+- BRING-UP at ~09:00 MST (John presses buttons, Howard monitors), bottom-up:
+  1. pfSense first — verify SINGLE dhcpd (`pgrep -f "dhcpd -user" | wc -l` -> 1), WAN up
+     (dpinger WAN_DHCP + WANCOAX_DHCP healthy). If WAN does NOT establish -> REBOOT THE COX MODEM
+     (the missed post-restore step from the 6/17 unplanned outage).
+  2. Switches/APs re-adopt — watch UOS controller until 12/12 switches + 77/77 APs connected.
+  3. CS-SERVER boots -> verify AD/DNS, DHCP role, Hyper-V VMs (CS-QB), file + print shares. Then Synology.
+  4. Straggler sweep: power-cycle printers/POS/IoT that cached a disconnected state. KNOWN: kitchen
+     thermal printer (iPad POS ticket printer) — invisible to DHCP scan, fix by power-cycle.
+- WATCH-LIST (6/17 casualties): Switch 2nd Floor #2 (USL24PB, 192.168.2.193) — one-way L2 break last time,
+  floors 2/3/4 hang off it (reset + re-adopt if those floors don't return); duplicate dhcpd on pfSense;
+  Cox modem / WAN.
+- Open PRE-FLIGHT TODO (unverified): confirm pfSense + core/PoE switches are on the BATTERY side of the UPS
+  (pfSense was surge-only on 6/17 until Mike moved it) — John/onsite to confirm.
+- Optional: stand up a poll/loop ~08:55 to watch for WAN + controller recovery (offered, not yet set).
+
+## Reference Information
+
+- Runbook: clients/cascades-tucson/docs/runbooks/2026-06-23-planned-power-outage.md
+- Incident basis (6/17 unplanned outage): clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md
+- pfSense access: `.claude/skills/unifi-wifi/scripts/pfsense-ssh.sh cascades-tucson run ""`
+- RMM auth bootstrap: `eval "$(bash .claude/scripts/rmm-auth.sh)"` -> $TOKEN, $RMM, $REPO_ROOT
+- Standing rule: no Cascades prod change without explicit per-change go (memory feedback_cascades #4).
diff --git a/clients/dataforth/session-logs/2026-06/2026-06-23-howard-dataforth-share-plan-recovery.md b/clients/dataforth/session-logs/2026-06/2026-06-23-howard-dataforth-share-plan-recovery.md
new file mode 100644
index 00000000..269c33cf
--- /dev/null
+++ b/clients/dataforth/session-logs/2026-06/2026-06-23-howard-dataforth-share-plan-recovery.md
@@ -0,0 +1,84 @@
+# Dataforth — Share Plan Context Recovery
+
+## User
+- **User:** Howard Enos (howard)
+- **Machine:** Howard-Home
+- **Role:** tech
+
+## Session Summary
+
+Context-recovery session. The operator asked to bring back up the Dataforth shared-drives /
+permissions plan that had been worked in a prior window. Located the project at
+`clients/dataforth/docs/projects/shares-permissions/` and read the working state to brief
+the operator on where things stand.
+
+The two files touched most recently (2026-06-22 18:55, captured by the 18:54 auto-sync commit
+86c789a) are the active deliverables: `target-structure-draft-2026-06-22.md` (internal Phase 2
+strawman) and `Dataforth-Shared-Drives-Plan.html` (simplified client-facing render). Confirmed
+nothing is uncommitted in `clients/dataforth/`.
+
+Summarized the project for the operator: it moves Dataforth from "every share open to every
+employee" (Everyone/Domain Users, Full on 4 of 8 shares — payroll/OSHA/POs/financials exposed,
+post-2025 ransomware) to a least-privilege department-based AD security-group model with a
+restricted branch, ABE on, excluding the DOS/datasheet/Sage infra shares. Phase 0 (discovery)
+done; Phase 1 (client input) is the blocking gate; Phase 2 target design is drafted (today's
+strawman) pending the client matrix. No file changes were made this session — read/brief only.
+
+## Key Decisions
+
+- No edits made — session was scoped to recovering and reporting state, not advancing the plan.
+- Identified the three candidate next steps to offer the operator: polish the client-facing HTML,
+  finalize/send the discovery email to unblock Phase 1, or refine the internal strawman.
+
+## Problems Encountered
+
+- None.
+
+## Configuration Changes
+
+- None. (This session log is the only file written.)
+
+## Credentials & Secrets
+
+- None surfaced or created.
+
+## Infrastructure & Servers
+
+Referenced from the plan (not modified): AD1, AD2, FILES-D1, SAGE-SQL file servers. Eight
+business shares (c-drive/Q, sage/S, e-drive/T, sales/W, archive/Y, Engineering/B, plus itsvc,
+webshare/X, test). App/infra shares excluded from the dept model: `test` (DOS/SMB1 guest),
+`webshare` (preserve `svc_testdatadb`), `ITSvc`, Sage app paths, NETLOGON/SYSVOL.
+
+## Commands & Outputs
+
+- `git log --oneline -- clients/dataforth/docs/projects/shares-permissions/` → last commit
+  86c789a (auto-sync 2026-06-22 18:54:25); prior 72e0e0a (2026-06-10).
+- `git status --short clients/dataforth/` → clean.
+
+## Pending / Incomplete Tasks
+
+Phase 1 (client input) is BLOCKING. Still needed from Dataforth before Phase 2 sign-off:
+1. Confirm the inferred department list.
+2. Department -> share access matrix (RW/RO/none per area).
+3. Sensitive-data named access (Payroll, OSHA, Purchase Orders, Accounting/Sage).
+4. Department rosters (to populate AD groups).
+5. Legacy-cleanup approval (person-named / "Do not use" folders archive vs delete).
+6. Engineering destination volume — AD1 C: ~90% full, blocks any ENGR restructure.
+
+Email logistics not locked: `discovery-email-draft.md` exists but recipients/sender unset
+(Dan Center primary; CC Kevin Wackerly?; Mike or Howard sending?).
+
+Next-step options offered to operator: (a) polish client-facing HTML, (b) finalize + send
+discovery email to unblock Phase 1, (c) refine internal strawman.
+
+## Reference Information
+
+- Project dir: `clients/dataforth/docs/projects/shares-permissions/`
+- Active strawman: `target-structure-draft-2026-06-22.md`
+- Client deliverable (HTML): `Dataforth-Shared-Drives-Plan.html` (3 sections: folder layout,
+  who gets access, what we need from you)
+- Older docx deliverable: `Dataforth-Shared-Drives-Reorganization-Plan.docx` (2026-06-18)
+- Roadmap: `roadmap.md` (Phase 0 done; Phase 1 pending)
+- Baseline: `current-state-2026-06-10.md`, `acl-audit-detail-2026-06-10.md`
+- Client contact: Dan Center (primary IT). Owner: ACG (Howard).
+- Last commit before this session: 86c789a.
diff --git a/errorlog.md b/errorlog.md
index bedaf82f..4f51006f 100644
--- a/errorlog.md
+++ b/errorlog.md
@@ -17,6 +17,10 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
+2026-06-23 | Howard-Home | unifi-wifi/pfsense-ssh | SSH connect/auth failed (rc=255) [ctx: host=192.168.0.1:22 slug=cascades-tucson act=run]
+
+2026-06-23 | Howard-Home | bash/json-test-data | [friction] Git-Bash heredoc (even quoted <<'EOF') wrote C: as single backslash -> invalid JSON -> PS engine threw 'Unrecognized escape sequence' exit 3; fix: build JSON test files via PowerShell ConvertTo-Json, not bash heredocs [ctx: ref=feedback_tmp_path_windows]
+
 2026-06-23 | Howard-Home | gururmm/uninstall-engine | [friction] live-tested with -List-shaped targets (which include install_location) -> masked a StrictMode crash that only occurs with the server's UninstallTarget shape (no install_location); always re-test the destructive path with the ACTUAL caller/serialized shape
 2026-06-22 | Howard-Home | gururmm/uninstall-engine | [correction] assumed AnyDesk needs remote removal; it has UninstallString '...AnyDesk.exe --uninstall' and supports --silent, so it is silently removable -- added vendor rule
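The `bash/json-test-data` friction entry comes down to JSON requiring every backslash in a string to be escaped, which hand-written heredocs make easy to miss. The fix recorded above is PowerShell `ConvertTo-Json`; the sketch below shows the same idea with Python's standard `json` module instead, as an alternative and assuming Python is available on the machine. The function name, the `name` value, and the example path are hypothetical and not taken from the uninstall engine.

```python
import json


def build_test_payload(install_dir: str) -> str:
    """Serialize a hypothetical test target; json.dumps escapes each backslash."""
    target = {"name": "ExampleApp", "install_location": install_dir}
    return json.dumps(target, indent=2)


# A raw string keeps the Windows path literal on the Python side.
payload = build_test_payload(r"C:\Program Files\ExampleApp")
print(payload)

# Round-trip check: a parser reading the serialized text gets the original path back.
assert json.loads(payload)["install_location"] == r"C:\Program Files\ExampleApp"
```

Letting a serializer produce the file avoids depending on how a given shell treats backslashes inside a heredoc.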