sync: auto-sync from HOWARD-HOME at 2026-06-23 05:36:06

Author: Howard Enos
Machine: HOWARD-HOME
Timestamp: 2026-06-23 05:36:06
2026-06-23 05:36:38 -07:00
parent 56bf6f44ff
commit 7fb32ba349
3 changed files with 193 additions and 0 deletions

@@ -0,0 +1,105 @@
## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech
## Session Summary
Resumed the Cascades of Tucson planned-power-outage work at fire time on 2026-06-23.
The building had a scheduled electrical cut from 05:30-09:00 MST. The shutdown sequence
had been authored, automated, and armed the prior evening (2026-06-22 ~19:06) per the
runbook `clients/cascades-tucson/docs/runbooks/2026-06-23-planned-power-outage.md`; this
session was the fire-time verification that the arms executed cleanly.
Pulled the runbook back up to re-establish context. The plan: graceful self-shutdowns
on all three core devices (CS-SERVER, Synology, pfSense) so nothing takes a dirty
power-loss — critical because CS-SERVER runs a DEGRADED RAID-1 OS mirror on a single old
spindle, the worst-case box for an unclean cut. Each device was armed with a self-contained
local schedule so it fires independently of the Howard-Home session and the OpenVPN tunnel.
Attempted the "re-verify the arms" step but discovered via timezone math (UTC 12:30 =
05:30 MST, Phoenix UTC-7 no DST) that we were already AT fire time — past the
"verify before 05:28" window. The shutdowns had already executed. Confirmed clean
execution through the one independent channel that survives the site going dark: GuruRMM
cloud. CS-SERVER reported offline with last_seen 2026-06-23T12:29:49Z (05:29:49 MST),
about 1 min 49 s after its 05:28 scheduled task, consistent with the expected graceful-shutdown lag
(stop CS-QB VM -> wait 25s -> Stop-Computer -Force). pfSense SSH was unreachable (exit 255),
consistent with it being the gateway/VPN endpoint and powering off. Synology had no remote
path (it sits behind the pfSense tunnel, now down) but was armed and verified the prior night.
Net result: the worst-case data-integrity risk (CS-SERVER) is positively confirmed handled.
We are now in the dark window with no remote visibility into the site, by design: pfSense is
the OpenVPN endpoint, so once it powers off all in-site paths are gone, and only the RMM cloud
still reports on CS-SERVER (its last-known state). Nothing further to send until the ~09:00 bring-up.
## Key Decisions
- Did NOT attempt to re-arm or re-send shutdown commands once timezone math showed we were past
fire time: the arms had already fired, so re-sending would be pointless at best and harmful at worst.
- Treated the pfSense SSH exit-255 and the Synology no-path as EXPECTED (not errors): with pfSense
down, the VPN endpoint is gone, so loss of in-site reachability is the designed outcome, not a fault.
- Relied on GuruRMM cloud as the single authoritative confirmation channel for CS-SERVER (the only box
with an out-of-band path), since it is the highest-risk device (degraded RAID-1 mirror).
- Read-only checks only -> no #dev-alerts bot post (per rmm skill rule: alerts on writes only).
## Problems Encountered
- `pfsense-ssh.sh cascades-tucson run ...` returned exit 255 (SSH connect fail). Initially ambiguous;
resolved by checking current time — it was 05:30:57 MST, i.e. pfSense had hit its 05:30 shutdown and/or
the building cut had landed. Expected, not a fault.
- Git-Bash `TZ='America/Phoenix' date` printed GMT (no tzdata on this MSYS build). Worked around by
computing MST manually from UTC (Phoenix = UTC-7, no DST): 12:30 UTC = 05:30 MST.
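
The manual conversion can also be done without tzdata by using a fixed offset. A minimal Python sketch, assuming Phoenix stays at UTC-7 year-round as noted above; the name `utc_to_phoenix` is illustrative and not part of any existing script:

```python
from datetime import datetime, timedelta, timezone

# Phoenix observes MST all year (UTC-7, no DST), so a fixed offset is enough
# and no tzdata lookup is needed.
PHOENIX = timezone(timedelta(hours=-7), name="MST")


def utc_to_phoenix(utc_stamp: str) -> datetime:
    """Convert a 'YYYY-MM-DD HH:MM:SS' UTC string to Phoenix local time."""
    parsed = datetime.strptime(utc_stamp, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc).astimezone(PHOENIX)


if __name__ == "__main__":
    local = utc_to_phoenix("2026-06-23 12:30:57")
    print(local.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2026-06-23 05:30:57 MST
```

A fixed offset is only safe because Arizona does not observe DST; for any other site a real timezone database would be the better choice.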
## Configuration Changes
- None. Read-only verification session. No files modified on endpoints or in the repo (this log only).
## Credentials & Secrets
- None discovered, created, or rotated. RMM auth via existing vault path
`infrastructure/gururmm-server.sops.yaml` (admin-email / admin-password). pfSense via
`clients/cascades-tucson/pfsense-firewall`. No new secrets.
## Infrastructure & Servers
- CS-SERVER: 192.168.2.254 (Windows), GuruRMM agent c39f1de7-d5b6-45ae-b132-e06977ab1713, agent ver 0.6.66.
iDRAC 192.168.2.65. Confirmed offline last_seen 2026-06-23T12:29:49Z.
- Synology (CascadesDS): 192.168.0.120, DSM http://192.168.0.120:5000, vault
clients/cascades-tucson/synology-cascadesds.
- pfSense: 192.168.0.1 (gateway + OpenVPN endpoint), pfSense Plus 25.07.
- UOS controller: https://172.16.3.29:11443, site va6iba3v / 685f39068e65331c46ef6dd2.
- GuruRMM API: http://172.16.3.30:3001.
- Outage window: 05:30-09:00 MST, 2026-06-23. Onsite: John Trozzi (physical power-on ~09:00).
## Commands & Outputs
- Time check: `date -u` -> 2026-06-23 12:30:57 UTC = 05:30:57 MST (Phoenix UTC-7).
- pfSense arm re-verify: `pfsense-ssh.sh cascades-tucson run 'ps -axww | grep ... shutdown -p now'`
-> exit 255 (unreachable; expected — box down).
- CS-SERVER status (RMM): `curl $RMM/api/agents | jq 'select(.hostname=="CS-SERVER")'`
-> `status=offline connected=null last_seen=2026-06-23T12:29:49.196769Z ver=0.6.66`.
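
The shutdown lag can be cross-checked from the two timestamps above. This is a rough Python sketch, assuming the scheduled task fired at exactly 05:28:00 MST and that `last_seen` marks the agent's final check-in; `shutdown_lag_seconds` is a hypothetical helper, not an existing script:

```python
from datetime import datetime, timedelta, timezone

PHOENIX = timezone(timedelta(hours=-7))  # MST, no DST


def shutdown_lag_seconds(scheduled_local: str, last_seen_utc: str) -> float:
    """Seconds between a local scheduled-task time and an RMM last_seen stamp."""
    scheduled = datetime.strptime(scheduled_local, "%Y-%m-%d %H:%M").replace(tzinfo=PHOENIX)
    # Normalize a trailing 'Z' so fromisoformat accepts it on older Pythons too.
    last_seen = datetime.fromisoformat(last_seen_utc.replace("Z", "+00:00"))
    return (last_seen - scheduled).total_seconds()


if __name__ == "__main__":
    lag = shutdown_lag_seconds("2026-06-23 05:28", "2026-06-23T12:29:49.196769Z")
    print(f"{lag:.0f} s")  # 109 s
```

A lag of roughly 109 s leaves room for the VM stop and the 25 s wait before `Stop-Computer -Force`.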
## Pending / Incomplete Tasks
- BRING-UP at ~09:00 MST (John presses buttons, Howard monitors), bottom-up:
1. pfSense first — verify SINGLE dhcpd (`pgrep -f "dhcpd -user" | wc -l` -> 1), WAN up
(dpinger WAN_DHCP + WANCOAX_DHCP healthy). If WAN does NOT establish -> REBOOT THE COX MODEM
(the missed post-restore step from the 6/17 unplanned outage).
2. Switches/APs re-adopt — watch UOS controller until 12/12 switches + 77/77 APs connected.
3. CS-SERVER boots -> verify AD/DNS, DHCP role, Hyper-V VMs (CS-QB), file + print shares. Then Synology.
4. Straggler sweep: power-cycle printers/POS/IoT that cached a disconnected state. KNOWN: kitchen
thermal printer (iPad POS ticket printer) — invisible to DHCP scan, fix by power-cycle.
- WATCH-LIST (6/17 casualties): Switch 2nd Floor #2 (USL24PB, 192.168.2.193) — one-way L2 break last time,
floors 2/3/4 hang off it (reset + re-adopt if those floors don't return); duplicate dhcpd on pfSense;
Cox modem / WAN.
- Open PRE-FLIGHT TODO (unverified): confirm pfSense + core/PoE switches are on the BATTERY side of the UPS
(pfSense was surge-only on 6/17 until Mike moved it) — John/onsite to confirm.
- Optional: stand up a poll/loop ~08:55 to watch for WAN + controller recovery (offered, not yet set).
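
The optional poll loop could look roughly like the Python sketch below. The target counts (1 dhcpd, 12 switches, 77 APs) come from the bring-up steps above; the status keys and both function names are assumptions, and gathering the real counts from pfSense and the UOS controller is deliberately left out:

```python
import time
from typing import Callable, Mapping

# Target counts from the bring-up checklist; the key names are hypothetical.
EXPECTED = {"dhcpd_processes": 1, "switches_connected": 12, "aps_connected": 77}


def site_recovered(status: Mapping[str, int]) -> bool:
    """True when every observed count matches its checklist target."""
    return all(status.get(key) == want for key, want in EXPECTED.items())


def poll_until(check: Callable[[], bool], interval_s: float, timeout_s: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() every interval_s seconds until it passes or timeout_s elapses."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

In use, `check` would wrap whatever read-only status collection is chosen, for example `poll_until(lambda: site_recovered(fetch_status()), 30, 1800)` with a hypothetical `fetch_status`. Keeping the loop read-only fits the standing rule about production changes.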
## Reference Information
- Runbook: clients/cascades-tucson/docs/runbooks/2026-06-23-planned-power-outage.md
- Incident basis (6/17 unplanned outage): clients/cascades-tucson/reports/2026-06-17-power-outage-incident.md
- pfSense access: `.claude/skills/unifi-wifi/scripts/pfsense-ssh.sh cascades-tucson run "<cmd>"`
- RMM auth bootstrap: `eval "$(bash .claude/scripts/rmm-auth.sh)"` -> $TOKEN, $RMM, $REPO_ROOT
- Standing rule: no Cascades prod change without explicit per-change go (memory feedback_cascades #4).

@@ -0,0 +1,84 @@
# Dataforth — Share Plan Context Recovery
## User
- **User:** Howard Enos (howard)
- **Machine:** Howard-Home
- **Role:** tech
## Session Summary
Context-recovery session. The operator asked to bring back up the Dataforth shared-drives /
permissions plan that had been worked in a prior window. Located the project at
`clients/dataforth/docs/projects/shares-permissions/` and read the working state to brief
the operator on where things stand.
The two files touched most recently (2026-06-22 18:55, captured by the 18:54 auto-sync commit
86c789a) are the active deliverables: `target-structure-draft-2026-06-22.md` (internal Phase 2
strawman) and `Dataforth-Shared-Drives-Plan.html` (simplified client-facing render). Confirmed
nothing is uncommitted in `clients/dataforth/`.
Summarized the project for the operator: it moves Dataforth from "every share open to every
employee" (Everyone/Domain Users with Full Control on 4 of 8 shares, leaving payroll/OSHA/POs/financials
exposed, post-2025 ransomware) to a least-privilege, department-based AD security-group model with a
restricted branch and Access-Based Enumeration (ABE) on, excluding the DOS/datasheet/Sage infra shares.
Phase 0 (discovery) is done; Phase 1 (client input) is the blocking gate; Phase 2 target design is
drafted (the 2026-06-22 strawman) pending the client matrix. No file changes were made this session: read/brief only.
## Key Decisions
- No edits made — session was scoped to recovering and reporting state, not advancing the plan.
- Identified the three candidate next steps to offer the operator: polish the client-facing HTML,
finalize/send the discovery email to unblock Phase 1, or refine the internal strawman.
## Problems Encountered
- None.
## Configuration Changes
- None. (This session log is the only file written.)
## Credentials & Secrets
- None surfaced or created.
## Infrastructure & Servers
Referenced from the plan (not modified): AD1, AD2, FILES-D1, SAGE-SQL file servers. Eight
business shares (c-drive/Q, sage/S, e-drive/T, sales/W, archive/Y, Engineering/B, plus itsvc,
webshare/X, test). App/infra shares excluded from the dept model: `test` (DOS/SMB1 guest),
`webshare` (preserve `svc_testdatadb`), `ITSvc`, Sage app paths, NETLOGON/SYSVOL.
## Commands & Outputs
- `git log --oneline -- clients/dataforth/docs/projects/shares-permissions/` → last commit
86c789a (auto-sync 2026-06-22 18:54:25); prior 72e0e0a (2026-06-10).
- `git status --short clients/dataforth/` → clean.
## Pending / Incomplete Tasks
Phase 1 (client input) is BLOCKING. Still needed from Dataforth before Phase 2 sign-off:
1. Confirm the inferred department list.
2. Department -> share access matrix (RW/RO/none per area).
3. Sensitive-data named access (Payroll, OSHA, Purchase Orders, Accounting/Sage).
4. Department rosters (to populate AD groups).
5. Legacy-cleanup approval (person-named / "Do not use" folders: archive vs delete).
6. Engineering destination volume — AD1 C: ~90% full, blocks any ENGR restructure.
Email logistics not locked: `discovery-email-draft.md` exists but recipients/sender unset
(Dan Center primary; CC Kevin Wackerly?; Mike or Howard sending?).
Next-step options offered to operator: (a) polish client-facing HTML, (b) finalize + send
discovery email to unblock Phase 1, (c) refine internal strawman.
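
Once the client returns the department -> share matrix (item 2), it could be sanity-checked before any AD groups are built. This Python sketch is illustrative only: the share names are the six drive-lettered shares listed under Infrastructure & Servers, whether each belongs in the department model is an assumption, and any department name passed in is a placeholder until item 1 is confirmed.

```python
ACCESS_LEVELS = {"RW", "RO", "none"}

# Share names come from the plan; treating all six as in scope is an assumption.
SHARES = ["c-drive", "sage", "e-drive", "sales", "archive", "Engineering"]


def validate_matrix(matrix: dict[str, dict[str, str]]) -> list[str]:
    """Return a list of problems: unknown shares, bad levels, or missing cells."""
    problems = []
    for dept, row in matrix.items():
        for share, level in row.items():
            if share not in SHARES:
                problems.append(f"{dept}: unknown share {share!r}")
            elif level not in ACCESS_LEVELS:
                problems.append(f"{dept}/{share}: invalid level {level!r}")
        for share in SHARES:
            if share not in row:
                problems.append(f"{dept}/{share}: no decision recorded")
    return problems
```

An empty list means every department has an explicit RW/RO/none decision for every share, which is the state Phase 2 sign-off needs.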
## Reference Information
- Project dir: `clients/dataforth/docs/projects/shares-permissions/`
- Active strawman: `target-structure-draft-2026-06-22.md`
- Client deliverable (HTML): `Dataforth-Shared-Drives-Plan.html` (3 sections: folder layout,
who gets access, what we need from you)
- Older docx deliverable: `Dataforth-Shared-Drives-Reorganization-Plan.docx` (2026-06-18)
- Roadmap: `roadmap.md` (Phase 0 done; Phase 1 pending)
- Baseline: `current-state-2026-06-10.md`, `acl-audit-detail-2026-06-10.md`
- Client contact: Dan Center (primary IT). Owner: ACG (Howard).
- Last commit before this session: 86c789a.

@@ -17,6 +17,10 @@ Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
<!-- Append entries below this line -->
2026-06-23 | Howard-Home | unifi-wifi/pfsense-ssh | SSH connect/auth failed (rc=255) [ctx: host=192.168.0.1:22 slug=cascades-tucson act=run]
2026-06-23 | Howard-Home | bash/json-test-data | [friction] Git-Bash heredoc (even quoted <<'EOF') wrote C: as single backslash -> invalid JSON -> PS engine threw 'Unrecognized escape sequence' exit 3; fix: build JSON test files via PowerShell ConvertTo-Json, not bash heredocs [ctx: ref=feedback_tmp_path_windows]
2026-06-23 | Howard-Home | gururmm/uninstall-engine | [friction] live-tested with -List-shaped targets (which include install_location) -> masked a StrictMode crash that only occurs with the server's UninstallTarget shape (no install_location); always re-test the destructive path with the ACTUAL caller/serialized shape
2026-06-22 | Howard-Home | gururmm/uninstall-engine | [correction] assumed AnyDesk needs remote removal; it has UninstallString '...AnyDesk.exe --uninstall' and supports --silent, so it is silently removable -- added vendor rule