sync: auto-sync from GURU-5070 at 2026-05-25 12:58:05
Author: Mike Swanson Machine: GURU-5070 Timestamp: 2026-05-25 12:58:05
- 12:25 PT - Final compilation successful on Saturn
- 12:40 PT - Session log written, ready to sync

---

## Update: 12:55 PT - Dataforth ESXi License Recovery + Syncro Emergency Billing Skill

### User

- **User:** Mike Swanson (mike)
- **Machine:** GURU-5070
- **Role:** admin
- **Session span:** ~2026-05-24 evening to 2026-05-25 afternoon

### Session Summary

Session began as an emergency response: John Lehman texted after hours to report that the VPN was down. Investigation over SSH, using D2TESTNAS (192.168.0.9) as the jump host, showed that AD1 and AD2 were offline because ESXi-122's 60-day evaluation license had expired, which took all of its VMs down with it. ESXi-124 was also at risk. SSH was not running on ESXi-122, so it had to be enabled from the DCUI at the physical console before remote work could begin.

License recovery on ESXi-122 was done by copying the hidden backup license file (`/etc/vmware/.#license.cfg`) over the active `license.cfg` and then restarting hostd, which resets the 60-day evaluation timer. ESXi-124 received the same procedure preemptively. After the license was restored, all four VMs on ESXi-122 (AD1, AD2, FILES-D1, PBX) were powered on. Both hosts were then given a persistent monthly cron job (first Sunday of each month at 02:00) that resets the license and reboots the host. Because ESXi's filesystem is RAM-based and crontab changes do not survive a reboot, the job was written directly to `/var/spool/cron/crontabs/root` via paramiko SFTP and re-added on every boot through `/etc/rc.local.d/local.sh`.
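The contents of `license_reset.sh` are not recorded in this log. A minimal sketch of what the monthly job does, assuming it only restores the backup license file and then reboots, might look like the following. The `LICENSE_DIR` override and the `--run` flag are hypothetical additions so the copy step can be exercised safely off-host.

```bash
#!/bin/sh
# Hypothetical sketch of license_reset.sh; the deployed script may differ.
# LICENSE_DIR is overridable (assumption) so restore_license can be tested off-host.
LICENSE_DIR="${LICENSE_DIR:-/etc/vmware}"

# Copy the hidden backup license over the active license.cfg.
# Returns non-zero, without touching anything, if the backup is missing.
restore_license() {
    backup="$LICENSE_DIR/.#license.cfg"
    active="$LICENSE_DIR/license.cfg"
    if [ ! -f "$backup" ]; then
        echo "backup license not found: $backup" >&2
        return 1
    fi
    cp "$backup" "$active"
}

# Only act on a real host when invoked with --run (hypothetical flag).
if [ "${1:-}" = "--run" ]; then
    echo "$(date): restoring evaluation license"
    restore_license && reboot
fi
```

Guarding the reboot behind the backup check means a host whose backup file has gone missing logs an error instead of rebooting for nothing.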

A Syncro ticket (#32320) was created for the incident. The session then shifted to writing emergency/afterhours billing rules as a skill file (`syncro-emergency-billing.md`), using Winter's historical tickets to establish the correct billing pattern. The key finding: block customers (Dataforth, VWP, Cascades) need two line items on the standard product, one for the actual hours and one for 0.5x those hours labeled "Afterhours rate", because block accounts track hours rather than dollars. Non-block customers use a single dedicated emergency product (26184, $262.50/hr).
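The billing rule reduces to a small calculation. The function below is a hypothetical helper, not part of the skill file, that prints the line items for an emergency job as tab-separated `product_id`, `quantity`, `description` rows. The block-customer product ID is passed in because it depends on the service type; 26184 is the non-block emergency product named above, and "Work description" stands in for the real description text.

```bash
# Hypothetical helper illustrating the emergency billing rule; not taken from the skill file.

# emergency_line_items CUSTOMER_TYPE HOURS [STANDARD_PRODUCT_ID]
# Prints one "product_id<TAB>quantity<TAB>description" row per line item.
#   block:    two items on the standard product: the actual hours, then 0.5x the
#             hours labeled "Afterhours rate" (block accounts track hours, not dollars)
#   nonblock: one item on the dedicated emergency product 26184 ($262.50/hr)
emergency_line_items() {
    local cust_type=$1 hours=$2 product=${3:-}
    local qty premium
    # bash has no floating-point arithmetic, so awk computes and formats the quantities
    qty=$(awk -v h="$hours" 'BEGIN { printf "%.2f", h }')
    premium=$(awk -v h="$hours" 'BEGIN { printf "%.2f", h * 0.5 }')
    case "$cust_type" in
        block)
            if [ -z "$product" ]; then
                echo "block billing needs the standard product ID" >&2
                return 1
            fi
            printf '%s\t%s\t%s\n' "$product" "$qty" "Work description"
            printf '%s\t%s\t%s\n' "$product" "$premium" "Afterhours rate"
            ;;
        nonblock)
            printf '%s\t%s\t%s\n' 26184 "$qty" "Work description"
            ;;
        *)
            echo "unknown customer type: $cust_type" >&2
            return 1
            ;;
    esac
}
```

For the Dataforth incident, `emergency_line_items block 2 1190473` prints a 2.00 row and a 1.00 "Afterhours rate" row on product 1190473, matching the 2.0 hr + 1.0 hr added to #32320.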

Adding labor to the Dataforth ticket required finding the correct Syncro API endpoint by trial and error: `/tickets/{id}/add_line_item` (not `/line_item`, `/line_items`, or a top-level endpoint). The payload format was confirmed on test ticket #32321 under the ACG internal customer before touching the real ticket. Once it worked, 2.0 hr of main labor plus a 1.0 hr afterhours premium were added to ticket #32320, and the test ticket was deleted. The skill was then audited: a live product rate fetch revealed two rate errors in the original draft (Remote Business and In-Shop Business are $150/hr, not $175), residential rates were removed as legacy, and the confirmed API method was documented with all required fields.
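The endpoint search can be repeated against a saved copy of the Swagger spec. This sketch assumes the spec has been downloaded to a local file (the name `swagger.json` is illustrative) and that endpoints appear as keys under `paths`, as in a standard Swagger document. It is rough text matching rather than a JSON parser; `jq -r '.paths | keys[]'` would be more robust where `jq` is available.

```bash
# Sketch: list Swagger path keys that contain a substring.
# find_paths SPEC_FILE SUBSTRING
find_paths() {
    local spec=$1 needle=$2
    # Path keys look like:  "/tickets/{id}/add_line_item": {
    # Grab every quoted string starting with "/" that is followed by a colon,
    # strip the quotes and colon, keep those containing SUBSTRING, de-duplicate.
    grep -oE '"/[^"]*"[[:space:]]*:' "$spec" \
        | sed -E 's/^"([^"]*)".*$/\1/' \
        | grep -F -- "$needle" \
        | sort -u
}
```

Usage, after fetching the spec from `api-docs.syncromsp.com/swagger.json`: `find_paths swagger.json line_item`.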

### Key Decisions

- **ESXi crontab via SFTP, not shell**: ESXi has no `crontab` command. The file was written directly to `/var/spool/cron/crontabs/root` via paramiko SFTP, and crond was sent SIGHUP afterward. Shell-based approaches (echo/heredoc) were tried first and failed.
- **local.sh persistence in Python, not shell**: Running `grep -c` through a shell command produced "0\n0" (the grep count plus the fallback), which caused a false-positive match. The local.sh update logic was rewritten to read and write the file over SFTP in Python, avoiding shell quoting and output ambiguity.
- **Test before touching the real ticket**: Rather than guessing the Syncro line item payload format against the real Dataforth ticket, a test ticket was opened on the ACG internal customer to confirm the endpoint and required fields first.
- **Both `name` and `description` required**: Syncro's `add_line_item` endpoint returns 422 if either field is missing, which is not obvious from the endpoint name. Documented explicitly.
- **Live rate fetch mandatory**: A memory note recorded that rates had been wrong before (2026-05-20 incident). All product rates were fetched live before the skill was finalized; Remote Business ($150) and In-Shop Business ($150) had both been documented as $175 in the original draft.
- **$262.50 emergency product covers all business work**: Confirmed with Mike that there is no distinction between remote and onsite emergency work. One product covers all business emergency billing, regardless of service delivery method.
- **Residential rates are legacy**: Removed 42584 and 1190471 from all active sections of the skill and added them to "Products NOT to Use."
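Because `add_line_item` rejects a payload missing either `name` or `description`, a client-side check before the POST is cheap insurance. This is a hypothetical helper, not part of the skill. It refuses to build the body unless both fields are non-empty, and it escapes only backslashes and double quotes, so it assumes single-line text (`jq -n --arg` handles anything more complex).

```bash
# Hypothetical helper: build an add_line_item JSON body, failing early instead of getting a 422.

# Escape backslashes first, then double quotes, for use inside a JSON string.
# Assumes single-line input (newlines and control characters are not handled).
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_line_item_json PRODUCT_ID NAME DESCRIPTION QUANTITY [PRICE]
build_line_item_json() {
    local product_id=$1 name=$2 description=$3 quantity=$4 price=${5:-0.0}
    if [ -z "$name" ] || [ -z "$description" ]; then
        echo "name and description are both required (Syncro returns 422 otherwise)" >&2
        return 1
    fi
    printf '{"product_id":%s,"name":"%s","description":"%s","quantity":%s,"price":%s,"taxable":false}\n' \
        "$product_id" "$(json_escape "$name")" "$(json_escape "$description")" "$quantity" "$price"
}
```

For example, `build_line_item_json 1190473 "Labor - Remote Business" "Afterhours rate" 1.0` yields the body for the premium line on a block ticket, which can then be passed to `curl -d`.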

### Problems Encountered

- **SSH not enabled on ESXi-122**: License expiration locked out management, and SSH had to be enabled from the DCUI at the physical console before remote work was possible. There was no automated fix; it required hands-on access at the host.
- **`crontab` command missing on ESXi**: The ESXi busybox environment does not include the `crontab` CLI. Fix: write the crontab file directly via SFTP.
- **`grep -c` false positive in local.sh check**: The shell command `grep -c 'pattern' file 2>/dev/null || echo 0` emitted both the grep count and the fallback "0", because grep exits non-zero when it finds no match and so triggers `|| echo 0`. The Python string comparison then saw "0\n0" and treated it as truthy. Fixed by using SFTP to read and rewrite local.sh entirely in Python.
- **Syncro line item endpoint discovery**: No documentation at hand gave the correct path. `/line_item`, `/line_items`, and a PUT with `line_items_attributes` all returned 404. Fetching the Syncro Swagger spec from `api-docs.syncromsp.com/swagger.json` eventually turned up `add_line_item`.
- **422 on add_line_item with only the `name` field**: Both `name` and `description` are required; omitting either returns 422.
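The `grep -c` pitfall is easy to reproduce. When the file exists but has no match, `grep -c` prints `0` and also exits with status 1, so the `|| echo 0` fallback adds a second `0`. The session's actual fix moved the logic into Python; the shell alternative here is only an illustration of avoiding output parsing by relying on the exit status of `grep -q`.

```bash
# Illustrates the "0\n0" false positive and a status-based alternative.

# Buggy pattern from the session: on no match, grep -c prints "0" and exits 1,
# so the fallback prints another "0" and the captured output is "0<newline>0".
buggy_count() {
    grep -c -- "$1" "$2" 2>/dev/null || echo 0
}

# Alternative: use only grep's exit status (0 = found, non-zero = not found
# or unreadable file). -F matches the text literally, so no regex escaping is needed.
has_line() {
    grep -qF -- "$1" "$2" 2>/dev/null
}
```

A check like `if has_line license_reset.sh /etc/rc.local.d/local.sh; then ...` then never depends on how many lines grep printed.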

### Configuration Changes

- **Created:** `D:\claudetools\.claude\commands\syncro-emergency-billing.md`: emergency/afterhours billing skill for Syncro (rules, billing scenarios, confirmed API method)
- **Modified:** `syncro-emergency-billing.md`: rate corrections (Remote Business $150, In-Shop $150), residential rates removed as legacy, API section added
- **ESXi-122** (`192.168.0.122`): license.cfg restored, cron job written, local.sh updated, all VMs powered on
- **ESXi-124** (`192.168.0.124`): license.cfg restored preemptively, cron job written, local.sh updated

### Credentials & Secrets
- **D2TESTNAS (jump host):** `192.168.0.9` — root / `Paper123!@#`
- **ESXi root password (both hosts):** `Gptf*77ttb!@#!@#`
- **Syncro API key:** `T259810e5c9917386b-52c2aeea7cdb5ff41c6685a73cebbeb3` — vault: `msp-tools/syncro.sops.yaml` → `credentials.credential`

### Infrastructure & Servers

| Host | IP | Role | Notes |
|---|---|---|---|
| D2TESTNAS | 192.168.0.9 | Jump host / NAS | SSH root access; used as the paramiko jump host for ESXi |
| ESXi-122 | 192.168.0.122 | Hypervisor | Datastore: `datastore1`; hosts AD1, AD2, FILES-D1, PBX |
| ESXi-124 | 192.168.0.124 | Hypervisor | Datastore: `Backup`; treated preemptively |
| AD1 | (on ESXi-122) | Domain Controller | Was offline due to license expiry; restored |
| AD2 | (on ESXi-122) | Domain Controller | Was offline; restored |
| FILES-D1 | (on ESXi-122) | File server | Was offline; restored |
| PBX | (on ESXi-122) | Phone system | Was offline; restored |

ESXi license reset script locations:

- ESXi-122: `/vmfs/volumes/datastore1/license_reset.sh`
- ESXi-124: `/vmfs/volumes/Backup/license_reset.sh`

Cron schedule (both hosts): `0 2 * * 0 [ $(date +%d) -le 7 ] && <script> >> /tmp/license_reset.log 2>&1`. The entry fires every Sunday at 02:00, and the day-of-month check lets the script run only on the first Sunday of the month (days 1 to 7).
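The `date` test exists because cron cannot express "first Sunday" directly. Under standard cron semantics (crontab(5)), when both the day-of-month and day-of-week fields are restricted, the job runs when either one matches, so `0 2 1-7 * 0` would fire on days 1 to 7 and on every Sunday. That ESXi's busybox crond follows the same rule is an assumption here. The function below just mirrors the guard for illustration.

```bash
# in_first_week DAY_OF_MONTH: true for days 1-7, which means the first Sunday
# when the cron entry already restricts the job to Sundays.
# Mirrors the guard in the cron entry:  [ $(date +%d) -le 7 ]
in_first_week() {
    # date +%d is zero-padded ("07", "08"). test's -le compares as decimal, so
    # "08" and "09" are fine here, unlike bash $(( )) arithmetic, which treats them as octal.
    [ "$1" -le 7 ]
}
```

In the cron entry the argument would come from `date`, as in `in_first_week "$(date +%d)"`.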

Persistence: `/etc/rc.local.d/local.sh` restores the crontab entry on each boot.
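The exact lines added to `local.sh` are not recorded here. A minimal sketch of an idempotent boot-time restore, assuming it re-appends the cron entry only when missing and then signals crond as the session did, could look like this. `ensure_cron_line`, the `--apply` flag, and the crond pid file path are assumptions; the entry is shown with ESXi-122's script path. On ESXi these commands would sit in `local.sh` before its final `exit 0`.

```bash
#!/bin/sh
# Hypothetical sketch of the boot-time restore; the deployed local.sh lines may differ.
CRONTAB="${CRONTAB:-/var/spool/cron/crontabs/root}"
# Entry from the "Cron schedule" line, filled in with ESXi-122's script path.
CRON_LINE='0 2 * * 0 [ $(date +%d) -le 7 ] && /vmfs/volumes/datastore1/license_reset.sh >> /tmp/license_reset.log 2>&1'

# Append LINE to FILE only if an identical whole line is not already present,
# so running this on every boot never duplicates the entry.
ensure_cron_line() {
    file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

if [ "${1:-}" = "--apply" ]; then
    ensure_cron_line "$CRONTAB" "$CRON_LINE"
    # Signal crond to pick up the change (the session used SIGHUP; pid file path assumed).
    kill -HUP "$(cat /var/run/crond.pid)"
fi
```

The whole-line literal match (`-x -F`) sidesteps both the `grep -c` output problem and regex escaping of the `[`, `$`, and `*` characters in the entry.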

### Commands & Outputs

```bash
# ESXi license reset (run on each host via SSH):
# restore the hidden backup license over the active one, then restart hostd
cp /etc/vmware/.#license.cfg /etc/vmware/license.cfg
/etc/init.d/hostd restart

# Verify license state
vim-cmd vimsvc/license --show | grep -E 'serial|diagnostic|expirationHours'

# Add a line item to an existing Syncro ticket (confirmed working 2026-05-25).
# Both "name" and "description" are required; omitting either returns 422.
SYNCRO_API_KEY='<api_key>'   # vault: msp-tools/syncro.sops.yaml
TICKET_ID='<ticket_id>'      # ticket ID (e.g. 110958232), not the ticket number (#32320)
curl -s -X POST "https://computerguru.syncromsp.com/api/v1/tickets/${TICKET_ID}/add_line_item" \
  -H "Authorization: ${SYNCRO_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"product_id":1190473,"name":"Labor - Remote Business","description":"Work description","quantity":2.0,"price":0.0,"taxable":false}'

# Fetch the live product rate before billing a non-block customer
PRODUCT_ID='<product_id>'
curl -s "https://computerguru.syncromsp.com/api/v1/products/${PRODUCT_ID}" \
  -H "Authorization: ${SYNCRO_API_KEY}" | jq '.product.price_retail'
```
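For non-block customers the dollar amount is the live rate times the hours. A small sketch, assuming `price_retail` comes back as a plain number from the product fetch above; `syncro_price` and `line_total` are hypothetical helper names, and `SYNCRO_API_KEY` is assumed to be set in the environment.

```bash
# Hypothetical helpers; not part of the skill file.

# syncro_price PRODUCT_ID: print the live retail price (requires curl, jq, SYNCRO_API_KEY).
syncro_price() {
    curl -s "https://computerguru.syncromsp.com/api/v1/products/$1" \
        -H "Authorization: ${SYNCRO_API_KEY:?SYNCRO_API_KEY not set}" \
        | jq -r '.product.price_retail'
}

# line_total RATE HOURS: dollar amount rounded to cents (awk handles the decimals).
line_total() {
    awk -v r="$1" -v h="$2" 'BEGIN { printf "%.2f\n", r * h }'
}
```

For 2.0 hr of non-block emergency work: `line_total "$(syncro_price 26184)" 2.0`. At the documented $262.50/hr that comes to 525.00.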

Dataforth ticket #32320 (ID: 110958232), line items added:
- ID 42571127: Labor - Remote Business, 2.0 hr, "Afterhours remote — John Lehman reported VPN down..."
- ID 42571130: Labor - Remote Business, 1.0 hr, "Afterhours rate"

### Pending / Incomplete Tasks

None. Ticket is complete, skill is complete, ESXi cron is configured and persistent.

### Reference Information

- **Syncro ticket:** #32320 (ID: 110958232), "Afterhours - VMware ESXi - Evaluation License Expired / VMs Down", Dataforth Corporation
- **Syncro test ticket (deleted):** #32321 (ID: 110961873), ACG internal customer
- **Reference invoice:** 67594 (VWP block customer emergency billing example, 2026-05-12)
- **Reference ticket:** #32269 (VWP, block emergency billing reference)
- **Syncro add_line_item endpoint:** `POST /api/v1/tickets/{id}/add_line_item`
- **Syncro product IDs:** 1190473 (Remote Business $150), 26118 (Onsite $175), 573881 (In-Shop $150), 26184 (Emergency Business $262.50)
- **Python scripts (Temp):**
  - `C:\Users\guru\AppData\Local\Temp\esxi_schedule_monthly_reset_v2.py`: final cron setup script (SFTP method)
  - `C:\Users\guru\AppData\Local\Temp\esxi_schedule_monthly_reset.py`: v1 (heredoc method, superseded)
  - `C:\Users\guru\AppData\Local\Temp\esxi124_hostd_restart.py`: hostd restart + verification