harness: fleet-wide functional-error + correction + friction logging

Add .claude/scripts/log-skill-error.sh — the canonical agent error log helper (writes errorlog.md in DATE | MACHINE | skill | [type] error format, soft-fails). Three categories: execution failures (default), user corrections (--correction), and preventable self-inflicted friction (--friction; cite ref= when it repeats a documented gotcha). Goal: stop paying tokens twice for the same avoidable mistake. - CLAUDE.md: make logging mandatory for all skills + corrections + friction. - skill-creator: new skills must wire in the helper (guidance + checklist). - Retrofit every skill script's genuine failure branches to call the helper (b2/bitdefender/mailprotector/packetdial/coord python CLIs; remediation-tool + onboard365 bash; vault, rmm-auth, post-bot-alert, agy, grok, 1password, run-onboarding-diagnostic). Handled conditions + self-tests left alone. - errorlog.md: broaden header to cover skills + harness + corrections; seed this session's corrections (INKY, Mail.Send token-audience, omnibox-strictness) and friction (git-bash /tmp, env-persistence, argv-limit, PowerShell var-case). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 11:39:43 -07:00
parent 927a06a0cf
commit 9960da5f9a
29 changed files with 388 additions and 36 deletions
--- a/.claude/skills/skill-creator/SKILL.md
+++ b/.claude/skills/skill-creator/SKILL.md
@@ -98,6 +98,23 @@ After creating the files:
   - Commands: Tell them to use `/{name}` or `/{name} arguments`
 3. Remind them to update CLAUDE.md's Commands & Skills table if they want it documented there

+## Mandatory: functional error logging
+
+**Every skill MUST report genuine functional errors to `errorlog.md`** via the canonical
+helper, so failures can be linted and fed back into skill improvements (CLAUDE.md core rule).
+Bake this into the skill from the start:
+
+- In each skill **script's failure branches** (API/auth failure, unexpected response,
+  validation error, unexpected non-zero exit), call:
+  ```bash
+  bash "$ROOT/.claude/scripts/log-skill-error.sh" "<skill-name>" "<brief error>" --context "op=... http=..."
+  ```
+  It stamps date+machine, inserts in the standard `YYYY-MM-DD | MACHINE | skill | error`
+  format, and soft-fails (never breaks the caller). Python skills shell out to the same helper.
+- In the **SKILL.md**, add a line under workflow/guidelines: "On a functional error, log it via
+  `log-skill-error.sh` before surfacing it."
+- Do NOT log expected/handled conditions (no results, no unread, user-declined) — only real failures.
+
 ## Quality Checklist

 Before finalizing, verify:
@@ -107,6 +124,7 @@ Before finalizing, verify:
 - [ ] File is in the correct location (`.claude/skills/` or `.claude/commands/`)
 - [ ] Name uses kebab-case and is concise
 - [ ] For skills with auto-triggers: triggers are specific enough to avoid false positives
+- [ ] **Functional errors are logged** via `log-skill-error.sh` in the script's failure branches

 ## Tips for Good Skills/Commands