harness: fleet-wide functional-error + correction + friction logging

Add .claude/scripts/log-skill-error.sh — the canonical agent error log helper (writes errorlog.md in DATE | MACHINE | skill | [type] error format, soft-fails). Three categories: execution failures (default), user corrections (--correction), and preventable self-inflicted friction (--friction; cite ref= when it repeats a documented gotcha). Goal: stop paying tokens twice for the same avoidable mistake. - CLAUDE.md: make logging mandatory for all skills + corrections + friction. - skill-creator: new skills must wire in the helper (guidance + checklist). - Retrofit every skill script's genuine failure branches to call the helper (b2/bitdefender/mailprotector/packetdial/coord python CLIs; remediation-tool + onboard365 bash; vault, rmm-auth, post-bot-alert, agy, grok, 1password, run-onboarding-diagnostic). Handled conditions + self-tests left alone. - errorlog.md: broaden header to cover skills + harness + corrections; seed this session's corrections (INKY, Mail.Send token-audience, omnibox-strictness) and friction (git-bash /tmp, env-persistence, argv-limit, PowerShell var-case). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 11:39:43 -07:00
parent 927a06a0cf
commit 9960da5f9a
29 changed files with 388 additions and 36 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -43,7 +43,9 @@ production, data-loss. Detail: EXTENDED + `.claude/OLLAMA.md`.
 - **SSH:** system OpenSSH (`C:\Windows\System32\OpenSSH\ssh.exe`), never Git-for-Windows SSH.
 - **Data integrity:** never placeholder/fake data — check vault, wiki, or ask.
 - **Hard-to-reverse or outward-facing actions:** confirm first (per-action, per-session).
- **Error logging:** when a task errors during execution (failed command, skill, or tool call), append a brief entry to `errorlog.md` (repo root) — date, machine (from `identity.json`), the command/skill that failed, and the error. One line or two; just enough to spot patterns. These feed skill + harness improvements. Don't log expected/handled conditions, only genuine failures.
+- **Error logging (mandatory, all skills):** when a task or skill hits a GENUINE functional error during execution (failed command, API/auth failure, unexpected API response, tool call), record it to `errorlog.md` (repo root) via the canonical helper — never hand-format: `bash .claude/scripts/log-skill-error.sh "<skill/command>" "<brief error>" [--context "k=v ..."]`. It stamps UTC date + machine (from `identity.json`) and inserts in the standard `YYYY-MM-DD | MACHINE | skill | error` format (newest on top) for later skill **linting**, and soft-fails so it never breaks the caller. **Every skill MUST call it at its failure branches**; you (main loop) call it after any skill/command genuinely fails. Do NOT log expected/handled conditions (no search matches, no unread messages, a user declining a prompt) — only real failures worth spotting a pattern. Python skills shell out to the same helper.
+- **Log user corrections too (`--correction`):** when the user CORRECTS an improper assumption or wrong approach you took (e.g. "don't use INKY unless onboarded", "EXO already has Mail.Send", "I don't need an exact match"), log it: `bash .claude/scripts/log-skill-error.sh "<skill/context>" "assumed X; correct is Y" --correction`. These are the highest-value entries — they surface recurring bad assumptions so we can train them out of the system. If the correction is a durable preference, ALSO save it as a `feedback` memory (the two are complementary: memory fixes future behavior, errorlog tracks the pattern for linting).
+- **Log preventable friction too (`--friction`):** any time you waste tokens on a preventable, repeatable self-inflicted error — harness/env/tool misuse (Git-Bash `/tmp` path mismatch, shell env not persisting between Bash calls, passing huge args on the command line, PowerShell var case-collisions, etc.) — log it: `bash .claude/scripts/log-skill-error.sh "<context e.g. bash/env>" "what wasted tokens + the fix" --friction [--context "ref=<memory-or-rule>"]`. **If it repeats something already in memory or CLAUDE.md, that's the highest-value entry** — it means a rule/memory isn't working; cite the ref. This log is the corpus we lint to build better CLAUDE.md rules and to clean stale/misleading memory. Goal: stop paying twice for the same mistake.
 - **Windows:** ensure `bash` resolves to Git-for-Windows MSYS bash, not the WSL stub; write
  `.claude/current-mode` with a relative/forward-slash path only (never a backslash Windows
  path). Detail + fixes: EXTENDED.