harness: fleet-wide functional-error + correction + friction logging

Add .claude/scripts/log-skill-error.sh, the canonical agent error-log helper (writes errorlog.md entries in DATE | MACHINE | skill | [type] error format; soft-fails). Three categories: execution failures (default), user corrections (--correction), and preventable self-inflicted friction (--friction; cite ref= when it repeats a documented gotcha). Goal: stop paying tokens twice for the same avoidable mistake.

- CLAUDE.md: make logging mandatory for all skills, corrections, and friction.
- skill-creator: new skills must wire in the helper (guidance + checklist).
- Retrofit every skill script's genuine failure branches to call the helper (b2/bitdefender/mailprotector/packetdial/coord Python CLIs; remediation-tool + onboard365 bash; vault, rmm-auth, post-bot-alert, agy, grok, 1password, run-onboarding-diagnostic). Handled conditions and self-tests are left alone.
- errorlog.md: broaden the header to cover skills, the harness, and corrections; seed this session's corrections (INKY, Mail.Send token-audience, omnibox-strictness) and friction (git-bash /tmp, env-persistence, argv-limit, PowerShell var-case).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 11:39:43 -07:00
parent 927a06a0cf
commit 9960da5f9a
29 changed files with 388 additions and 36 deletions
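The diff below shows only errorlog.md. It does not include the helper script itself. The following is a minimal bash sketch of how a helper like log-skill-error.sh could behave, based only on the format and flags documented in the new errorlog.md header. It is not the real script. Several details are assumptions made for illustration:

- the function name `log_skill_error`
- the `ERRORLOG_PATH` and `MACHINE_NAME` overrides
- inserting new entries directly under the `<!-- Append entries below this line -->` marker
- returning 0 on every path as the meaning of "soft-fails"

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real .claude/scripts/log-skill-error.sh.
# Assumed usage (mirrors the errorlog.md header):
#   log_skill_error "<skill/context>" "<brief>" [--correction|--friction] [--context "k=v"]
# ERRORLOG_PATH and MACHINE_NAME are illustrative overrides invented for this sketch.
log_skill_error() {
  # Soft-fail: every exit path returns 0 so logging can never break the caller.
  if [ "$#" -lt 2 ]; then
    echo "log_skill_error: need <skill/context> and <brief>" >&2
    return 0
  fi
  local skill="$1" brief="$2" tag="" ctx=""
  shift 2
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --correction) tag="[correction] " ;;
      --friction) tag="[friction] " ;;
      --context) ctx=" [ctx: ${2:-}]"; shift ;;
      *) echo "log_skill_error: ignoring unknown arg: $1" >&2 ;;
    esac
    shift
  done

  local log="${ERRORLOG_PATH:-errorlog.md}"
  local machine="${MACHINE_NAME:-${HOSTNAME:-unknown}}"
  local marker='<!-- Append entries below this line -->'
  local entry
  entry="$(date +%Y-%m-%d) | ${machine} | ${skill} | ${tag}${brief}${ctx}"

  if [ ! -f "$log" ] || ! grep -qF -- "$marker" "$log"; then
    echo "log_skill_error: $log missing or has no marker; entry skipped" >&2
    return 0
  fi

  # Newest entry goes directly under the marker. Values reach awk through the
  # environment (not -v) so backslashes in Windows paths are not unescaped;
  # a trailing \r is ignored so CRLF files still match the marker line.
  local tmp="${log}.tmp.$$"
  if E="$entry" M="$marker" awk '
      { print; line = $0; sub(/\r$/, "", line) }
      line == ENVIRON["M"] && !done { print ""; print ENVIRON["E"]; done = 1 }
    ' "$log" > "$tmp" && mv -- "$tmp" "$log"; then
    return 0
  fi
  rm -f -- "$tmp"
  echo "log_skill_error: could not write $log" >&2
  return 0
}
```

To use it, a skill sources the file and calls the function from a genuine failure branch. For example, `vault_read || log_skill_error "vault" "read failed" --context "exit=$?"`, where `vault_read` is a hypothetical command. Because the function always returns 0, a broken or missing log never masks the original failure.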
--- a/errorlog.md
+++ b/errorlog.md
@@ -1,14 +1,37 @@
 # Error Log

-Brief records of task-execution errors across the fleet, used to improve skills and the
-command harness. Append newest entries at the top. Keep each entry to 1-2 lines.
+Brief records of preventable, pattern-worthy events across the fleet — used to improve
+skills, write better CLAUDE.md rules, and clean stale/misleading memory. The aim: never
+pay tokens twice for the same avoidable mistake. Append newest at the top; keep entries to
+1-2 lines. **Always write via the helper, never by hand:**
+`bash .claude/scripts/log-skill-error.sh "<skill/context>" "<brief>" [--correction|--friction] [--context "k=v"]`

-Format: `YYYY-MM-DD | MACHINE | command/skill | error (brief)`
+Format: `YYYY-MM-DD | MACHINE | command/skill/context | [type] error (brief) [ctx: ...]`
+
+Categories (the `[type]` tag): _(none)_ = skill/command execution failure ·
+`[correction]` = user corrected an improper assumption I made ·
+`[friction]` = preventable self-inflicted token-waste (harness/env/tool misuse; cite a
+`ref=` in ctx when it repeats a documented gotcha — that flags a rule/memory to strengthen).

 ---

 <!-- Append entries below this line -->

+2026-06-15 | GURU-5070 | powershell/var-case | [friction] PowerShell vars are case-INSENSITIVE: $gUid silently overwrote $guid (GPO id), Set-ADObject hit a bad DN and left GPT.ini/AD versionNumber inconsistent until fixed. Never rely on case to distinguish PS variables
+
+2026-06-15 | GURU-5070 | python/argv-limit | [friction] passed full /api/agents JSON (248 agents) as a python CLI arg -> 'Argument list too long' on Windows. Pipe large payloads via stdin, not argv
+
+2026-06-15 | GURU-5070 | bash/env-persist | [friction] re-derived RMM token every call after $TOKEN/$RMM vanished between Bash tool calls - shell env does NOT persist across calls; must re-eval auth (or chain) in the same command
+
+2026-06-15 | GURU-5070 | bash/tmp-path | [friction] wrote curl -o /tmp/x.json then jq read it back and failed (No such file) - Git-Bash vs Write/tool /tmp resolve differently. Pipe directly or use repo-relative paths. REPEAT of documented gotcha [ctx: ref=feedback_tmp_path_windows]
+
+2026-06-15 | GURU-5070 | DMARC / DNS | [correction] assumed ACG's own INKY rua convention (reports-sg.inkydmarc.com) applied to a client domain; only use the INKY rua if THAT client is onboarded to INKY - otherwise plain p=none or a real mailbox
+
+2026-06-15 | GURU-5070 | remediation-tool (sendMail) | [correction] assumed none of the consented apps could send mail and started granting Graph Mail.Send; the Exchange Operator app ALREADY had Graph Mail.Send - I was decoding the EXO-audience token, not a Graph-audience token. Mint a Graph token for the app before concluding a permission is missing
+
+2026-06-15 | GURU-5070 | rmm-search | [correction] assumed the CLI search must replicate the UI Omnibox scoreMatch exactly; user wants a FLEXIBLE forgiving multi-field search optimized for first-try correctness, not UI parity
+
+
 2026-06-15 | GURU-BEAST-ROG | /syncro (comment edit) | Syncro API does not expose a comment-edit or comment-delete endpoint — once posted, comments can only be modified via the GUI. Bot posted an internal resolution note with an unwanted "Performed by: ClaudeTools Discord Bot" line and could not remove it programmatically. Remediation needed: either suppress bot-attribution lines from internal notes by default, or add a GUI-edit step to the workflow when the note needs correction.

 2026-06-14 | GURU-5070 | mailbox skill (Graph token) | FABB app `fabb3421` (Claude-MSP-Access / "Cloud MSP Access") token request returned AADSTS700016 — app/SP no longer present in azcomputerguru.com tenant (deleted; gotchas.md already marked it deprecated). Blocks /mailbox + the M365 contacts task. Verified the remediation suite (live, ACG tenant) carries NO Mail.Send/Mail.ReadWrite/Contacts scopes (investigator has Mail.Read only) — so a straight repoint can't restore mailbox-send/contacts. Pending Mike decision: stand up a single-tenant ACG-internal mailbox app vs. add scopes to a suite tier. [2026-06-15] Docs hardened — gotchas.md now marks fabb3421 DELETED with the Mail/Contacts-scope blast radius + flags the 3 legacy "old app only" tenants (Valleywide/Dataforth/Cascades) as now having NO working remediation app (migration URGENT); mailbox.md carries a BLOCKED/AADSTS700016 banner. DECISION 2026-06-15 (Mike): Mail.Send goes into the suite (Exchange Operator tier) since its real use is IR victim-notification during mailbox takeovers; add Mail.Send to the exchange-op manifest + consent, repoint mailbox.md to exchange-op. Implementation not yet executed (production app change, needs go).
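The argv-limit, tmp-path and env-persist friction entries in the diff above share one fix: keep data flowing inside a single command. Do not stash it in argv, in /tmp, or in shell variables that a later tool call expects to find. The following is a minimal bash sketch of that pattern. `make_payload`, `count_ids` and `get_token` are hypothetical stand-ins and are not part of this commit.

```bash
#!/usr/bin/env bash
# Sketch of the "pipe, don't stash" fixes behind the argv-limit, tmp-path and
# env-persist entries. make_payload, count_ids and get_token are hypothetical.

# Stand-in for a large API response such as the 248-agent /api/agents JSON.
make_payload() {
  local i
  printf '['
  for ((i = 1; i <= 248; i++)); do
    [ "$i" -gt 1 ] && printf ','
    printf '{"id":%d,"name":"agent-%d"}' "$i" "$i"
  done
  printf ']'
}

# A consumer that reads the payload from stdin. A python or jq CLI that reads
# stdin works the same way; this one just counts "id" keys with grep.
count_ids() {
  grep -o '"id":' | wc -l | tr -d ' '
}

# Avoid: some_cli "$(make_payload)"  (whole payload in argv; can hit the OS limit)
# Avoid: make_payload > /tmp/x.json, then reading /tmp/x.json from another tool
#        (Git-Bash and the Write/other tools resolve /tmp differently)
# Prefer: stream straight from producer to consumer.
make_payload | count_ids

# Shell variables do not survive between separate Bash tool calls, so derive
# and use a credential inside the SAME command rather than in a later call.
get_token() { echo "hypothetical-token"; }
TOKEN="$(get_token)" && printf 'Authorization: Bearer %s\n' "$TOKEN"
```

Streaming through a pipe removes the argv size limit and the /tmp path ambiguity in one step. Chaining the token derivation with `&&` keeps the credential and its use in the same shell process.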