feat(harness): P1+P2+P3 harness optimization complete (VERSION 1.4.0)

Task 5  one-line registry descriptions on the 8 biggest skills (remediation-tool,
        gc-audit, packetdial, memory-dream, human-flow, self-check, impeccable,
        mailprotector); skill-description injection ~3320 -> ~2123 tokens (~36%),
        keyword triggers preserved, frontmatter valid.
Task 7  thinned /save + /sync bodies to point at sync.sh (single source) instead of
        re-documenting internals; Phase 0 save-vs-sync, cross-user notes, exit-75
        reporting kept verbatim; mechanical sync never depends on an LLM step.
Task 10 session-logs/YYYY-MM/ forward convention for new logs (scoped-grep recall,
        no monolithic index); existing flat logs untouched (grep covers both).
Bash    now-phoenix.sh helper (fixed UTC-7 epoch math; replaces unreliable
        TZ=America/Phoenix date that silently returns UTC on Git-Bash).

P0 (1.2.0) + Task 6 CLAUDE split + Task 9 delegation (1.3.0) already shipped.
Spec: specs/claudetools-harness-optimization/plan.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 08:10:58 -07:00
parent 6671a7a400
commit 68ad1dbd40
14 changed files with 1740 additions and 1710 deletions


@@ -26,11 +26,20 @@ Claude writes all sections directly. Be concise, factual, technical. No filler p
### Location
New logs go in a **`YYYY-MM/` month folder** under the relevant `session-logs/` dir (keeps the
flat dir from growing unbounded; recall is scoped grep over the month folders — no monolithic
index). `mkdir -p` the month folder before writing.
| Work scope | Path |
|---|---|
| Single project | `projects/<project>/session-logs/YYYY-MM-DD-session.md` |
| Client | `clients/<slug>/session-logs/YYYY-MM-DD-session.md` |
| Multi-project / general | `session-logs/YYYY-MM-DD-session.md` |
| Single project | `projects/<project>/session-logs/YYYY-MM/YYYY-MM-DD-<user>-<topic>.md` |
| Client | `clients/<slug>/session-logs/YYYY-MM/YYYY-MM-DD-<user>-<topic>.md` |
| Multi-project / general | `session-logs/YYYY-MM/YYYY-MM-DD-<user>-<topic>.md` |
> Existing flat logs (`session-logs/*.md`) stay where they are — recall grep covers both `*/*.md`
> (month folders) and `*.md` (legacy flat), so no mass migration. The month folder is added
> *after* `session-logs/`, so wiki slug derivation (`<project>`/`<slug>` captured before
> `session-logs/`) is unaffected. Use `bash .claude/scripts/now-phoenix.sh --date` for the date.
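The path convention in the table above can be sketched as a small helper. This is a minimal illustration only: `session_log_path` is a hypothetical function (not part of the repo), the user and topic values are made up, and the date is passed in as an argument so the sketch runs anywhere; in the harness it would come from `now-phoenix.sh --date`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the repo): build the month-folder log path.
# Args: <session-logs dir> <YYYY-MM-DD> <user> <topic>
session_log_path() {
  local base="$1" day="$2" user="$3" topic="$4"
  local month="${day%-*}"            # 2026-06-08 -> 2026-06
  printf '%s/%s/%s-%s-%s.md\n' "$base" "$month" "$day" "$user" "$topic"
}

# Demo in a throwaway directory; in the harness the date would come from
#   bash .claude/scripts/now-phoenix.sh --date
demo_root="$(mktemp -d)"
log="$(session_log_path "$demo_root/session-logs" 2026-06-08 alice harness)"
mkdir -p "$(dirname "$log")"         # create the month folder before writing
printf '# Session log\n' > "$log"
echo "${log#"$demo_root"/}"          # session-logs/2026-06/2026-06-08-alice-harness.md
```

Deriving the month from the date string, instead of calling `date` a second time, keeps the folder and the filename consistent even if the clock crosses midnight between the two steps.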
### Filename + append behavior
@@ -97,9 +106,11 @@ not on every save.
bash .claude/scripts/sync.sh
```
`sync.sh` is **serialized by a per-machine lock** (`.git/claudetools-sync.lock`) so concurrent sessions or the scheduled-task sync cannot interleave commits/rebases; if another sync is mid-flight it waits up to ~120s, then **exits 75 (deferred)** rather than racing — the next sync catches up. On a 75, do NOT print a success summary; report "**sync deferred — another sync is running; your session log is written locally and will sync on the next run**". Otherwise it: reconciles this machine's `git config user.name/email` to `.claude/identity.json` (so commit authorship can't drift), stages all changes with `git add -A` (after purging garbled Windows path-as-filename cruft), auto-commits, fetch + rebase, push, then the same flow for the vault repo, then surfaces cross-user `## Note for <user>` blocks.
> Note: `git add -A` is still the catch-all sweep, so a save run will also pick up any *other* dirty files in the shared tree. The lock prevents two syncs from racing, and per-session-unique log filenames prevent log overwrites — but the bare-`add -A` capture means full per-session commit isolation is a later step (see the isolation plan: drop blind `add -A` in favour of explicit per-session staging). For now, avoid running `/save` from two sessions at the exact same moment.
Same driver as `/sync` — see that command for the full semantics. The two load-bearing
points for reporting: **exit 75 = deferred** (another sync is running; report "sync deferred
— your session log is written locally and will sync on the next run", NOT a success summary);
and `git add -A` is a catch-all sweep, so avoid running `/save` from two sessions at the exact
same moment (per-session-unique log filenames prevent log overwrites, the lock prevents racing).
After sync, emit a **Post-commit Summary**:


@@ -39,18 +39,15 @@ The intent: a `/sync` that finds unsaved work should default toward `/save`. Aut
## What this does
Invokes `bash .claude/scripts/sync.sh`, which:
Run it — the script is the single source of truth for all git ops (both `/sync` and `/save` invoke it):
1. Detects local changes (including untracked-only files) via `git status --porcelain`; stages with `git add -A` and auto-commits with `sync: auto-sync from <hostname> at <timestamp>`
2. Fetches from origin, rebases local commits onto remote
3. Pushes to origin
4. Copies `.claude/commands/*.md` to `~/.claude/commands/` so the global Claude CLI commands stay current without a manual copy
5. Repeats steps 1-3 for the **vault** repo (path read from `.claude/identity.json` `vault_path` field)
6. Surfaces any `## Note for <user>` / `## Message for <user>` blocks from incoming session logs
```bash
bash .claude/scripts/sync.sh
```
The script is the single source of truth for git operations. Both `/sync` and `/save` invoke it.
It stages (`git add -A`, submodule gitlinks unstaged unless `--with-submodules`), auto-commits, fetch+rebase+push for this repo then the vault repo, deploys `.claude/commands/*.md` + skills to `~/.claude/`, and surfaces incoming `## Note for <user>` blocks. Full internals: `.claude/CLAUDE_EXTENDED.md` / the script header.
**Concurrency:** the run is serialized by a per-machine lock (`.git/claudetools-sync.lock`) so two syncs (e.g. interactive + the scheduled-task sync, or two Claude sessions) can't interleave staging/commit/rebase/push. If another sync is already running, this run waits up to ~120s then **exits 75 (EX_TEMPFAIL = deferred, not a failure)**. Report it as deferred, not synced; the next run catches up. Stale locks (owner process dead, or older than 10 min) are auto-reclaimed.
**Exit 75 = deferred, not a failure.** The run is serialized by a per-machine lock (`.git/claudetools-sync.lock`); if another sync is mid-flight it waits ~120s then exits 75. On a 75, report "sync deferred — another sync is running; it will catch up next run", NOT a success summary. Stale locks (dead owner, or >10 min) auto-reclaim.
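The exit-75 rule above could be handled along these lines. This is a sketch under stated assumptions, not the harness's actual reporting code: `sync_report` and `run_and_report` are hypothetical names, the "synced" and "sync failed" strings are invented for illustration, and the deferred message follows the wording above with plain ASCII punctuation. A stand-in command replaces `bash .claude/scripts/sync.sh` so the example runs outside the repo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a sync.sh exit status to the line that gets reported.
sync_report() {
  case "$1" in
    0)  echo "synced" ;;
    75) echo "sync deferred: another sync is running; it will catch up next run" ;;
    *)  echo "sync failed (exit $1)" ;;
  esac
}

# Run a command, capture its status without tripping `set -e`, and report.
run_and_report() {
  local rc=0
  "$@" || rc=$?
  sync_report "$rc"
}

# Stand-in for `bash .claude/scripts/sync.sh` hitting a busy lock.
run_and_report bash -c 'exit 75'
```

Keeping 75 as its own branch, separate from the generic failure branch, is what prevents a deferred run from being reported either as a success or as an error.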
---


@@ -30,3 +30,19 @@ or old harness during a heterogeneous rollout. See
(full manual, on-demand). Saves ~3.7k tokens per CLAUDE.md injection; nothing lost.
- Task 9 (P2): delegation re-tuned in CORE — act directly by default; delegate only for
high-volume output, blast radius >3 files/layers, domain shift, or parallel work.
## 1.4.0 — 2026-06-08 (P1+P2+P3 complete)
- Task 5: one-line registry descriptions on the 8 biggest skills (remediation-tool, gc-audit,
packetdial, memory-dream, human-flow, self-check, impeccable, mailprotector). Skill-description
injection ~3320 -> ~2123 tokens (~36% cut); keyword triggers preserved; frontmatter valid.
- Task 7: thinned `/save` + `/sync` bodies — they point to `sync.sh` as the single source instead
of re-documenting its internals; load-bearing LLM-judgment parts (Phase 0 save-vs-sync, cross-user
note display, exit-75 reporting) kept verbatim. The mechanical sync never depends on an LLM step.
- Task 10 (P3): `session-logs/YYYY-MM/` adopted as a FORWARD convention for new logs (recall = scoped
grep over month folders, no monolithic index); existing flat logs untouched (grep covers both).
Recall order (wiki -> CONTEXT/log -> coord) already lives in CORE.
- Deterministic Bash fix: `now-phoenix.sh` helper added — fixed UTC-7 epoch math, replaces the
unreliable `TZ=America/Phoenix date` (silently returns UTC on Git-Bash). `--iso/--date/--datetime/
--fmt` formats. `post-bot-alert.sh` already uses `jq -nc --arg` (verified, no change needed).
- Deferred (unchanged): full Python port = separate spec; Task 8 shard command bodies; promote
guard to FATAL after a clean warn window; schedule memory-dream --apply-safe per-machine.
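
The scoped-grep recall from Task 10 might look like the sketch below. It assumes a hypothetical `recall` helper (not in the repo) and made-up log names; the only parts taken from the changelog are the two layouts it has to cover, legacy flat logs and `YYYY-MM/` month folders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list logs mentioning a pattern.
# With a month (YYYY-MM) only that folder is searched; without one, both the
# legacy flat logs and every month folder are searched.
recall() {
  local dir="$1" pattern="$2" month="${3:-}"
  if [ -n "$month" ]; then
    grep -ls -- "$pattern" "$dir/$month"/*.md || true
  else
    # -s hides "no such file" noise when one of the two globs matches nothing.
    grep -ls -- "$pattern" "$dir"/*.md "$dir"/*/*.md || true
  fi
}

logs="$(mktemp -d)/session-logs"
mkdir -p "$logs/2026-06"
echo "sync deferred (exit 75)" > "$logs/2026-05-30-session.md"
echo "sync deferred (exit 75)" > "$logs/2026-06/2026-06-08-alice-harness.md"
echo "unrelated note"          > "$logs/2026-06/2026-06-09-alice-misc.md"

recall "$logs" "exit 75" | sed "s|^$logs/||"
```

Passing a month narrows the search to one folder, which is the "scoped" part; omitting it falls back to both layouts, so no index file has to be maintained.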


@@ -1 +1 @@
1.3.0
1.4.0


@@ -0,0 +1,50 @@
#!/usr/bin/env bash
# now-phoenix.sh — emit the current America/Phoenix timestamp, deterministically.
#
# WHY: `TZ=America/Phoenix date` is unreliable on Git-for-Windows bash (the MSYS
# tz database is often absent, so it silently returns UTC). Arizona does NOT
# observe DST — it is fixed UTC-7 (MST) year-round — so we compute Phoenix time
# as (UTC epoch - 7h) and format it. No tz database, no DST edge cases, identical
# result on Windows / macOS / Linux.
#
# Usage:
# bash now-phoenix.sh -> 2026-06-08 14:32 PT (default, human log line)
# bash now-phoenix.sh --iso -> 2026-06-08T14:32:07-07:00
# bash now-phoenix.sh --date -> 2026-06-08
# bash now-phoenix.sh --datetime -> 2026-06-08 14:32:07
# bash now-phoenix.sh --epoch -> 1780954327 (raw UTC epoch, for arithmetic)
# bash now-phoenix.sh --fmt '+%H:%M' -> 14:32 (custom strftime, applied to Phoenix time)
#
# All output is on stdout, no trailing prose. Soft, dependency-free (coreutils date only).
set -euo pipefail
OFFSET=$((7 * 3600)) # Phoenix is UTC-7, fixed
EPOCH_UTC="$(date -u +%s)"
EPOCH_PHX=$((EPOCH_UTC - OFFSET))
# Portable "format an epoch as if it were UTC" (so the wall-clock we print is Phoenix local).
fmt_epoch() {
  local e="$1" f="$2"
  if date -u -d "@${e}" "$f" >/dev/null 2>&1; then
    date -u -d "@${e}" "$f"   # GNU/Git-Bash
  else
    date -u -r "${e}" "$f"    # BSD/macOS
  fi
}
case "${1:-}" in
  --iso)      printf '%s-07:00\n' "$(fmt_epoch "$EPOCH_PHX" '+%Y-%m-%dT%H:%M:%S')" ;;
  --date)     fmt_epoch "$EPOCH_PHX" '+%Y-%m-%d' ;;
  --datetime) fmt_epoch "$EPOCH_PHX" '+%Y-%m-%d %H:%M:%S' ;;
  --epoch)    printf '%s\n' "$EPOCH_UTC" ;;
  --fmt)      fmt_epoch "$EPOCH_PHX" "${2:?--fmt needs a strftime arg, e.g. --fmt '+%H:%M'}" ;;
  ''|--pt)    printf '%s PT\n' "$(fmt_epoch "$EPOCH_PHX" '+%Y-%m-%d %H:%M')" ;;
  -h|--help)
    grep -E '^#( |$)' "$0" | sed 's/^# \{0,1\}//'
    ;;
  *)
    echo "[ERROR] now-phoenix: unknown arg '$1' (try --help)" >&2
    exit 64
    ;;
esac
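
The offset technique used by the script can be checked against a fixed epoch instead of the live clock. The sketch below is an illustration, not part of the commit: `fmt_utc` and `phoenix_from_epoch` are hypothetical names, and the sample epoch 1780954327 (2026-06-08 21:32:07 UTC) was chosen to line up with the usage examples in the header.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Format an epoch as UTC wall-clock text; try GNU `date -d @N`, then BSD `date -r N`.
fmt_utc() {
  date -u -d "@$1" "$2" 2>/dev/null || date -u -r "$1" "$2"
}

# Shift a UTC epoch by the fixed Phoenix offset, then format it as if it were UTC.
phoenix_from_epoch() {
  local epoch_utc="$1" fmt="${2:-+%Y-%m-%d %H:%M:%S}"
  fmt_utc "$((epoch_utc - 7 * 3600))" "$fmt"
}

phoenix_from_epoch 1780954327                 # 2026-06-08 14:32:07
phoenix_from_epoch 1780954327 '+%Y-%m-%d'     # 2026-06-08
```

Because the shifted epoch is always formatted with `-u`, the local `TZ` setting and the presence or absence of a tz database have no effect on the result.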

File diff suppressed because it is too large.


@@ -1,192 +1,185 @@
---
name: human-flow
description: >
A UI/UX scanner that specializes in detecting interaction patterns unintuitive or inefficient for humans using a mouse and keyboard.
Expands on frontend-design and impeccable by focusing on real human workflow friction: motor control (Fitts's Law, target sizing, precision),
discoverability (affordances, hover vs always-visible), keyboard parity (full navigation and activation without mouse),
feedback loops (immediate state changes, error recovery), task efficiency (click/keystroke count, context switches),
and forgiving interaction models. It produces structured reports with code locations, "why this feels bad for a human" explanations,
and specific, actionable recommendations to make mouse+keyboard workflows smoother, faster, and more intuitive.
Use when reviewing or building any interactive UI, especially data-heavy tools, dashboards, lists, forms, and complex workflows.
user-invocable: true
argument-hint: "[scan|audit|report] [target path or component]"
---
# human-flow — Human Mouse + Keyboard Workflow Scanner
This skill is a specialized scanner for **human intuition and ergonomics** in pointer + keyboard interfaces. It goes beyond visual polish, code quality, and general UX heuristics (covered by `frontend-design` and `impeccable`) to focus on what actually feels *clunky, hidden, or frustrating* when a real person is driving with a mouse and keyboard.
**Core Philosophy**
- Humans have limited precision, attention, and patience.
- Mouse users hate tiny targets, hidden controls, and precision clicking.
- Keyboard users hate missing focus, no activation keys, and mouse-only gestures.
- Good workflow design makes the *anticipated next action* obvious and low-effort with either input method.
- The best interfaces feel "just right" — large enough targets, immediate feedback, discoverable without hunting, and consistent models.
It is **mandatory** to consider human-flow when any mouse or keyboard interaction is involved.
## When to Invoke
- Before or after implementing interactive features (buttons, tables, lists, modals, forms, drag, selection).
- When reviewing a dashboard, admin tool, or data-heavy UI (e.g. session lists, machine management).
- To audit an existing interface for workflow friction.
- As a complement to `impeccable critique` / `audit` and `frontend-design` validation.
Run via natural language ("human-flow scan the sessions table", "run human-flow audit on the dashboard components") or explicitly.
## Commands
| Command | Description |
|---------------------|-------------|
| `scan [target]` | AST-powered scan of files/directories for workflow friction. Produces a 0-10 Friction Index report. |
| `audit [target]` | Deeper pass: combines AST analysis, component review, and state-flow audit. |
| `elevate [target]` | **Polish & redesign pass.** Goes beyond friction to make a UI top-notch: information hierarchy, signature moment, action gravity, lonely states, density, rhythm, type, tokens, depth/finish, motion — and flags when a screen should be **redesigned, not patched**. Produces an Elevation Index + prioritized tiers (Quick Wins / Elevations / Redesign Candidates). Add `--redesign` to emphasize structural restructuring. See `references/polish-and-redesign.md`. |
| `fix [target]` | **DISABLED (advisory only for now).** Auto-apply is off — the AST code generator reprints whole files and produces noisy diffs. Use the scan/report output and have an agent apply the fixes surgically. Will be revisited with a surgical (string-splice) editor. |
| `fancy [target]` | **"Fancy as fuck" mode** — elegance pass with a calibrated Restraint-o-Meter. |
| `report [target]` | Generate a formatted markdown report with the Friction Index rubric. |
If no command, defaults to `scan` on the provided target.
## Friction Index (0-10)
The scan produces an objective score based on weighted deductions:
- **Motor (3.0)**: Target size, precision, Fitts's Law.
- **Cognitive (2.5)**: Discoverability, affordance, consistency.
- **Keyboard (2.5)**: Accessibility, focus flow, parity.
- **Feedback (2.0)**: Visual response, state transitions.
Score = 10 - Σ(IssueSeverity * DimensionWeight)
You can combine: e.g. run `scan` first for friction, then `fancy` for delight opportunities.
## Usage in Practice (for the Agent)
1. Resolve the target (file, component, page, or directory of .tsx/.jsx/.css/.ts).
2. Load the heuristics from `references/`.
3. Use code tools (read_file, grep, list_dir) + the scanner script if helpful to find candidate patterns.
4. For each finding:
- Cite exact file:line or component.
- Explain *why it is unintuitive for a human with mouse and/or keyboard*.
- Give a concrete "better for humans" recommendation (with example diff or pattern when useful).
5. Prioritize by impact on common workflows (high-frequency actions first).
6. End with an overall "Human Workflow Score" (0-10) and top 3-5 recommended changes.
Always separate **mouse friction** and **keyboard friction** in reports, then **combined workflow** issues.
## Heuristics (Core Categories)
The full detailed list with examples and detection guidance lives in `references/mouse-keyboard-heuristics.md`.
High-level categories the scanner always checks:
- **Target Size & Motor Precision** (Fitts's Law)
- **Discoverability & Affordance** (does it look clickable/tappable? do secondary actions hide?)
- **Hover vs Always-Visible** (actions that require mouse hover to even see options)
- **Keyboard Parity & Activation** (can you do everything the mouse can, with reasonable keys?)
- **Focus & Navigation Flow** (tab order, focus trapping, visible focus, escape hatches)
- **Feedback & State Transitions** (does the UI react immediately and clearly to every mouse/keyboard action?)
- **Selection & Multi-Action Models** (row click vs checkbox, drag vs buttons — are they consistent and forgiving?)
- **Workflow Efficiency** (number of steps, precision required, dead space, context loss for common tasks)
- **Error Prevention & Recovery** (destructive actions, undo, clear "I didn't mean that" paths)
- **Density vs Clarity** (too much crammed into small areas forcing careful mousing)
The scanner is **opinionated toward making the happy path for a human operator faster and less error-prone**, even if it means slightly more visual weight or always-visible controls.
## Scripts & Tooling
- `scripts/scan.mjs` — runnable Node scanner.
- `node scripts/scan.mjs --path <target>` → friction mode (default)
- `node scripts/scan.mjs --path <target> --fancy` → fancy mode (collects signals + prompts for the qualitative beauty pass)
- The agent is expected to supplement with semantic understanding (reading full components, understanding the user task flow in the app) and the rich references. The fancy pass is intentionally more qualitative than the friction scanner.
## Integration with Other Skills
- Run **after** `frontend-design` visual validation and **alongside** `impeccable critique/audit`.
- Use `stop-slop` thinking when generating any example fixes or new component code.
- Can feed findings into `impeccable polish` or `harden` passes.
## Output Format (Preferred)
```markdown
## Human-Flow Scan: <target>
**Overall Human Workflow Score:** 6.5/10 (Mouse: 7/10, Keyboard: 6/10)
### High Friction (P0)
- **File:** ...:123 — Hover-only row actions
Why unintuitive: Secondary actions (End, Control) are at 0.5 opacity until hover. A keyboard user or rushed mouse user scanning the list will miss or struggle to target them.
Human impact: Common task (ending a session) requires extra precision and discovery step.
Recommendation: ...
```
See `references/report-template.md` for the full structure.
## "Elevate" Mode (`elevate`) — Polish & Redesign
Where `scan` finds what *hurts*, `elevate` finds what's *missing to be excellent* — and
decides when a screen is beyond polishing and should be **restructured**. It exists because
the maintainer is not a designer: after an `elevate` pass, the UI should feel/look/act as if
a senior product designer + UI expert + UX team planned it.
It is primarily an **agent judgment pass** seeded by static signals — read the component,
understand the user's task, score each dimension 1–5, then prescribe the concrete better
version (a tweak, or a sketched redesign). The 12 heuristics, the scoring model, and the
output shape live in `references/polish-and-redesign.md`. In brief:
- **12 heuristics:** Hierarchy & Visual Anchors · Signature Moment · Action Gravity ·
Narrative Coherence · Lonely States (empty/error/loading/success) · Progressive
Disclosure & Density · Spacing Rhythm · Typographic Scale · Token Fidelity · Surface/
Depth/Finish · Intentional Motion · Redesign Triggers.
- **Elevation Index (0–10):** weighted score, with Hierarchy / Signature / Action Gravity /
Narrative weighted heaviest.
- **Redesign Urgency (0–5):** if ≥ 4, lead with a Structural Audit ("restructure, don't
polish") and a sketched alternative layout/component tree.
- **Prioritized, not dumped:** `Opportunity = ImpactWeight × (5 − score)`; present the top
5–7 as **Quick Wins / Elevations / Redesign Candidates**, each citing file + signal +
exact replacement.
Recommended sequence: `scan` (kill friction) → `elevate` (reach top-notch / decide redesign)
→ `fancy` (calibrated delight on top).
## "Fancy as Fuck" Mode (`fancy`)
This is a deliberate second (or standalone) pass focused on **beauty, refinement, and elegant interaction**.
### Philosophy
- Not every interface needs (or should have) fancy elements. The first question is always: *"Does this beauty make the experience more useful?"*
- **Core principle**: Don't be pretty just to be pretty. In the course of being as useful as possible, do it with panache.
- "Useful decoration" is explicitly welcomed on surfaces where beauty can amplify comprehension, guidance, emotional reassurance, decision speed, or long-term connection to the product.
- On dense internal tools (operator consoles, admin dashboards): favor *restrained luxury* and precision. Think "high-end instrument." Only add panache where it clearly helps the operator.
- On other surfaces (onboarding, public tools, marketing moments, creative experiences): more room for expressive, delightful "useful decoration."
- Fancy must **serve the human workflow**. It should increase perceived quality, clarity, emotional satisfaction, or effectiveness without adding cognitive load, slowing tasks, or hurting performance/accessibility.
- "Fancy" includes (but is not limited to): transitions & easing, micro-interactions, hover/focus states, page/view transitions (View Transitions API), loading & skeleton experiences, selection/confirmation moments, empty/success states, scroll reveals, depth/layering, typography shifts, cursor feedback, and tasteful celebration moments.
### How the Fancy Pass Works
1. Assess appropriateness for the surface and user context.
2. Look for **opportunities** where a small amount of elegant motion or feedback would make interactions feel more premium and alive.
3. Critique existing attempts that are half-baked, janky, overused, or performance-negative.
4. Suggest specific, high-craft refinements and polish.
5. Always respect `prefers-reduced-motion` and provide graceful fallbacks.
See `references/fancy-as-fuck.md` for the full set of beauty/elegance heuristics, appropriateness guidelines, and examples.
### Recommended Invocation Pattern
```bash
# Friction first
human-flow scan the dashboard
# Then delight
human-flow fancy the dashboard
```
The output of a `fancy` pass should live in its own section of the report (or a dedicated delight report) and feed nicely into `impeccable polish` or `delight` work.
## Creating / Extending
- Add new heuristics to `references/mouse-keyboard-heuristics.md` (with detection hints and "better human workflow" examples).
- Add fancy/delight ideas to `references/fancy-as-fuck.md`.
- Add polish/redesign heuristics to `references/polish-and-redesign.md` (the `elevate` layer).
- Update the scanner script for new static patterns (fancy detection is intentionally more qualitative).
- The skill is designed to be extended — new categories of mouse/keyboard friction **and** opportunities for tasteful elegance are welcome.
**Remember:** The goal is not "perfect accessibility" in isolation or "pretty UI". It is **making the actual anticipated physical and cognitive workflow of a human with a mouse and a keyboard feel natural, fast, and low-friction** — and, when appropriate, doing so with panache and high craft. Beauty that serves usefulness is excellent. Beauty for its own sake is noise.
---
name: human-flow
description: "UI/UX scanner for mouse+keyboard interaction friction: Fitts's Law/target sizing, discoverability/affordances, keyboard parity, feedback loops, task efficiency, forgiving interactions. Produces reports with code locations + fixes. Use when reviewing/building interactive UI (dashboards, lists, forms, complex workflows)."
user-invocable: true
argument-hint: "[scan|audit|report] [target path or component]"
---
# human-flow — Human Mouse + Keyboard Workflow Scanner
This skill is a specialized scanner for **human intuition and ergonomics** in pointer + keyboard interfaces. It goes beyond visual polish, code quality, and general UX heuristics (covered by `frontend-design` and `impeccable`) to focus on what actually feels *clunky, hidden, or frustrating* when a real person is driving with a mouse and keyboard.
**Core Philosophy**
- Humans have limited precision, attention, and patience.
- Mouse users hate tiny targets, hidden controls, and precision clicking.
- Keyboard users hate missing focus, no activation keys, and mouse-only gestures.
- Good workflow design makes the *anticipated next action* obvious and low-effort with either input method.
- The best interfaces feel "just right" — large enough targets, immediate feedback, discoverable without hunting, and consistent models.
It is **mandatory** to consider human-flow when any mouse or keyboard interaction is involved.
## When to Invoke
- Before or after implementing interactive features (buttons, tables, lists, modals, forms, drag, selection).
- When reviewing a dashboard, admin tool, or data-heavy UI (e.g. session lists, machine management).
- To audit an existing interface for workflow friction.
- As a complement to `impeccable critique` / `audit` and `frontend-design` validation.
Run via natural language ("human-flow scan the sessions table", "run human-flow audit on the dashboard components") or explicitly.
## Commands
| Command | Description |
|---------------------|-------------|
| `scan [target]` | AST-powered scan of files/directories for workflow friction. Produces a 0-10 Friction Index report. |
| `audit [target]` | Deeper pass: combines AST analysis, component review, and state-flow audit. |
| `elevate [target]` | **Polish & redesign pass.** Goes beyond friction to make a UI top-notch: information hierarchy, signature moment, action gravity, lonely states, density, rhythm, type, tokens, depth/finish, motion — and flags when a screen should be **redesigned, not patched**. Produces an Elevation Index + prioritized tiers (Quick Wins / Elevations / Redesign Candidates). Add `--redesign` to emphasize structural restructuring. See `references/polish-and-redesign.md`. |
| `fix [target]` | **DISABLED (advisory only for now).** Auto-apply is off — the AST code generator reprints whole files and produces noisy diffs. Use the scan/report output and have an agent apply the fixes surgically. Will be revisited with a surgical (string-splice) editor. |
| `fancy [target]` | **"Fancy as fuck" mode** — elegance pass with a calibrated Restraint-o-Meter. |
| `report [target]` | Generate a formatted markdown report with the Friction Index rubric. |
If no command, defaults to `scan` on the provided target.
## Friction Index (0-10)
The scan produces an objective score based on weighted deductions:
- **Motor (3.0)**: Target size, precision, Fitts's Law.
- **Cognitive (2.5)**: Discoverability, affordance, consistency.
- **Keyboard (2.5)**: Accessibility, focus flow, parity.
- **Feedback (2.0)**: Visual response, state transitions.
Score = 10 - Σ(IssueSeverity * DimensionWeight)
You can combine: e.g. run `scan` first for friction, then `fancy` for delight opportunities.
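As a worked example of the Friction Index formula above, here is a small sketch. The weights come from the list; everything else is an assumption made for illustration: the skill does not define the severity scale, so severity is treated here as a 0..1 value per dimension (which keeps the result inside 0-10 because the weights sum to 10), the clamp at zero is added, and `friction_score` is a hypothetical name, not the scanner's API.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Input lines: "<dimension> <severity>", severity ASSUMED to be 0..1 per dimension.
# Computes Score = 10 - sum(severity * weight), clamped at 0 (clamp is an assumption).
friction_score() {
  awk '
    BEGIN { w["motor"] = 3.0; w["cognitive"] = 2.5; w["keyboard"] = 2.5; w["feedback"] = 2.0 }
    ($1 in w) { deduction += $2 * w[$1] }
    END { score = 10 - deduction; if (score < 0) score = 0; printf "%.1f\n", score }
  '
}

# 0.5*3.0 + 0.4*2.5 + 0.25*2.0 = 3.0 deducted
printf '%s\n' "motor 0.5" "keyboard 0.4" "feedback 0.25" | friction_score   # 7.0
```

Lines naming an unknown dimension are ignored, so a typo in the input lowers no score silently; a stricter version could reject them instead.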
## Usage in Practice (for the Agent)
1. Resolve the target (file, component, page, or directory of .tsx/.jsx/.css/.ts).
2. Load the heuristics from `references/`.
3. Use code tools (read_file, grep, list_dir) + the scanner script if helpful to find candidate patterns.
4. For each finding:
- Cite exact file:line or component.
- Explain *why it is unintuitive for a human with mouse and/or keyboard*.
- Give a concrete "better for humans" recommendation (with example diff or pattern when useful).
5. Prioritize by impact on common workflows (high-frequency actions first).
6. End with an overall "Human Workflow Score" (0-10) and top 3-5 recommended changes.
Always separate **mouse friction** and **keyboard friction** in reports, then **combined workflow** issues.
## Heuristics (Core Categories)
The full detailed list with examples and detection guidance lives in `references/mouse-keyboard-heuristics.md`.
High-level categories the scanner always checks:
- **Target Size & Motor Precision** (Fitts's Law)
- **Discoverability & Affordance** (does it look clickable/tappable? do secondary actions hide?)
- **Hover vs Always-Visible** (actions that require mouse hover to even see options)
- **Keyboard Parity & Activation** (can you do everything the mouse can, with reasonable keys?)
- **Focus & Navigation Flow** (tab order, focus trapping, visible focus, escape hatches)
- **Feedback & State Transitions** (does the UI react immediately and clearly to every mouse/keyboard action?)
- **Selection & Multi-Action Models** (row click vs checkbox, drag vs buttons — are they consistent and forgiving?)
- **Workflow Efficiency** (number of steps, precision required, dead space, context loss for common tasks)
- **Error Prevention & Recovery** (destructive actions, undo, clear "I didn't mean that" paths)
- **Density vs Clarity** (too much crammed into small areas forcing careful mousing)
The scanner is **opinionated toward making the happy path for a human operator faster and less error-prone**, even if it means slightly more visual weight or always-visible controls.
## Scripts & Tooling
- `scripts/scan.mjs` — runnable Node scanner.
- `node scripts/scan.mjs --path <target>` → friction mode (default)
- `node scripts/scan.mjs --path <target> --fancy` → fancy mode (collects signals + prompts for the qualitative beauty pass)
- The agent is expected to supplement with semantic understanding (reading full components, understanding the user task flow in the app) and the rich references. The fancy pass is intentionally more qualitative than the friction scanner.
## Integration with Other Skills
- Run **after** `frontend-design` visual validation and **alongside** `impeccable critique/audit`.
- Use `stop-slop` thinking when generating any example fixes or new component code.
- Can feed findings into `impeccable polish` or `harden` passes.
## Output Format (Preferred)
```markdown
## Human-Flow Scan: <target>
**Overall Human Workflow Score:** 6.5/10 (Mouse: 7/10, Keyboard: 6/10)
### High Friction (P0)
- **File:** ...:123 — Hover-only row actions
Why unintuitive: Secondary actions (End, Control) are at 0.5 opacity until hover. A keyboard user or rushed mouse user scanning the list will miss or struggle to target them.
Human impact: Common task (ending a session) requires extra precision and discovery step.
Recommendation: ...
```
See `references/report-template.md` for the full structure.
## "Elevate" Mode (`elevate`) — Polish & Redesign
Where `scan` finds what *hurts*, `elevate` finds what's *missing to be excellent* — and
decides when a screen is beyond polishing and should be **restructured**. It exists because
the maintainer is not a designer: after an `elevate` pass, the UI should feel/look/act as if
a senior product designer + UI expert + UX team planned it.
It is primarily an **agent judgment pass** seeded by static signals — read the component,
understand the user's task, score each dimension 1–5, then prescribe the concrete better
version (a tweak, or a sketched redesign). The 12 heuristics, the scoring model, and the
output shape live in `references/polish-and-redesign.md`. In brief:
- **12 heuristics:** Hierarchy & Visual Anchors · Signature Moment · Action Gravity ·
Narrative Coherence · Lonely States (empty/error/loading/success) · Progressive
Disclosure & Density · Spacing Rhythm · Typographic Scale · Token Fidelity · Surface/
Depth/Finish · Intentional Motion · Redesign Triggers.
- **Elevation Index (0–10):** weighted score, with Hierarchy / Signature / Action Gravity /
Narrative weighted heaviest.
- **Redesign Urgency (0–5):** if ≥ 4, lead with a Structural Audit ("restructure, don't
polish") and a sketched alternative layout/component tree.
- **Prioritized, not dumped:** `Opportunity = ImpactWeight × (5 − score)`; present the top
5–7 as **Quick Wins / Elevations / Redesign Candidates**, each citing file + signal +
exact replacement.
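The prioritization rule in the list above (impact weight times the gap between 5 and the dimension's score) can be sketched as a ranking step. The impact weights and scores below are invented for the example, and `rank_opportunities` is a hypothetical helper; the real weights live in `references/polish-and-redesign.md`, which is not reproduced here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Input lines: "<heuristic> <impact_weight> <score 1-5>"; the weights used in the
# demo are made up. Prints "<opportunity> <heuristic>", highest first, top N (default 7).
rank_opportunities() {
  awk '{ printf "%.1f %s\n", $2 * (5 - $3), $1 }' | sort -k1,1nr -k2,2 | head -n "${1:-7}"
}

printf '%s\n' \
  "hierarchy 3 2" \
  "signature-moment 3 4" \
  "spacing-rhythm 1 1" \
  "lonely-states 2 3" | rank_opportunities 3
# 9.0 hierarchy
# 4.0 lonely-states
# 4.0 spacing-rhythm
```

Ties are broken alphabetically here only to keep the output stable; any deterministic tie-break would do.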
Recommended sequence: `scan` (kill friction) → `elevate` (reach top-notch / decide redesign)
→ `fancy` (calibrated delight on top).
## "Fancy as Fuck" Mode (`fancy`)
This is a deliberate second (or standalone) pass focused on **beauty, refinement, and elegant interaction**.
### Philosophy
- Not every interface needs (or should have) fancy elements. The first question is always: *"Does this beauty make the experience more useful?"*
- **Core principle**: Don't be pretty just to be pretty. In the course of being as useful as possible, do it with panache.
- "Useful decoration" is explicitly welcomed on surfaces where beauty can amplify comprehension, guidance, emotional reassurance, decision speed, or long-term connection to the product.
- On dense internal tools (operator consoles, admin dashboards): favor *restrained luxury* and precision. Think "high-end instrument." Only add panache where it clearly helps the operator.
- On other surfaces (onboarding, public tools, marketing moments, creative experiences): more room for expressive, delightful "useful decoration."
- Fancy must **serve the human workflow**. It should increase perceived quality, clarity, emotional satisfaction, or effectiveness without adding cognitive load, slowing tasks, or hurting performance/accessibility.
- "Fancy" includes (but is not limited to): transitions & easing, micro-interactions, hover/focus states, page/view transitions (View Transitions API), loading & skeleton experiences, selection/confirmation moments, empty/success states, scroll reveals, depth/layering, typography shifts, cursor feedback, and tasteful celebration moments.
### How the Fancy Pass Works
1. Assess appropriateness for the surface and user context.
2. Look for **opportunities** where a small amount of elegant motion or feedback would make interactions feel more premium and alive.
3. Critique existing attempts that are half-baked, janky, overused, or performance-negative.
4. Suggest specific, high-craft refinements and polish.
5. Always respect `prefers-reduced-motion` and provide graceful fallbacks.
See `references/fancy-as-fuck.md` for the full set of beauty/elegance heuristics, appropriateness guidelines, and examples.
### Recommended Invocation Pattern
```bash
# Friction first
human-flow scan the dashboard
# Then delight
human-flow fancy the dashboard
```
The output of a `fancy` pass should live in its own section of the report (or a dedicated delight report) and feed nicely into `impeccable polish` or `delight` work.
## Creating / Extending
- Add new heuristics to `references/mouse-keyboard-heuristics.md` (with detection hints and "better human workflow" examples).
- Add fancy/delight ideas to `references/fancy-as-fuck.md`.
- Add polish/redesign heuristics to `references/polish-and-redesign.md` (the `elevate` layer).
- Update the scanner script for new static patterns (fancy detection is intentionally more qualitative).
- The skill is designed to be extended — new categories of mouse/keyboard friction **and** opportunities for tasteful elegance are welcome.
**Remember:** The goal is not "perfect accessibility" in isolation or "pretty UI". It is **making the actual anticipated physical and cognitive workflow of a human with a mouse and a keyboard feel natural, fast, and low-friction** — and, when appropriate, doing so with panache and high craft. Beauty that serves usefulness is excellent. Beauty for its own sake is noise.


@@ -1,168 +1,168 @@
---
name: impeccable
description: "Use when the user wants to design, redesign, shape, critique, audit, polish, clarify, distill, harden, optimize, adapt, animate, colorize, extract, or otherwise improve a frontend interface. Covers websites, landing pages, dashboards, product UI, app shells, components, forms, settings, onboarding, and empty states. Handles UX review, visual hierarchy, information architecture, cognitive load, accessibility, performance, responsive behavior, theming, anti-patterns, typography, fonts, spacing, layout, alignment, color, motion, micro-interactions, UX copy, error states, edge cases, i18n, and reusable design systems or tokens. Also use for bland designs that need to become bolder or more delightful, loud designs that should become quieter, live browser iteration on UI elements, or ambitious visual effects that should feel technically extraordinary. Not for backend-only or non-UI tasks."
argument-hint: "[{{command_hint}}] [target]"
user-invocable: true
allowed-tools:
- Bash(npx impeccable *)
license: Apache 2.0. Based on Anthropic's frontend-design skill. See NOTICE.md for attribution.
---
Designs and iterates production-grade frontend interfaces. Real working code, committed design choices, exceptional craft.
## Setup
Before any design work or file edits:
1. Load context (PRODUCT.md / DESIGN.md) via the loader script.
2. Identify the register and load the matching register reference (brand.md or product.md).
3. **If the user invoked a sub-command (e.g. `craft`, `shape`, `audit`), load its reference file too.** This is non-negotiable: `craft` without `craft.md` loaded means you'll skip the shape-and-confirm step the user expects.
Skipping these produces generic output that ignores the project.
### 1. Context gathering
Two files, case-insensitive. The loader looks at the project root by default and falls back to `.agents/context/` and `docs/` if the root is clean. Override with `IMPECCABLE_CONTEXT_DIR=path/to/dir` (absolute or relative to cwd).
- **PRODUCT.md**: required. Users, brand, tone, anti-references, strategic principles.
- **DESIGN.md**: optional, strongly recommended. Colors, typography, elevation, components.
Load both in one call:
```bash
node {{scripts_path}}/load-context.mjs
```
Consume the full JSON output. Never pipe through `head`, `tail`, `grep`, or `jq`. The output's `contextDir` field tells you where the files were resolved from.
If the output is already in this session's conversation history, don't re-run. Exceptions requiring a fresh load: you just ran `{{command_prefix}}impeccable teach` or `{{command_prefix}}impeccable document` (they rewrite the files), or the user manually edited one.
`{{command_prefix}}impeccable live` already warms context via `live.mjs`. If you've run `live.mjs`, don't also run `load-context.mjs` this session.
If PRODUCT.md is missing, empty, or placeholder (`[TODO]` markers, <200 chars): run `{{command_prefix}}impeccable teach`, then resume the user's original task with the fresh context. If the original task was `{{command_prefix}}impeccable craft`, resume into `{{command_prefix}}impeccable shape` before any implementation work.
If DESIGN.md is missing: nudge once per session (*"Run `{{command_prefix}}impeccable document` for more on-brand output"*), then proceed.
### 2. Register
Every design task is **brand** (marketing, landing, campaign, long-form content, portfolio: design IS the product) or **product** (app UI, admin, dashboard, tool: design SERVES the product).
Identify before designing. Priority: (1) cue in the task itself ("landing page" vs "dashboard"); (2) the surface in focus (the page, file, or route being worked on); (3) `register` field in PRODUCT.md. First match wins.
If PRODUCT.md lacks the `register` field (legacy), infer it once from its "Users" and "Product Purpose" sections, then cache the inferred value for the session. Suggest the user run `{{command_prefix}}impeccable teach` to add the field explicitly.
Load the matching reference: [reference/brand.md](reference/brand.md) or [reference/product.md](reference/product.md). The shared design laws below apply to both.
## Shared design laws
Apply to every design, both registers. Match implementation complexity to the aesthetic vision: maximalism needs elaborate code, minimalism needs precision. Interpret creatively. Vary across projects; never converge on the same choices. {{model}} is capable of extraordinary work. Don't hold back.
### Color
- Use OKLCH. Reduce chroma as lightness approaches 0 or 100; high chroma at extremes looks garish.
- Never use `#000` or `#fff`. Tint every neutral toward the brand hue (chroma 0.005–0.01 is enough).
- Pick a **color strategy** before picking colors. Four steps on the commitment axis:
  - **Restrained**: tinted neutrals + one accent ≤10%. Product default; brand minimalism.
  - **Committed**: one saturated color carries 30–60% of the surface. Brand default for identity-driven pages.
  - **Full palette**: 3–4 named roles, each used deliberately. Brand campaigns; product data viz.
  - **Drenched**: the surface IS the color. Brand heroes, campaign pages.
- The "one accent ≤10%" rule is Restrained only. Committed / Full palette / Drenched exceed it on purpose. Don't collapse every design to Restrained by reflex.
### Theme
Dark vs. light is never a default. Not dark "because tools look cool dark." Not light "to be safe."
Before choosing, write one sentence of physical scene: who uses this, where, under what ambient light, in what mood. If the sentence doesn't force the answer, it's not concrete enough. Add detail until it does.
"Observability dashboard" does not force an answer. "SRE glancing at incident severity on a 27-inch monitor at 2am in a dim room" does. Run the sentence, not the category.
### Typography
- Cap body line length at 65–75ch.
- Hierarchy through scale + weight contrast (≥1.25 ratio between steps). Avoid flat scales.
### Layout
- Vary spacing for rhythm. Same padding everywhere is monotony.
- Cards are the lazy answer. Use them only when they're truly the best affordance. Nested cards are always wrong.
- Don't wrap everything in a container. Most things don't need one.
### Motion
- Don't animate CSS layout properties.
- Ease out with exponential curves (ease-out-quart / quint / expo). No bounce, no elastic.
### Absolute bans
Match-and-refuse. If you're about to write any of these, rewrite the element with different structure.
- **Side-stripe borders.** `border-left` or `border-right` greater than 1px as a colored accent on cards, list items, callouts, or alerts. Never intentional. Rewrite with full borders, background tints, leading numbers/icons, or nothing.
- **Gradient text.** `background-clip: text` combined with a gradient background. Decorative, never meaningful. Use a single solid color. Emphasis via weight or size.
- **Glassmorphism as default.** Blurs and glass cards used decoratively. Rare and purposeful, or nothing.
- **The hero-metric template.** Big number, small label, supporting stats, gradient accent. SaaS cliché.
- **Identical card grids.** Same-sized cards with icon + heading + text, repeated endlessly.
- **Modal as first thought.** Modals are usually laziness. Exhaust inline / progressive alternatives first.
### Copy
- Every word earns its place. No restated headings, no intros that repeat the title.
- **No em dashes.** Use commas, colons, semicolons, periods, or parentheses. Also not `--`.
### The AI slop test
If someone could look at this interface and say "AI made that" without doubt, it's failed. Cross-register failures are the absolute bans above. Register-specific failures live in each reference.
**Category-reflex check.** Run at two altitudes; the second one catches what the first one misses.
- **First-order:** if someone could guess the theme + palette from the category alone ("observability → dark blue", "healthcare → white + teal", "finance → navy + gold", "crypto → neon on black"), it's the first training-data reflex. Rework the scene sentence and color strategy until the answer isn't obvious from the domain.
- **Second-order:** if someone could guess the aesthetic family from category-plus-anti-references ("AI workflow tool that's not SaaS-cream → editorial-typographic", "fintech that's not navy-and-gold → terminal-native dark mode"), it's the trap one tier deeper. The first reflex was avoided; the second wasn't. Rework until both answers are not obvious. The brand register's [reflex-reject aesthetic lanes](reference/brand.md) list catches the currently-saturated families.
## Commands
| Command | Category | Description | Reference |
|---|---|---|---|
| `craft [feature]` | Build | Shape, then build a feature end-to-end | [reference/craft.md](reference/craft.md) |
| `shape [feature]` | Build | Plan UX/UI before writing code | [reference/shape.md](reference/shape.md) |
| `teach` | Build | Set up PRODUCT.md and DESIGN.md context | [reference/teach.md](reference/teach.md) |
| `document` | Build | Generate DESIGN.md from existing project code | [reference/document.md](reference/document.md) |
| `extract [target]` | Build | Pull reusable tokens and components into design system | [reference/extract.md](reference/extract.md) |
| `critique [target]` | Evaluate | UX design review with heuristic scoring | [reference/critique.md](reference/critique.md) |
| `audit [target]` | Evaluate | Technical quality checks (a11y, perf, responsive) | [reference/audit.md](reference/audit.md) |
| `polish [target]` | Refine | Final quality pass before shipping | [reference/polish.md](reference/polish.md) |
| `bolder [target]` | Refine | Amplify safe or bland designs | [reference/bolder.md](reference/bolder.md) |
| `quieter [target]` | Refine | Tone down aggressive or overstimulating designs | [reference/quieter.md](reference/quieter.md) |
| `distill [target]` | Refine | Strip to essence, remove complexity | [reference/distill.md](reference/distill.md) |
| `harden [target]` | Refine | Production-ready: errors, i18n, edge cases | [reference/harden.md](reference/harden.md) |
| `onboard [target]` | Refine | Design first-run flows, empty states, activation | [reference/onboard.md](reference/onboard.md) |
| `animate [target]` | Enhance | Add purposeful animations and motion | [reference/animate.md](reference/animate.md) |
| `colorize [target]` | Enhance | Add strategic color to monochromatic UIs | [reference/colorize.md](reference/colorize.md) |
| `typeset [target]` | Enhance | Improve typography hierarchy and fonts | [reference/typeset.md](reference/typeset.md) |
| `layout [target]` | Enhance | Fix spacing, rhythm, and visual hierarchy | [reference/layout.md](reference/layout.md) |
| `delight [target]` | Enhance | Add personality and memorable touches | [reference/delight.md](reference/delight.md) |
| `overdrive [target]` | Enhance | Push past conventional limits | [reference/overdrive.md](reference/overdrive.md) |
| `clarify [target]` | Fix | Improve UX copy, labels, and error messages | [reference/clarify.md](reference/clarify.md) |
| `adapt [target]` | Fix | Adapt for different devices and screen sizes | [reference/adapt.md](reference/adapt.md) |
| `optimize [target]` | Fix | Diagnose and fix UI performance | [reference/optimize.md](reference/optimize.md) |
| `live` | Iterate | Visual variant mode: pick elements in the browser, generate alternatives | [reference/live.md](reference/live.md) |
Plus two management commands: `pin <command>` and `unpin <command>`, detailed below.
### Routing rules
1. **No argument**: render the table above as the user-facing command menu, grouped by category. Ask what they'd like to do.
2. **First word matches a command**: load its reference file and follow its instructions. Everything after the command name is the target.
3. **First word doesn't match**: general design invocation. Apply the setup steps, shared design laws, and the loaded register reference, using the full argument as context.
Setup (context gathering, register) is already loaded by then; sub-commands don't re-invoke `{{command_prefix}}impeccable`.
If the first word is `craft`, setup still runs first, but [reference/craft.md](reference/craft.md) owns the rest of the flow. If setup invokes `teach` as a blocker, finish teach, refresh context, then resume the original command and target.
## Pin / Unpin
**Pin** creates a standalone shortcut so `{{command_prefix}}<command>` invokes `{{command_prefix}}impeccable <command>` directly. **Unpin** removes it. The script writes to every harness directory present in the project.
```bash
node {{scripts_path}}/pin.mjs <pin|unpin> <command>
```
Valid `<command>` is any command from the table above. Report the script's result concisely. Confirm the new shortcut on success, relay stderr verbatim on error.
---
name: impeccable
description: "Design, redesign, critique, audit, or polish a frontend interface (sites, landing pages, dashboards, app UI, components, forms, onboarding, empty states). Covers UX review, visual hierarchy, IA, accessibility, performance, responsive, theming, typography, spacing, color, motion, copy, design systems/tokens. Not for backend/non-UI."
argument-hint: "[{{command_hint}}] [target]"
user-invocable: true
allowed-tools:
- Bash(npx impeccable *)
license: Apache 2.0. Based on Anthropic's frontend-design skill. See NOTICE.md for attribution.
---
Designs and iterates production-grade frontend interfaces. Real working code, committed design choices, exceptional craft.
## Setup
Before any design work or file edits:
1. Load context (PRODUCT.md / DESIGN.md) via the loader script.
2. Identify the register and load the matching register reference (brand.md or product.md).
3. **If the user invoked a sub-command (e.g. `craft`, `shape`, `audit`), load its reference file too.** This is non-negotiable: `craft` without `craft.md` loaded means you'll skip the shape-and-confirm step the user expects.
Skipping these produces generic output that ignores the project.
### 1. Context gathering
Two files, case-insensitive. The loader looks at the project root by default and falls back to `.agents/context/` and `docs/` if the root is clean. Override with `IMPECCABLE_CONTEXT_DIR=path/to/dir` (absolute or relative to cwd).
- **PRODUCT.md**: required. Users, brand, tone, anti-references, strategic principles.
- **DESIGN.md**: optional, strongly recommended. Colors, typography, elevation, components.
Load both in one call:
```bash
node {{scripts_path}}/load-context.mjs
```
Consume the full JSON output. Never pipe through `head`, `tail`, `grep`, or `jq`. The output's `contextDir` field tells you where the files were resolved from.
If the output is already in this session's conversation history, don't re-run. Exceptions requiring a fresh load: you just ran `{{command_prefix}}impeccable teach` or `{{command_prefix}}impeccable document` (they rewrite the files), or the user manually edited one.
`{{command_prefix}}impeccable live` already warms context via `live.mjs`. If you've run `live.mjs`, don't also run `load-context.mjs` this session.
If PRODUCT.md is missing, empty, or placeholder (`[TODO]` markers, <200 chars): run `{{command_prefix}}impeccable teach`, then resume the user's original task with the fresh context. If the original task was `{{command_prefix}}impeccable craft`, resume into `{{command_prefix}}impeccable shape` before any implementation work.
If DESIGN.md is missing: nudge once per session (*"Run `{{command_prefix}}impeccable document` for more on-brand output"*), then proceed.
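The PRODUCT.md check above (missing, empty, or placeholder) could be expressed roughly as follows. This is a sketch, not the loader's actual logic: the function name is hypothetical, and measuring the 200-character threshold on whitespace-stripped text is an assumption.

```python
from typing import Optional


def needs_teach(product_md: Optional[str]) -> bool:
    """Return True when PRODUCT.md is missing, empty, or still a placeholder."""
    if product_md is None:       # file not found
        return True
    text = product_md.strip()    # assumption: surrounding whitespace is ignored
    if not text:                 # empty file
        return True
    # Placeholder signals named in the text: [TODO] markers or fewer than 200 chars.
    return "[TODO]" in text or len(text) < 200
```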
### 2. Register
Every design task is **brand** (marketing, landing, campaign, long-form content, portfolio: design IS the product) or **product** (app UI, admin, dashboard, tool: design SERVES the product).
Identify before designing. Priority: (1) cue in the task itself ("landing page" vs "dashboard"); (2) the surface in focus (the page, file, or route being worked on); (3) `register` field in PRODUCT.md. First match wins.
If PRODUCT.md lacks the `register` field (legacy), infer it once from its "Users" and "Product Purpose" sections, then cache the inferred value for the session. Suggest the user run `{{command_prefix}}impeccable teach` to add the field explicitly.
Load the matching reference: [reference/brand.md](reference/brand.md) or [reference/product.md](reference/product.md). The shared design laws below apply to both.
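The "first match wins" priority for the register can be sketched as a simple ordered lookup. The function and parameter names are hypothetical; how each cue is actually detected is a judgment call the skill leaves to the agent.

```python
from typing import Optional

REGISTERS = ("brand", "product")


def resolve_register(
    task_cue: Optional[str] = None,       # (1) cue in the task itself
    surface_cue: Optional[str] = None,    # (2) the surface in focus
    product_field: Optional[str] = None,  # (3) `register` field in PRODUCT.md
) -> Optional[str]:
    """Return the first valid register in priority order, or None if nothing matches."""
    for candidate in (task_cue, surface_cue, product_field):
        if candidate in REGISTERS:
            return candidate
    return None  # legacy PRODUCT.md: infer once from its sections, then cache
```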
## Shared design laws
Apply to every design, both registers. Match implementation complexity to the aesthetic vision: maximalism needs elaborate code, minimalism needs precision. Interpret creatively. Vary across projects; never converge on the same choices. {{model}} is capable of extraordinary work. Don't hold back.
### Color
- Use OKLCH. Reduce chroma as lightness approaches 0 or 100; high chroma at extremes looks garish.
- Never use `#000` or `#fff`. Tint every neutral toward the brand hue (chroma 0.005–0.01 is enough).
- Pick a **color strategy** before picking colors. Four steps on the commitment axis:
  - **Restrained**: tinted neutrals + one accent ≤10%. Product default; brand minimalism.
  - **Committed**: one saturated color carries 30–60% of the surface. Brand default for identity-driven pages.
  - **Full palette**: 3–4 named roles, each used deliberately. Brand campaigns; product data viz.
  - **Drenched**: the surface IS the color. Brand heroes, campaign pages.
- The "one accent ≤10%" rule is Restrained only. Committed / Full palette / Drenched exceed it on purpose. Don't collapse every design to Restrained by reflex.
### Theme
Dark vs. light is never a default. Not dark "because tools look cool dark." Not light "to be safe."
Before choosing, write one sentence of physical scene: who uses this, where, under what ambient light, in what mood. If the sentence doesn't force the answer, it's not concrete enough. Add detail until it does.
"Observability dashboard" does not force an answer. "SRE glancing at incident severity on a 27-inch monitor at 2am in a dim room" does. Run the sentence, not the category.
### Typography
- Cap body line length at 65–75ch.
- Hierarchy through scale + weight contrast (≥1.25 ratio between steps). Avoid flat scales.
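To make the ≥1.25 step ratio concrete, here is a small sketch that generates a modular type scale and checks an existing list of sizes for flat steps. The function names, the 16px base, and the step count are illustrative assumptions, not part of the skill.

```python
def type_scale(base_px: float = 16.0, ratio: float = 1.25, steps: int = 5) -> list:
    """Generate font sizes where each step is `ratio` times the previous one."""
    return [base_px * ratio**i for i in range(steps)]


def flat_steps(sizes, min_ratio: float = 1.25) -> list:
    """Return adjacent (smaller, larger) pairs whose ratio falls below `min_ratio`."""
    ordered = sorted(sizes)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b / a < min_ratio]


scale = type_scale()
too_flat = flat_steps([14, 16, 18, 24])
```

A scale such as 14 / 16 / 18 / 24 fails the check on its first two steps, which is the "flat scale" the rule warns against.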
### Layout
- Vary spacing for rhythm. Same padding everywhere is monotony.
- Cards are the lazy answer. Use them only when they're truly the best affordance. Nested cards are always wrong.
- Don't wrap everything in a container. Most things don't need one.
### Motion
- Don't animate CSS layout properties.
- Ease out with exponential curves (ease-out-quart / quint / expo). No bounce, no elastic.
### Absolute bans
Match-and-refuse. If you're about to write any of these, rewrite the element with different structure.
- **Side-stripe borders.** `border-left` or `border-right` greater than 1px as a colored accent on cards, list items, callouts, or alerts. Never intentional. Rewrite with full borders, background tints, leading numbers/icons, or nothing.
- **Gradient text.** `background-clip: text` combined with a gradient background. Decorative, never meaningful. Use a single solid color. Emphasis via weight or size.
- **Glassmorphism as default.** Blurs and glass cards used decoratively. Rare and purposeful, or nothing.
- **The hero-metric template.** Big number, small label, supporting stats, gradient accent. SaaS cliché.
- **Identical card grids.** Same-sized cards with icon + heading + text, repeated endlessly.
- **Modal as first thought.** Modals are usually laziness. Exhaust inline / progressive alternatives first.
### Copy
- Every word earns its place. No restated headings, no intros that repeat the title.
- **No em dashes.** Use commas, colons, semicolons, periods, or parentheses. Also not `--`.
### The AI slop test
If someone could look at this interface and say "AI made that" without doubt, it's failed. Cross-register failures are the absolute bans above. Register-specific failures live in each reference.
**Category-reflex check.** Run at two altitudes; the second one catches what the first one misses.
- **First-order:** if someone could guess the theme + palette from the category alone ("observability → dark blue", "healthcare → white + teal", "finance → navy + gold", "crypto → neon on black"), it's the first training-data reflex. Rework the scene sentence and color strategy until the answer isn't obvious from the domain.
- **Second-order:** if someone could guess the aesthetic family from category-plus-anti-references ("AI workflow tool that's not SaaS-cream → editorial-typographic", "fintech that's not navy-and-gold → terminal-native dark mode"), it's the trap one tier deeper. The first reflex was avoided; the second wasn't. Rework until both answers are not obvious. The brand register's [reflex-reject aesthetic lanes](reference/brand.md) list catches the currently-saturated families.
## Commands
| Command | Category | Description | Reference |
|---|---|---|---|
| `craft [feature]` | Build | Shape, then build a feature end-to-end | [reference/craft.md](reference/craft.md) |
| `shape [feature]` | Build | Plan UX/UI before writing code | [reference/shape.md](reference/shape.md) |
| `teach` | Build | Set up PRODUCT.md and DESIGN.md context | [reference/teach.md](reference/teach.md) |
| `document` | Build | Generate DESIGN.md from existing project code | [reference/document.md](reference/document.md) |
| `extract [target]` | Build | Pull reusable tokens and components into design system | [reference/extract.md](reference/extract.md) |
| `critique [target]` | Evaluate | UX design review with heuristic scoring | [reference/critique.md](reference/critique.md) |
| `audit [target]` | Evaluate | Technical quality checks (a11y, perf, responsive) | [reference/audit.md](reference/audit.md) |
| `polish [target]` | Refine | Final quality pass before shipping | [reference/polish.md](reference/polish.md) |
| `bolder [target]` | Refine | Amplify safe or bland designs | [reference/bolder.md](reference/bolder.md) |
| `quieter [target]` | Refine | Tone down aggressive or overstimulating designs | [reference/quieter.md](reference/quieter.md) |
| `distill [target]` | Refine | Strip to essence, remove complexity | [reference/distill.md](reference/distill.md) |
| `harden [target]` | Refine | Production-ready: errors, i18n, edge cases | [reference/harden.md](reference/harden.md) |
| `onboard [target]` | Refine | Design first-run flows, empty states, activation | [reference/onboard.md](reference/onboard.md) |
| `animate [target]` | Enhance | Add purposeful animations and motion | [reference/animate.md](reference/animate.md) |
| `colorize [target]` | Enhance | Add strategic color to monochromatic UIs | [reference/colorize.md](reference/colorize.md) |
| `typeset [target]` | Enhance | Improve typography hierarchy and fonts | [reference/typeset.md](reference/typeset.md) |
| `layout [target]` | Enhance | Fix spacing, rhythm, and visual hierarchy | [reference/layout.md](reference/layout.md) |
| `delight [target]` | Enhance | Add personality and memorable touches | [reference/delight.md](reference/delight.md) |
| `overdrive [target]` | Enhance | Push past conventional limits | [reference/overdrive.md](reference/overdrive.md) |
| `clarify [target]` | Fix | Improve UX copy, labels, and error messages | [reference/clarify.md](reference/clarify.md) |
| `adapt [target]` | Fix | Adapt for different devices and screen sizes | [reference/adapt.md](reference/adapt.md) |
| `optimize [target]` | Fix | Diagnose and fix UI performance | [reference/optimize.md](reference/optimize.md) |
| `live` | Iterate | Visual variant mode: pick elements in the browser, generate alternatives | [reference/live.md](reference/live.md) |
Plus two management commands: `pin <command>` and `unpin <command>`, detailed below.
### Routing rules
1. **No argument**: render the table above as the user-facing command menu, grouped by category. Ask what they'd like to do.
2. **First word matches a command**: load its reference file and follow its instructions. Everything after the command name is the target.
3. **First word doesn't match**: general design invocation. Apply the setup steps, shared design laws, and the loaded register reference, using the full argument as context.
Setup (context gathering, register) is already loaded by then; sub-commands don't re-invoke `{{command_prefix}}impeccable`.
If the first word is `craft`, setup still runs first, but [reference/craft.md](reference/craft.md) owns the rest of the flow. If setup invokes `teach` as a blocker, finish teach, refresh context, then resume the original command and target.
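The three routing rules can be sketched as a small dispatcher. The command names come from the table above; the `route` function and its return shape are hypothetical, and the `pin` / `unpin` management commands are left out for brevity.

```python
COMMANDS = {
    "craft", "shape", "teach", "document", "extract", "critique", "audit",
    "polish", "bolder", "quieter", "distill", "harden", "onboard", "animate",
    "colorize", "typeset", "layout", "delight", "overdrive", "clarify",
    "adapt", "optimize", "live",
}


def route(argument: str):
    """Classify an invocation as (kind, command, target) per the routing rules."""
    words = argument.split()
    if not words:
        # Rule 1: no argument, so render the command menu.
        return ("menu", None, "")
    first, rest = words[0], " ".join(words[1:])
    if first in COMMANDS:
        # Rule 2: load the command's reference; the remainder is the target.
        return ("command", first, rest)
    # Rule 3: general design invocation; the full argument is context.
    return ("general", None, argument.strip())
```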
## Pin / Unpin
**Pin** creates a standalone shortcut so `{{command_prefix}}<command>` invokes `{{command_prefix}}impeccable <command>` directly. **Unpin** removes it. The script writes to every harness directory present in the project.
```bash
node {{scripts_path}}/pin.mjs <pin|unpin> <command>
```
Valid `<command>` is any command from the table above. Report the script's result concisely. Confirm the new shortcut on success, relay stderr verbatim on error.


@@ -1,166 +1,156 @@
---
name: mailprotector
description: >-
  Manage the Arizona Computer Guru (ACG) Mailprotector CloudFilter email-security
  gateway via the live CloudFilter REST API (emailservice.io). Search and release
  held / quarantined mail (inbound and outbound), pull mail-flow logs to explain
  why a message did or did not deliver, inspect entity configuration and
  allow/block rules, find a user or alias by email, and manage allow/block rules.
  Read-only by default; every release / rule-add / config-change is gated behind
  --confirm. Invoke for: "mailprotector", "cloudfilter", "emailservice.io", "held
  mail", "quarantined email", "release email", "outbound quarantine", "why didn't
  my email arrive", "email security gateway", "INKY", "mail flow logs", "allow
  block rule", "release spam". This skill talks to the LIVE production reseller
  CloudFilter platform — treat releases conservatively.
---
# Mailprotector / CloudFilter Skill
Standalone CLI client for the **Mailprotector CloudFilter REST API**
(`emailservice.io`), the reseller email-security platform ACG layers on top of
client mail flow. Read-only by default; every write (release, rule add, config
change) is gated behind `--confirm`.
## The two-layer context (important)
ACG's email security sits in front of client mailboxes as two cooperating layers:
| Layer | What it does |
|---|---|
| **Mailprotector CloudFilter** | The delivery / filtering gateway. Inbound and outbound mail passes through it; spam, virus, and policy hits are **held / quarantined** here. Releasing a held message re-injects it for delivery. This is the API this skill drives. |
| **INKY** | Email annotation / phishing-banner layer. Adds the warning banners and protects against impersonation. Not part of this API surface. |
Both sit **layered on top of the client's own Exchange / M365 mail flow** — so a
"missing email" investigation usually means: was it held at CloudFilter (check
`messages` / `logs`), or did it pass CloudFilter and stall in Exchange?
## Connection
| Item | Value |
|---|---|
| Base URL | `https://emailservice.io/api/v1` (override `MAILPROTECTOR_API_BASE_URL`) |
| Auth | `Authorization: Bearer <api_key>` |
| Vault entry | `msp-tools/mailprotector.sops.yaml`, field `credentials.api_key` |
| Env override | `MAILPROTECTOR_API_KEY` |
Credential resolution order: `MAILPROTECTOR_API_KEY` env -> vault
`credentials.api_key`. The key is never hardcoded; a clear setup error is raised
if neither resolves.
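The resolution order described above might look like the sketch below. It is not the skill's actual code: `resolve_api_key`, `SetupError`, and the `vault_lookup` callable are hypothetical stand-ins, and only the environment variable name, vault path, and field name come from the table.

```python
import os


class SetupError(RuntimeError):
    """Raised when no API key can be resolved (hypothetical error type)."""


def resolve_api_key(environ=None, vault_lookup=None) -> str:
    """Resolve the key: MAILPROTECTOR_API_KEY env first, then the vault entry."""
    environ = os.environ if environ is None else environ
    key = environ.get("MAILPROTECTOR_API_KEY")
    if key:
        return key
    if vault_lookup is not None:
        # vault_lookup(path, field) is an assumed interface to the SOPS vault.
        key = vault_lookup("msp-tools/mailprotector.sops.yaml", "credentials.api_key")
        if key:
            return key
    raise SetupError(
        "No Mailprotector API key: set MAILPROTECTOR_API_KEY or add "
        "credentials.api_key to msp-tools/mailprotector.sops.yaml"
    )
```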
### Scopes
Five entity types carry `logs` / `messages` / `configuration` /
`allow_block_rules` / `users` / `domains` sub-resources. Path form is
`/{scope}/{id}/...`:
```
resellers, customers, domains, user_groups, users
```
The CLI validates `scope` against this set.
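A minimal sketch of that validation and the `/{scope}/{id}/...` path form, assuming a helper named `scoped_path` (hypothetical; the real CLI's function names are not shown here):

```python
SCOPES = ("resellers", "customers", "domains", "user_groups", "users")


def scoped_path(scope: str, entity_id, resource: str) -> str:
    """Build a /{scope}/{id}/{resource} path, rejecting unknown scopes."""
    if scope not in SCOPES:
        raise ValueError(f"invalid scope {scope!r}; expected one of {', '.join(SCOPES)}")
    return f"/{scope}/{entity_id}/{resource}"
```

For example, `scoped_path("domains", 42, "logs")` yields `/domains/42/logs`, while an unknown scope fails before any request is made.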
## Running the CLI
This machine's Python launcher is `py` (per identity.json); `python` / `python3`
also work. Run from the scripts dir so the two modules resolve.
```bash
cd C:/claudetools/.claude/skills/mailprotector/scripts
py mp.py status # validate token (GET /domains, per_page=1)
py mp.py domains # list domains (global)
py mp.py domains --scope customers --id <id>
py mp.py domain <domain_id>
py mp.py customers <reseller_id>
py mp.py customer <customer_id>
py mp.py users <scope> <id>
py mp.py user <user_id>
py mp.py find-user user@client.com # locate a user / alias by email (a READ)
py mp.py config <scope> <id> # shows permissions.messages.allow_spam_release
py mp.py rules <scope> <id>
```
### Mail-flow logs and held mail (the common investigation)
Both accept the same filters: `--sender --recipient --subject --decision
--sort-field --sort-direction --page --page-size`.
```bash
# Why didn't this arrive? Look at the decision in the flow logs.
py mp.py logs domains <domain_id> --recipient ceo@client.com --decision quarantine_spam
# Held / quarantined mail search.
py mp.py messages domains <domain_id> --sender boss@vendor.com
```
`--decision` values: `default`, `deliver`, `quarantine_spam`,
`quarantine_virus`, `quarantine_policy`, `bounce`, `encrypt`, `delete`.
`--sort-field` values: `@timestamp` (default), `prime.direction`,
`prime.from_header_raw`, `prime.recipient`, `prime.subject`, `prime.decision`,
`prime.score`.
## Writes (gated)
Every mutating command prints a `[DRY RUN]` line and exits non-zero unless you
pass `--confirm`.
```bash
py mp.py release <message_id> --confirm
py mp.py release <message_id> --recipients alt@client.com --confirm
py mp.py release-many <scope> <id> --ids 111,222,333 --confirm
py mp.py release-many <scope> <id> --all --confirm
py mp.py add-rule <scope> <id> --value vendor.com --type allow --confirm
py mp.py enable-release <scope> <id> --confirm
```
## The `allow_spam_release` gotcha
Releasing a held **spam** message fails if the owning entity does not have
`permissions.messages.allow_spam_release = true`. Workflow:
1. `py mp.py config <scope> <id>` — check `allow_spam_release`.
2. If `false`: `py mp.py enable-release <scope> <id> --confirm`.
3. Re-run the `release` / `release-many`.
Virus and policy quarantines are governed separately — only spam release is
gated by this permission.
## Example workflow: find a client's held outbound mail from a sender and release it
```bash
# 1. Find the client's domain.
py mp.py domains --scope customers --id <customer_id>
# 2. Search held messages from the sender (outbound = sender is the client user).
py mp.py messages domains <domain_id> --sender user@client.com --decision quarantine_spam
# 3. If it's spam-held, make sure release is permitted on the domain.
py mp.py config domains <domain_id> # check allow_spam_release
py mp.py enable-release domains <domain_id> --confirm # only if needed
# 4. Release by message id (DRY RUN first — omit --confirm to preview).
py mp.py release <message_id> # [DRY RUN]
py mp.py release <message_id> --confirm # actually release
```
## Raw escape hatch
The named commands cover the common surface; for anything else, hit the path
directly. Non-GET methods still require `--confirm`.
```bash
py mp.py raw GET domains/<id>/logs
py mp.py raw POST messages/<id>/deliver --body '{"include_original_recipients":1}' --confirm
```
## Notes
- This is the **LIVE production reseller CloudFilter platform**. A release
re-delivers real mail to real recipients, and an allow rule can let real spam
or phishing through — confirm the target entity with a read command before any
write, and prefer releasing specific message ids over `--all`.
- Pagination: `page` (default 1) and `per_page` (default 25); reseller
`messages` caps `per_page` at 50. The `X-Pagination` response header carries
the page/total metadata.
- Full endpoint catalog, filter tables, and the global `field[op]=value`
operators live in `references/api.md`.
---
name: mailprotector
description: "Manage the ACG Mailprotector CloudFilter email-security gateway (emailservice.io). Search/release held/quarantined mail (in+outbound), pull mail-flow logs (why a message did/did not deliver), inspect + manage allow/block rules. Read-only default; releases/rule-changes gated --confirm. Triggers: mailprotector, cloudfilter, held/quarantined mail, release email, allow/block rule, INKY. Live production."
---
# Mailprotector / CloudFilter Skill
Standalone CLI client for the **Mailprotector CloudFilter REST API**
(`emailservice.io`), the reseller email-security platform ACG layers on top of
client mail flow. Read-only by default; every write (release, rule add, config
change) is gated behind `--confirm`.
## The two-layer context (important)
ACG's email security sits in front of client mailboxes as two cooperating layers:
| Layer | What it does |
|---|---|
| **Mailprotector CloudFilter** | The delivery / filtering gateway. Inbound and outbound mail passes through it; spam, virus, and policy hits are **held / quarantined** here. Releasing a held message re-injects it for delivery. This is the API this skill drives. |
| **INKY** | Email annotation / phishing-banner layer. Adds the warning banners and protects against impersonation. Not part of this API surface. |
Both sit **layered on top of the client's own Exchange / M365 mail flow** — so a
"missing email" investigation usually means: was it held at CloudFilter (check
`messages` / `logs`), or did it pass CloudFilter and stall in Exchange?
## Connection
| Item | Value |
|---|---|
| Base URL | `https://emailservice.io/api/v1` (override `MAILPROTECTOR_API_BASE_URL`) |
| Auth | `Authorization: Bearer <api_key>` |
| Vault entry | `msp-tools/mailprotector.sops.yaml`, field `credentials.api_key` |
| Env override | `MAILPROTECTOR_API_KEY` |
Credential resolution order: `MAILPROTECTOR_API_KEY` env -> vault
`credentials.api_key`. The key is never hardcoded; a clear setup error is raised
if neither resolves.
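The resolution order above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: `resolve_api_key`, `SetupError`, and the injected `load_vault` callable are hypothetical names, and the vault is assumed to decrypt to a nested dict with a `credentials.api_key` field.

```python
from typing import Callable, Mapping, Optional


class SetupError(RuntimeError):
    """Raised when no API key can be resolved (hypothetical name)."""


def resolve_api_key(
    env: Mapping[str, str],
    load_vault: Callable[[], Optional[dict]],
) -> str:
    """Return the key from the env override, else from the vault entry."""
    # 1. The environment override wins.
    key = env.get("MAILPROTECTOR_API_KEY", "").strip()
    if key:
        return key
    # 2. Fall back to the vault (assumed shape: {"credentials": {"api_key": ...}}).
    vault = load_vault() or {}
    key = str((vault.get("credentials") or {}).get("api_key") or "").strip()
    if key:
        return key
    # 3. Neither resolved: fail with a clear setup error.
    raise SetupError(
        "No Mailprotector API key: set MAILPROTECTOR_API_KEY or populate "
        "credentials.api_key in msp-tools/mailprotector.sops.yaml"
    )
```

Passing the environment and the vault loader in as arguments keeps the order testable without touching real secrets.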
### Scopes
Five entity types carry `logs` / `messages` / `configuration` /
`allow_block_rules` / `users` / `domains` sub-resources. Path form is
`/{scope}/{id}/...`:
```
resellers, customers, domains, user_groups, users
```
The CLI validates `scope` against this set.
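A sketch of what that validation and the `/{scope}/{id}/...` path form might look like. The helper name `scoped_path` is an assumption for illustration; only the scope set and the path shape come from the text above.

```python
SCOPES = ("resellers", "customers", "domains", "user_groups", "users")


def scoped_path(scope: str, entity_id: str, *parts: str) -> str:
    """Build '/{scope}/{id}/...' after checking scope against the allowed set."""
    if scope not in SCOPES:
        raise ValueError(
            f"invalid scope {scope!r}; expected one of: {', '.join(SCOPES)}"
        )
    segments = [scope, str(entity_id), *parts]
    # Strip stray slashes so callers can pass 'logs' or '/logs' alike.
    return "/" + "/".join(segment.strip("/") for segment in segments)
```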
## Running the CLI
This machine's Python launcher is `py` (per identity.json); `python` / `python3`
also work. Run from the scripts dir so the two modules resolve.
```bash
cd C:/claudetools/.claude/skills/mailprotector/scripts
py mp.py status # validate token (GET /domains, per_page=1)
py mp.py domains # list domains (global)
py mp.py domains --scope customers --id <id>
py mp.py domain <domain_id>
py mp.py customers <reseller_id>
py mp.py customer <customer_id>
py mp.py users <scope> <id>
py mp.py user <user_id>
py mp.py find-user user@client.com # locate a user / alias by email (a READ)
py mp.py config <scope> <id> # shows permissions.messages.allow_spam_release
py mp.py rules <scope> <id>
```
### Mail-flow logs and held mail (the common investigation)
Both accept the same filters: `--sender --recipient --subject --decision
--sort-field --sort-direction --page --page-size`.
```bash
# Why didn't this arrive? Look at the decision in the flow logs.
py mp.py logs domains <domain_id> --recipient ceo@client.com --decision quarantine_spam
# Held / quarantined mail search.
py mp.py messages domains <domain_id> --sender boss@vendor.com
```
`--decision` values: `default`, `deliver`, `quarantine_spam`,
`quarantine_virus`, `quarantine_policy`, `bounce`, `encrypt`, `delete`.
`--sort-field` values: `@timestamp` (default), `prime.direction`,
`prime.from_header_raw`, `prime.recipient`, `prime.subject`, `prime.decision`,
`prime.score`.
## Writes (gated)
Every mutating command prints a `[DRY RUN]` line and exits non-zero unless you
pass `--confirm`.
```bash
py mp.py release <message_id> --confirm
py mp.py release <message_id> --recipients alt@client.com --confirm
py mp.py release-many <scope> <id> --ids 111,222,333 --confirm
py mp.py release-many <scope> <id> --all --confirm
py mp.py add-rule <scope> <id> --value vendor.com --type allow --confirm
py mp.py enable-release <scope> <id> --confirm
```
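The gate described above can be approximated with a few lines of `argparse`. This is a hedged sketch of the pattern, not the code in `mp.py`: `run_gated`, its arguments, and the specific exit code `1` are assumptions (the text only says the exit is non-zero).

```python
import argparse
from typing import Callable, Sequence


def run_gated(
    argv: Sequence[str],
    action: Callable[[str], None],
    describe: str,
) -> int:
    """Run `action` only when --confirm is present; otherwise print a dry-run line."""
    parser = argparse.ArgumentParser(prog="gated-demo")
    parser.add_argument("target")
    parser.add_argument("--confirm", action="store_true")
    args = parser.parse_args(argv)
    if not args.confirm:
        print(f"[DRY RUN] would {describe} {args.target}; re-run with --confirm")
        return 1  # non-zero, so scripts cannot mistake a preview for a write
    action(args.target)
    return 0
```

Returning non-zero on the dry run means a shell pipeline chained with `&&` stops instead of assuming the write happened.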
## The `allow_spam_release` gotcha
Releasing a held **spam** message fails if the owning entity does not have
`permissions.messages.allow_spam_release = true`. Workflow:
1. `py mp.py config <scope> <id>` — check `allow_spam_release`.
2. If `false`: `py mp.py enable-release <scope> <id> --confirm`.
3. Re-run the `release` / `release-many`.
Virus and policy quarantines are governed separately — only spam release is
gated by this permission.
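The check in step 1 reduces to a small predicate. The sketch below assumes the `config` output is a nested dict that mirrors the dotted path `permissions.messages.allow_spam_release`; the function name `needs_enable_release` is hypothetical.

```python
def needs_enable_release(config: dict, decision: str) -> bool:
    """True when a held message is spam-quarantined and release is not permitted."""
    # Only spam release is gated; virus and policy holds are governed separately.
    if decision != "quarantine_spam":
        return False
    messages = (config.get("permissions") or {}).get("messages") or {}
    return messages.get("allow_spam_release") is not True
```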
## Example workflow: find a client's held outbound mail from a sender and release it
```bash
# 1. Find the client's domain.
py mp.py domains --scope customers --id <customer_id>
# 2. Search held messages from the sender (outbound = sender is the client user).
py mp.py messages domains <domain_id> --sender user@client.com --decision quarantine_spam
# 3. If it's spam-held, make sure release is permitted on the domain.
py mp.py config domains <domain_id> # check allow_spam_release
py mp.py enable-release domains <domain_id> --confirm # only if needed
# 4. Release by message id (DRY RUN first — omit --confirm to preview).
py mp.py release <message_id> # [DRY RUN]
py mp.py release <message_id> --confirm # actually release
```
## Raw escape hatch
The named commands cover the common surface; for anything else, hit the path
directly. Non-GET methods still require `--confirm`.
```bash
py mp.py raw GET domains/<id>/logs
py mp.py raw POST messages/<id>/deliver --body '{"include_original_recipients":1}' --confirm
```
## Notes
- This is the **LIVE production reseller CloudFilter platform**. A release
re-delivers real mail to real recipients, and an allow rule can let real spam
or phishing through — confirm the target entity with a read command before any
write, and prefer releasing specific message ids over `--all`.
- Pagination: `page` (default 1) and `per_page` (default 25); reseller
`messages` caps `per_page` at 50. The `X-Pagination` response header carries
the page/total metadata.
- Full endpoint catalog, filter tables, and the global `field[op]=value`
operators live in `references/api.md`.
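For the pagination note above, a minimal paging loop might look like the following. It is a sketch under stated assumptions: `fetch_page` is a hypothetical callable that returns one page of items, and the loop stops on a short page rather than reading `X-Pagination`, whose exact format is documented in `references/api.md` and not assumed here.

```python
from typing import Callable, Iterator, List, Optional


def iter_pages(
    fetch_page: Callable[[int, int], List[dict]],
    per_page: int = 25,
    max_per_page: Optional[int] = None,
) -> Iterator[dict]:
    """Yield items page by page, clamping per_page to an optional cap."""
    if max_per_page is not None:
        per_page = min(per_page, max_per_page)  # e.g. 50 for reseller messages
    page = 1
    while True:
        items = fetch_page(page, per_page)
        yield from items
        if len(items) < per_page:  # a short (or empty) page is the last one
            return
        page += 1
```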

@@ -1,142 +1,130 @@
---
name: memory-dream
description: >-
Memory lint + consolidation analyzer for the ClaudeTools REPO memory store
(.claude/memory/). Audits the index, backlinks, referenced file paths,
duplicate/overlap clusters, stale dated facts, and drift against the
machine-local harness profile memory store. Default run is read-only.
--apply-safe performs the low-risk fixes (append missing index lines, copy
any profile-only files into the repo for indexing). Cluster merges, dedup
deletes, and stale-fact removal are surfaced as PROPOSED actions for a
human to apply -- they're judgment calls, not automation candidates. (Repo
is the source of truth as of 2026-06-02; sync-memory.sh mirrors repo to
profile, so PROFILE-side cleanup is handled by that script, not here. See
feedback_memory_sync_destructive_ok.md.) Invoke for: "memory dream",
"consolidate memory", "memory lint", "clean up memory", "memory errors",
"dedupe memory".
---
# Memory Dream
A read-only-by-default analyzer that flags issues in the shared memory store.
Mutating ops are gated behind `--apply-safe` (for low-risk fixes) or the
PROPOSED section (for judgment calls a human resolves by hand).
## The two-store model (important)
There are TWO separate memory stores on every machine:
- REPO store -- `.claude/memory/` (88+ `*.md` files + `MEMORY.md` index).
Tracked in git, syncs to all machines via Gitea. **This is the source of
truth.** `CLAUDE.md` mandates writing here.
- HARNESS PROFILE store -- `$HOME/.claude/projects/<slug>/memory/`. Machine
local, NOT in git, NOT synced. This is the store the Claude Code harness
auto-injects into the system prompt at session start.
The two drift over time. `memory-dream` reports that drift in its report
section. The companion script `.claude/scripts/sync-memory.sh` is what
actually reconciles them: it runs in **mirror mode** (since 2026-06-02) —
repo is authoritative, profile is synced to match (deletions propagate;
repo content wins on conflict). PROFILE-side hygiene lives in
`sync-memory.sh`, not here.
## What it checks
`scripts/memory_dream.py` runs six READ-ONLY analyses over the REPO store:
1. INDEX RECONCILE -- orphan files (no `MEMORY.md` line), index lines whose
target file is missing, and frontmatter `name:` vs filename signals.
2. BACKLINKS -- `[[name]]` references in bodies whose target slug has no file.
3. REFERENCED-ARTIFACT VALIDITY -- conservatively extracts repo-relative file
paths / script names from each body (backtick-wrapped single tokens only)
and flags ones not found in the repo. Reported as **verify**, never delete
(many are legitimately server-side or in sibling repos).
4. DUPLICATE / OVERLAP CLUSTERS -- groups memories by type + token-overlap /
shared slug-prefix and lists candidate mergeable clusters (e.g. the many
`feedback_syncro_*` files). **Proposes** merges; never performs them.
5. STALE DATED FACTS -- flags `project`-type memories with an "as of <date>"
style claim older than ~6 months for re-verification.
6. DRIFT vs PROFILE STORE -- locates the harness profile memory dir for this
project and reports profile-only files (candidates to migrate INTO the repo)
and repo-only files (candidates to push OUT to profile). Report only.
The report ends with a `## PROPOSED (needs human approval)` section that is
NEVER auto-applied.
## Modes
- Default (no flag) -- **REPORT ONLY. Mutates nothing.** Writes a timestamped
report to `.claude/memory/_reports/YYYY-MM-DD-HHMM-dream.md` (created if
missing) and prints it to stdout.
- `--apply-safe` -- performs ONLY additive, non-destructive fixes and prints
each action:
- (a) append missing index lines to `MEMORY.md` for orphan files, under the
correct `## <Type>` header, never reordering or removing existing lines;
- (b) copy profile-only memory files INTO the repo store (additive
migration). If a same-named repo file already exists it is SKIPPED and the
conflict is reported -- it is never overwritten.
- `--no-file` -- print to stdout only; skip writing the `_reports/` file.
- `--report-file <path>` -- write the report to an explicit path.
### What dream does NOT auto-do
`memory-dream` does NOT, even with `--apply-safe`:
- delete a repo memory file (cluster dedup is a judgment call — pick which file becomes canonical, fold the others' content, retire the originals deliberately);
- remove or reorder index lines (index cleanups are also surfaced as proposals);
- overwrite a file whose content differs;
- perform a proposed merge.
These stay in the report's `## PROPOSED` section. The rationale isn't "never delete" any more (the fleet-wide additive safety net was dropped 2026-06-02; see `feedback_memory_sync_destructive_ok.md`) — it's that merges and dedups require human judgment about which file is canonical and how to combine content. Profile-side deletion DOES happen automatically — but in `sync-memory.sh`, not here.
## Running it
This machine's Python launcher is `py` (per identity.json); the script also
runs under `python` / `python3`. Stdlib only -- no pip deps.
```bash
# REPORT ONLY (default) -- writes _reports/<stamp>-dream.md and prints it
py "$CLAUDETOOLS_ROOT/.claude/skills/memory-dream/scripts/memory_dream.py"
# report to stdout only, write nothing
py "$CLAUDETOOLS_ROOT/.claude/skills/memory-dream/scripts/memory_dream.py" --no-file
# additive-only fixes (append orphan index lines, migrate profile-only files)
py "$CLAUDETOOLS_ROOT/.claude/skills/memory-dream/scripts/memory_dream.py" --apply-safe
```
`CLAUDETOOLS_ROOT` resolves from the env var, else `claudetools_root` in
`.claude/identity.json`, else the repo root derived from the script's own
location -- no hardcoded drive letters.
## Cleanup / approve workflow
1. Run with no flag. Read the report (stdout or `_reports/<stamp>-dream.md`).
2. Run `--apply-safe` to take the safe additive wins: orphan index lines get
added, profile-only memories get migrated into the repo (conflicts skipped
and reported).
3. Work the `## PROPOSED` section by hand:
- `[MERGE?]` -- decide whether to consolidate a cluster. If yes, author a new
combined memory (or set of files for a rule/history split), retire the
originals via `git rm`, update `MEMORY.md`. Deletions are now first-class:
`sync-memory.sh` mirror mode will propagate them to every profile store
on the next run.
- `[REVERIFY?]` -- confirm the dated fact still holds; update the body and
its date if it changed.
- `[STALE-REF?]` -- confirm the referenced path moved/renamed; repoint or
annotate. Many are legitimately server-side (`.service` units, `/opt/...`).
- `[INDEX-CLEANUP?]` / `[DRIFT-RESOLVE?]` -- human picks the winner.
4. Commit the repo store changes so they sync to the fleet via Gitea.
## Self-test
`scripts/selftest.py` runs the analyzer against a synthetic fixture memory
store in a temp dir and asserts each detector fires (orphan, missing target,
broken backlink, stale path, cluster, profile drift) and that `--apply-safe`
only touches the things it's supposed to (index appends + profile→repo copy
of new files; no deletions, no merges, no overwrites of differing content).
Run:
```bash
py "$CLAUDETOOLS_ROOT/.claude/skills/memory-dream/scripts/selftest.py"
```
---
name: memory-dream
description: "Lint + consolidate the ClaudeTools repo memory store (.claude/memory/): audits index, backlinks, file paths, duplicate clusters, stale facts. Read-only default; --apply-safe does low-risk fixes; merges/deletes surfaced as proposals. Triggers: memory dream, consolidate/lint/clean up/dedupe memory."
---
# Memory Dream
A read-only-by-default analyzer that flags issues in the shared memory store.
Mutating ops are gated behind `--apply-safe` (for low-risk fixes) or the
PROPOSED section (for judgment calls a human resolves by hand).
## The two-store model (important)
There are TWO separate memory stores on every machine:
- REPO store -- `.claude/memory/` (88+ `*.md` files + `MEMORY.md` index).
Tracked in git, syncs to all machines via Gitea. **This is the source of
truth.** `CLAUDE.md` mandates writing here.
- HARNESS PROFILE store -- `$HOME/.claude/projects/<slug>/memory/`. Machine
local, NOT in git, NOT synced. This is the store the Claude Code harness
auto-injects into the system prompt at session start.
The two drift over time. `memory-dream` reports that drift in its report
section. The companion script `.claude/scripts/sync-memory.sh` is what
actually reconciles them: it runs in **mirror mode** (since 2026-06-02) —
repo is authoritative, profile is synced to match (deletions propagate;
repo content wins on conflict). PROFILE-side hygiene lives in
`sync-memory.sh`, not here.
## What it checks
`scripts/memory_dream.py` runs six READ-ONLY analyses over the REPO store:
1. INDEX RECONCILE -- orphan files (no `MEMORY.md` line), index lines whose
target file is missing, and frontmatter `name:` vs filename signals.
2. BACKLINKS -- `[[name]]` references in bodies whose target slug has no file.
3. REFERENCED-ARTIFACT VALIDITY -- conservatively extracts repo-relative file
paths / script names from each body (backtick-wrapped single tokens only)
and flags ones not found in the repo. Reported as **verify**, never delete
(many are legitimately server-side or in sibling repos).
4. DUPLICATE / OVERLAP CLUSTERS -- groups memories by type + token-overlap /
shared slug-prefix and lists candidate mergeable clusters (e.g. the many
`feedback_syncro_*` files). **Proposes** merges; never performs them.
5. STALE DATED FACTS -- flags `project`-type memories with an "as of <date>"
style claim older than ~6 months for re-verification.
6. DRIFT vs PROFILE STORE -- locates the harness profile memory dir for this
project and reports profile-only files (candidates to migrate INTO the repo)
and repo-only files (candidates to push OUT to profile). Report only.
The report ends with a `## PROPOSED (needs human approval)` section that is
NEVER auto-applied.
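As an illustration of check 2 (BACKLINKS), the detection can be sketched in a few lines. This is not the code in `memory_dream.py`: the function name, the regex, and the assumption that a link target maps to `<slug>.md` in the same directory are all hypothetical.

```python
import re
from pathlib import Path

# Capture the target of [[name]], stopping at ']', '|' or '#'.
BACKLINK = re.compile(r"\[\[([^\[\]|#]+)")


def broken_backlinks(store: Path) -> dict[str, list[str]]:
    """Map each memory file to the [[targets]] that have no matching .md file."""
    slugs = {path.stem for path in store.glob("*.md")}
    broken: dict[str, list[str]] = {}
    for path in sorted(store.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        targets = {match.strip() for match in BACKLINK.findall(text)}
        missing = sorted(target for target in targets if target not in slugs)
        if missing:
            broken[path.name] = missing
    return broken
```

Like the real analyzer, the sketch only reads files; it reports and changes nothing.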
## Modes
- Default (no flag) -- **REPORT ONLY. Mutates nothing.** Writes a timestamped
report to `.claude/memory/_reports/YYYY-MM-DD-HHMM-dream.md` (created if
missing) and prints it to stdout.
- `--apply-safe` -- performs ONLY additive, non-destructive fixes and prints
each action:
- (a) append missing index lines to `MEMORY.md` for orphan files, under the
correct `## <Type>` header, never reordering or removing existing lines;
- (b) copy profile-only memory files INTO the repo store (additive
migration). If a same-named repo file already exists it is SKIPPED and the
conflict is reported -- it is never overwritten.
- `--no-file` -- print to stdout only; skip writing the `_reports/` file.
- `--report-file <path>` -- write the report to an explicit path.
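The additive migration in (b) can be pictured as a copy loop that never overwrites. This is a rough sketch, assuming both stores are flat directories of `*.md` files; `migrate_profile_only` is a hypothetical name, not the script's API.

```python
import shutil
from pathlib import Path


def migrate_profile_only(profile: Path, repo: Path) -> tuple[list[str], list[str]]:
    """Copy profile-only files into the repo store; skip same-named files."""
    copied: list[str] = []
    skipped: list[str] = []
    for src in sorted(profile.glob("*.md")):
        dest = repo / src.name
        if dest.exists():
            skipped.append(src.name)  # reported as a conflict, never overwritten
            continue
        shutil.copy2(src, dest)
        copied.append(src.name)
    return copied, skipped
```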
### What dream does NOT auto-do
`memory-dream` does NOT, even with `--apply-safe`:
- delete a repo memory file (cluster dedup is a judgment call — pick which file becomes canonical, fold the others' content, retire the originals deliberately);
- remove or reorder index lines (index cleanups are also surfaced as proposals);
- overwrite a file whose content differs;
- perform a proposed merge.
These stay in the report's `## PROPOSED` section. The rationale isn't "never delete" any more (the fleet-wide additive safety net was dropped 2026-06-02; see `feedback_memory_sync_destructive_ok.md`) — it's that merges and dedups require human judgment about which file is canonical and how to combine content. Profile-side deletion DOES happen automatically — but in `sync-memory.sh`, not here.
## Running it
This machine's Python launcher is `py` (per identity.json); the script also
runs under `python` / `python3`. Stdlib only -- no pip deps.
```bash
# REPORT ONLY (default) -- writes _reports/<stamp>-dream.md and prints it
py "$CLAUDETOOLS_ROOT/.claude/skills/memory-dream/scripts/memory_dream.py"
# report to stdout only, write nothing
py "$CLAUDETOOLS_ROOT/.claude/skills/memory-dream/scripts/memory_dream.py" --no-file
# additive-only fixes (append orphan index lines, migrate profile-only files)
py "$CLAUDETOOLS_ROOT/.claude/skills/memory-dream/scripts/memory_dream.py" --apply-safe
```
`CLAUDETOOLS_ROOT` resolves from the env var, else `claudetools_root` in
`.claude/identity.json`, else the repo root derived from the script's own
location -- no hardcoded drive letters.
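One plausible reading of that fallback chain is sketched below. It assumes the script sits at `<root>/.claude/skills/memory-dream/scripts/`, so the repo root is four directories above the script's folder, and that `identity.json` is looked up relative to that derived root. `resolve_root` is a hypothetical name and the real script may order the lookups differently.

```python
import json
from pathlib import Path
from typing import Mapping


def resolve_root(script_path: Path, env: Mapping[str, str]) -> Path:
    """Resolve the repo root: env var, then identity.json, then script location."""
    value = env.get("CLAUDETOOLS_ROOT", "").strip()
    if value:
        return Path(value)
    # <root>/.claude/skills/memory-dream/scripts/<script>.py -> parents[4] is <root>
    derived = Path(script_path).resolve().parents[4]
    identity = derived / ".claude" / "identity.json"
    if identity.is_file():
        try:
            data = json.loads(identity.read_text(encoding="utf-8"))
            configured = data.get("claudetools_root")
        except (json.JSONDecodeError, AttributeError):
            configured = None  # unreadable identity file: fall through
        if configured:
            return Path(configured)
    return derived
```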
## Cleanup / approve workflow
1. Run with no flag. Read the report (stdout or `_reports/<stamp>-dream.md`).
2. Run `--apply-safe` to take the safe additive wins: orphan index lines get
added, profile-only memories get migrated into the repo (conflicts skipped
and reported).
3. Work the `## PROPOSED` section by hand:
- `[MERGE?]` -- decide whether to consolidate a cluster. If yes, author a new
combined memory (or set of files for a rule/history split), retire the
originals via `git rm`, update `MEMORY.md`. Deletions are now first-class:
`sync-memory.sh` mirror mode will propagate them to every profile store
on the next run.
- `[REVERIFY?]` -- confirm the dated fact still holds; update the body and
its date if it changed.
- `[STALE-REF?]` -- confirm the referenced path moved/renamed; repoint or
annotate. Many are legitimately server-side (`.service` units, `/opt/...`).
- `[INDEX-CLEANUP?]` / `[DRIFT-RESOLVE?]` -- human picks the winner.
4. Commit the repo store changes so they sync to the fleet via Gitea.
## Self-test
`scripts/selftest.py` runs the analyzer against a synthetic fixture memory
store in a temp dir and asserts each detector fires (orphan, missing target,
broken backlink, stale path, cluster, profile drift) and that `--apply-safe`
only touches the things it's supposed to (index appends + profile→repo copy
of new files; no deletions, no merges, no overwrites of differing content).
Run:
```bash
py "$CLAUDETOOLS_ROOT/.claude/skills/memory-dream/scripts/selftest.py"
```

@@ -1,127 +1,115 @@
---
name: packetdial
description: >-
Manage the Arizona Computer Guru (ACG) PacketDial / OITVOIP hosted-VoIP
platform via the NetSapiens SNAPsolution API v2 (pbx.packetdial.com,
v44.4). List and inspect domains, users, devices/phones, DIDs (phone
numbers), resellers, and pull CDRs (call detail records). Provision new
customer domains, users, SIP devices, and phone numbers (all writes gated
behind --confirm). Read-only by default. Invoke for: "packetdial",
"oitvoip", "oit voip", "netsapiens", "voip portal", "pbx portal", "voip
domain", "voip user", "voip extension", "provision phone", "add did",
"phone number on voip", "call detail records", "cdr", "voip.packetdial",
"pbx.packetdial". NOTE: voip.packetdial.com is the customer-facing portal
(the fax/UC dashboard, e.g. Cascades account 28598) and has no API — the
programmable surface is pbx.packetdial.com. This skill talks to the LIVE
production reseller PBX; treat writes conservatively.
---
# PacketDial / NetSapiens (OITVOIP) Skill
Standalone CLI client for the NetSapiens SNAPsolution **API v2** that backs
ACG's hosted-VoIP offering through OITVOIP / PacketDial. Read-only by default;
every write (create / update / delete) is gated behind `--confirm`.
## The two hostnames (important)
| Host | What it is | API? |
|---|---|---|
| `voip.packetdial.com` | Customer-facing white-label portal / UC & fax dashboard (e.g. Cascades fax account **28598**). Login-gated UI. | **No** |
| `pbx.packetdial.com` | Reseller PBX platform — NetSapiens v44.4. | **Yes** — this skill targets it |
- API base: `https://pbx.packetdial.com/ns-api/v2`
- Token endpoint: `https://pbx.packetdial.com/ns-api/v2/tokens`
- Live OpenAPI spec: `https://pbx.packetdial.com/ns-api/webroot/openapi/openapi.json`
- Live Swagger UI: `https://pbx.packetdial.com/ns-api/openapi`
- Vendor docs: https://docs.ns-api.com/ (login) and https://voipdocs.io/oitvoip-access-platform-apis
## Credentials — ONE-TIME SETUP (not yet provisioned)
As of this skill's creation **no API key exists yet** — the vault entry
`msp-tools/oitvoip.sops.yaml` is empty/absent, so every command will fail with a
clear "No credentials found" error until you do this once:
1. Log into `pbx.packetdial.com` -> **Admin > API Keys** and create a
reseller-scoped key (prefix `nsr_`). If self-service key creation is not
available, reply to **Darwin Escaro (OITVOIP)** for reseller OAuth client
credentials.
2. Store it in the SOPS vault. Preferred (static bearer key):
```
# msp-tools/oitvoip.sops.yaml
credentials:
  api_key: nsr_xxxxxxxxxxxxxxxx
```
Or, for OAuth2 password-grant credentials:
```
credentials:
  client_id: <client id>
  client_secret: <client secret>
  username: <portal user@domain>
  password: <portal password>
```
3. That's it — the client auto-detects which shape is present.
The client never hardcodes secrets. Resolution order: `PACKETDIAL_API_KEY` env
-> `PACKETDIAL_CLIENT_ID`+friends env -> vault `credentials.api_key` -> vault
OAuth fields. Env overrides exist for quick testing without touching the vault.
## Running the CLI
This machine's Python launcher is `py` (per identity.json); `python` / `python3`
also work. Run from the scripts dir so the two modules resolve.
```bash
cd C:/claudetools/.claude/skills/packetdial/scripts
py ns.py status # API version + authenticated key identity
py ns.py domains # list all domains
py ns.py domain <domain> # one domain's config
py ns.py users <domain> # users / extensions in a domain
py ns.py user <domain> <user>
py ns.py phones <domain> # SIP devices registered in a domain
py ns.py dids <domain> # phone numbers (DIDs) on a domain
py ns.py devices <domain> <user>
py ns.py cdrs --domain <domain> --start 2026-06-01 --end 2026-06-02
py ns.py resellers
```
## Writes (gated)
Every mutating command prints a `[DRY RUN]` line and exits non-zero unless you
pass `--confirm`. Bodies are raw JSON matching the NetSapiens v2 schema.
```bash
py ns.py create-domain --body '{"domain":"acme","description":"Acme Inc"}' --confirm
py ns.py create-user acme --body '{"user":"101","name-first-name":"Jane"}' --confirm
py ns.py create-phone acme --body '{...}' --confirm
py ns.py create-did acme --body '{"phonenumber":"15205551234"}' --confirm
py ns.py update-user acme 101 --body '{"name-last-name":"Doe"}' --confirm
py ns.py delete-user acme 101 --confirm
```
## Raw escape hatch (any of the 239 v2 paths)
The named commands cover the common surface; for anything else, hit the path
directly. Non-GET methods still require `--confirm`.
```bash
py ns.py raw GET domains/acme/users/101/answerrules
py ns.py raw POST domains/acme/users --body '{...}' --confirm
```
## Standard provisioning flow (new customer)
1. `create-domain` -> dial plan auto-generates
2. `create-user` per extension
3. `create-phone` per SIP device (MAC-provisioned)
4. `create-did` to attach DIDs and route them to users
5. Log the work back to the Syncro ticket
## Notes
- This is the LIVE production reseller PBX. A bad `create-domain` or
`delete-user` affects real customers — confirm the target domain first with a
read command before any write.
- CDR queries can be large; always pass `--start`/`--end` and a `--limit`.
- Reference detail (auth shapes, full endpoint inventory) lives in
`references/api.md`.
---
name: packetdial
description: "Manage the ACG PacketDial/OITVOIP hosted VoIP via the NetSapiens API (pbx.packetdial.com). List/inspect domains, users, devices, DIDs, resellers; pull CDRs; provision domains/users/SIP/numbers (writes gated --confirm; read-only default). Triggers: packetdial, oitvoip, netsapiens, voip domain/user/extension, provision phone, add did, CDR. Live production PBX."
---
# PacketDial / NetSapiens (OITVOIP) Skill
Standalone CLI client for the NetSapiens SNAPsolution **API v2** that backs
ACG's hosted-VoIP offering through OITVOIP / PacketDial. Read-only by default;
every write (create / update / delete) is gated behind `--confirm`.
## The two hostnames (important)
| Host | What it is | API? |
|---|---|---|
| `voip.packetdial.com` | Customer-facing white-label portal / UC & fax dashboard (e.g. Cascades fax account **28598**). Login-gated UI. | **No** |
| `pbx.packetdial.com` | Reseller PBX platform — NetSapiens v44.4. | **Yes** — this skill targets it |
- API base: `https://pbx.packetdial.com/ns-api/v2`
- Token endpoint: `https://pbx.packetdial.com/ns-api/v2/tokens`
- Live OpenAPI spec: `https://pbx.packetdial.com/ns-api/webroot/openapi/openapi.json`
- Live Swagger UI: `https://pbx.packetdial.com/ns-api/openapi`
- Vendor docs: https://docs.ns-api.com/ (login) and https://voipdocs.io/oitvoip-access-platform-apis
## Credentials — ONE-TIME SETUP (not yet provisioned)
As of this skill's creation **no API key exists yet** — the vault entry
`msp-tools/oitvoip.sops.yaml` is empty/absent, so every command will fail with a
clear "No credentials found" error until you do this once:
1. Log into `pbx.packetdial.com` -> **Admin > API Keys** and create a
reseller-scoped key (prefix `nsr_`). If self-service key creation is not
available, reply to **Darwin Escaro (OITVOIP)** for reseller OAuth client
credentials.
2. Store it in the SOPS vault. Preferred (static bearer key):
```
# msp-tools/oitvoip.sops.yaml
credentials:
  api_key: nsr_xxxxxxxxxxxxxxxx
```
Or, for OAuth2 password-grant credentials:
```
credentials:
  client_id: <client id>
  client_secret: <client secret>
  username: <portal user@domain>
  password: <portal password>
```
3. That's it — the client auto-detects which shape is present.
The client never hardcodes secrets. Resolution order: `PACKETDIAL_API_KEY` env
-> `PACKETDIAL_CLIENT_ID`+friends env -> vault `credentials.api_key` -> vault
OAuth fields. Env overrides exist for quick testing without touching the vault.
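The "auto-detects which shape is present" step for the vault entry could look roughly like this. It is a sketch only: `detect_auth_shape` and its return labels are hypothetical, and the env-var tiers are left out because the text does not spell out the names behind `PACKETDIAL_CLIENT_ID`+friends.

```python
OAUTH_FIELDS = ("client_id", "client_secret", "username", "password")


def detect_auth_shape(credentials: dict) -> str:
    """Return which credential shape a vault `credentials` block carries."""
    # The static bearer key is preferred, matching the resolution order above.
    if credentials.get("api_key"):
        return "api_key"
    # The OAuth2 password grant needs all four fields to be usable.
    if all(credentials.get(field) for field in OAUTH_FIELDS):
        return "oauth_password"
    raise ValueError("No credentials found")
```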
## Running the CLI
This machine's Python launcher is `py` (per identity.json); `python` / `python3`
also work. Run from the scripts dir so the two modules resolve.
```bash
cd C:/claudetools/.claude/skills/packetdial/scripts
py ns.py status # API version + authenticated key identity
py ns.py domains # list all domains
py ns.py domain <domain> # one domain's config
py ns.py users <domain> # users / extensions in a domain
py ns.py user <domain> <user>
py ns.py phones <domain> # SIP devices registered in a domain
py ns.py dids <domain> # phone numbers (DIDs) on a domain
py ns.py devices <domain> <user>
py ns.py cdrs --domain <domain> --start 2026-06-01 --end 2026-06-02
py ns.py resellers
```
## Writes (gated)
Every mutating command prints a `[DRY RUN]` line and exits non-zero unless you
pass `--confirm`. Bodies are raw JSON matching the NetSapiens v2 schema.
```bash
py ns.py create-domain --body '{"domain":"acme","description":"Acme Inc"}' --confirm
py ns.py create-user acme --body '{"user":"101","name-first-name":"Jane"}' --confirm
py ns.py create-phone acme --body '{...}' --confirm
py ns.py create-did acme --body '{"phonenumber":"15205551234"}' --confirm
py ns.py update-user acme 101 --body '{"name-last-name":"Doe"}' --confirm
py ns.py delete-user acme 101 --confirm
```
## Raw escape hatch (any of the 239 v2 paths)
The named commands cover the common surface; for anything else, hit the path
directly. Non-GET methods still require `--confirm`.
```bash
py ns.py raw GET domains/acme/users/101/answerrules
py ns.py raw POST domains/acme/users --body '{...}' --confirm
```
## Standard provisioning flow (new customer)
1. `create-domain` -> dial plan auto-generates
2. `create-user` per extension
3. `create-phone` per SIP device (MAC-provisioned)
4. `create-did` to attach DIDs and route them to users
5. Log the work back to the Syncro ticket
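
The ordering of steps 1 to 4 can be sketched as a planner that only prints the commands. `plan_customer` is a hypothetical helper; the `{...}` bodies are left elided as in the examples above, and every printed command would still need `--confirm` to take effect.

```bash
# Prints the write commands for one new customer, in order, without running them.
plan_customer() {
  domain="$1"; shift
  echo "py ns.py create-domain --body '{\"domain\":\"$domain\"}'"
  for ext in "$@"; do
    echo "py ns.py create-user $domain --body '{\"user\":\"$ext\"}'"
  done
  echo "py ns.py create-phone $domain --body '{...}'"
  echo "py ns.py create-did $domain --body '{...}'"
}

plan_customer acme 101 102
```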
## Notes
- This is the LIVE production reseller PBX. A bad `create-domain` or
`delete-user` affects real customers — confirm the target domain first with a
read command before any write.
- CDR queries can be large; always pass `--start`/`--end` and a `--limit`.
- Reference detail (auth shapes, full endpoint inventory) lives in
`references/api.md`.

@@ -1,65 +1,61 @@
---
name: remediation-tool
description: |
M365 tenant investigation and remediation using the ComputerGuru tiered MSP app suite (5 apps: Security Investigator, Exchange Operator, User Manager, Tenant Admin, Defender Add-on). Auto-invoke when the user says "remediation tool", "365 remediation", "check <user>'s mailbox/box", "credential stuffing" against an M365 user, "breach check" on an M365 tenant, or needs M365 admin API work that client-credentials Graph + Exchange REST can perform. NOT for CIPP — this is the direct Graph API app suite.
Also invoke when the user needs any of: inbox rule enumeration, mailbox forwarding check, delegate/SendAs audit, OAuth consent audit, sign-in log queries, risky user lookup, directory audit queries, B2B guest invite audit against M365.
Triggers: "365 remediation", "remediation tool", "check <user> box/mailbox/account for breach", "credential stuff*", "who's getting attacked", "foreign sign-in", "inbox rule", "mailbox forward*", "oauth consent" (in MSP context), "tenant sweep", "risky user", "hidden rule", Exchange Online admin API, "adminapi/beta/{tenant}/InvokeCommand".
---
# 365 Remediation Tool
Read-only by default. All remediation actions require explicit `YES` confirmation in chat (not a permission prompt).
## App Architecture (Tiered)
Five multi-tenant apps cover distinct privilege tiers. Use only what the task requires.
| Tier | App display name | App ID | Vault file | Scope |
|---|---|---|---|---|
| `investigator` | ComputerGuru Security Investigator | `bfbc12a4-f0dd-4e12-b06d-997e7271e10c` | `computerguru-security-investigator.sops.yaml` | Graph read-only |
| `investigator-exo` | ComputerGuru Security Investigator | `bfbc12a4-f0dd-4e12-b06d-997e7271e10c` | `computerguru-security-investigator.sops.yaml` | Exchange Online read |
| `exchange-op` | ComputerGuru Exchange Operator | `b43e7342-5b4b-492f-890f-bb5a4f7f40e9` | `computerguru-exchange-operator.sops.yaml` | Exchange Online write |
| `user-manager` | ComputerGuru User Manager | `64fac46b-8b44-41ad-93ee-7da03927576c` | `computerguru-user-manager.sops.yaml` | Graph user/group write |
| `tenant-admin` | ComputerGuru Tenant Admin | `709e6eed-0711-4875-9c44-2d3518c47063` | `computerguru-tenant-admin.sops.yaml` | Graph high-privilege |
| `defender` | ComputerGuru Defender Add-on | `dbf8ad1a-54f4-4bb8-8a9e-ea5b9634635b` | `computerguru-defender-addon.sops.yaml` | Defender ATP (MDE only) |
**Default for breach checks:** use `investigator` (Graph) + `investigator-exo` (Exchange read). Escalate to write tiers only when remediating.
## Auto-Invocation Behavior
When triggered automatically (vs. via `/remediation-tool`), follow the same workflow in `.claude/commands/remediation-tool.md`:
1. Parse the user's intent into a subcommand (check/sweep/signins/consent-url/remediate).
2. Resolve tenant ID from domain.
3. Acquire tokens via `get-token.sh <tenant> <tier>` — use lowest-privilege tier needed.
4. Run checks via scripts in `scripts/`.
5. Interpret findings using `references/checklist.md`.
6. Write report to `clients/{slug}/reports/YYYY-MM-DD-{action}.md` using `templates/breach-report.md`.
7. Chat summary + delegate commit to Gitea agent.
## Before calling any script, verify
- The SOPS vault is accessible via `.claude/identity.json` `vault_path` field. The scripts auto-resolve the vault location from identity.json — no hardcoded paths.
- `jq`, `curl`, `bash` are available.
- For Exchange REST checks: confirm the target tenant has **Exchange Administrator** role assigned to the **Security Investigator** SP (for reads) or **Exchange Operator** SP (for writes). If any Exchange REST call returns 403, emit the tenant-scoped Entra Roles link from `references/gotchas.md`.
- For Identity Protection checks: `IdentityRiskyUser.Read.All` is in the Security Investigator manifest AND the tenant has consented to that app. If 403, emit the per-app consent URL from `references/gotchas.md`.
- For Defender checks: confirm tenant has Microsoft Defender for Endpoint (MDE) license before using `defender` tier — it returns AADSTS650052 otherwise.
## Conventions
- **Target identifiers**: accept UPN, domain, or tenant GUID. Normalize to tenant GUID internally.
- **Token tiers**: minimum necessary privilege. Never use `tenant-admin` for a read-only check.
- **Token cache**: `/tmp/remediation-tool/{tenant-id}/{tier}.jwt`. TTL 55 minutes. Check `-mmin -55` before reuse.
- **Raw JSON artifacts**: `/tmp/remediation-tool/{tenant-id}/{check}/` — keep so the user can re-analyze.
- **Reports**: `clients/{slug}/reports/YYYY-MM-DD-{action}.md`. Derive slug from domain (strip TLD, hyphenate).
- **UTC dates everywhere**.
## Scope boundaries
- **Not a replacement for CIPP.** Use CIPP for bulk baseline configuration, templates, standards alerting. Use this tool for focused investigation and point-in-time remediation.
- **Entra app registrations stay manual in the portal** — don't create/modify the multi-tenant apps themselves via the tool.
- **Conditional Access policies CAN be managed programmatically** (Tenant Admin tier holds `Policy.ReadWrite.ConditionalAccess` + the Conditional Access Administrator role). MANDATORY discipline: (1) always create/modify in **report-only** (`state: enabledForReportingButNotEnforced`) first; (2) always **exclude the tenant's break-glass account** (`conditions.users.excludeUsers`); (3) verify impact in Entra sign-in logs before enforcing; (4) get explicit user confirmation before flipping any policy to `enabled` on a tenant with real users. (CA-manual boundary relaxed 2026-05-27 at Mike's direction — break-glass + report-only keep blast radius near zero.)
- **Not for Graph permissions the apps don't have.** If a call 403s and the scope isn't in the relevant app's manifest, stop and tell the user — don't try to work around it.
- **Defender tier requires MDE license.** If the tenant doesn't have MDE, the token request succeeds but API calls return AADSTS650052. Check before using.
---
name: remediation-tool
description: "M365 tenant investigation + remediation via the ComputerGuru MSP app suite (Security Investigator/Exchange Operator/User Manager/Tenant Admin/Defender). Direct Graph+Exchange REST (not CIPP). Triggers: 365 remediation, breach/credential-stuffing check, check a mailbox, inbox rules, mailbox forwarding, delegate/SendAs audit, OAuth consent, sign-in/risky-user lookup, tenant sweep."
---
# 365 Remediation Tool
Read-only by default. All remediation actions require explicit `YES` confirmation in chat (not a permission prompt).
## App Architecture (Tiered)
Five multi-tenant apps cover distinct privilege tiers. Use only what the task requires.
| Tier | App display name | App ID | Vault file | Scope |
|---|---|---|---|---|
| `investigator` | ComputerGuru Security Investigator | `bfbc12a4-f0dd-4e12-b06d-997e7271e10c` | `computerguru-security-investigator.sops.yaml` | Graph read-only |
| `investigator-exo` | ComputerGuru Security Investigator | `bfbc12a4-f0dd-4e12-b06d-997e7271e10c` | `computerguru-security-investigator.sops.yaml` | Exchange Online read |
| `exchange-op` | ComputerGuru Exchange Operator | `b43e7342-5b4b-492f-890f-bb5a4f7f40e9` | `computerguru-exchange-operator.sops.yaml` | Exchange Online write |
| `user-manager` | ComputerGuru User Manager | `64fac46b-8b44-41ad-93ee-7da03927576c` | `computerguru-user-manager.sops.yaml` | Graph user/group write |
| `tenant-admin` | ComputerGuru Tenant Admin | `709e6eed-0711-4875-9c44-2d3518c47063` | `computerguru-tenant-admin.sops.yaml` | Graph high-privilege |
| `defender` | ComputerGuru Defender Add-on | `dbf8ad1a-54f4-4bb8-8a9e-ea5b9634635b` | `computerguru-defender-addon.sops.yaml` | Defender ATP (MDE only) |
**Default for breach checks:** use `investigator` (Graph) + `investigator-exo` (Exchange read). Escalate to write tiers only when remediating.
## Auto-Invocation Behavior
When triggered automatically (vs. via `/remediation-tool`), follow the same workflow in `.claude/commands/remediation-tool.md`:
1. Parse the user's intent into a subcommand (check/sweep/signins/consent-url/remediate).
2. Resolve tenant ID from domain.
3. Acquire tokens via `get-token.sh <tenant> <tier>` — use lowest-privilege tier needed.
4. Run checks via scripts in `scripts/`.
5. Interpret findings using `references/checklist.md`.
6. Write report to `clients/{slug}/reports/YYYY-MM-DD-{action}.md` using `templates/breach-report.md`.
7. Chat summary + delegate commit to Gitea agent.
## Before calling any script, verify
- The SOPS vault is accessible via `.claude/identity.json` `vault_path` field. The scripts auto-resolve the vault location from identity.json — no hardcoded paths.
- `jq`, `curl`, `bash` are available.
- For Exchange REST checks: confirm the target tenant has **Exchange Administrator** role assigned to the **Security Investigator** SP (for reads) or **Exchange Operator** SP (for writes). If any Exchange REST call returns 403, emit the tenant-scoped Entra Roles link from `references/gotchas.md`.
- For Identity Protection checks: `IdentityRiskyUser.Read.All` is in the Security Investigator manifest AND the tenant has consented to that app. If 403, emit the per-app consent URL from `references/gotchas.md`.
- For Defender checks: confirm tenant has Microsoft Defender for Endpoint (MDE) license before using `defender` tier — it returns AADSTS650052 otherwise.
## Conventions
- **Target identifiers**: accept UPN, domain, or tenant GUID. Normalize to tenant GUID internally.
- **Token tiers**: minimum necessary privilege. Never use `tenant-admin` for a read-only check.
- **Token cache**: `/tmp/remediation-tool/{tenant-id}/{tier}.jwt`. TTL 55 minutes. Check `-mmin -55` before reuse.
- **Raw JSON artifacts**: `/tmp/remediation-tool/{tenant-id}/{check}/` — keep so the user can re-analyze.
- **Reports**: `clients/{slug}/reports/YYYY-MM-DD-{action}.md`. Derive slug from domain (strip TLD, hyphenate).
- **UTC dates everywhere**.
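
The token-cache convention above could be checked roughly as follows. This is a sketch, not the logic in `get-token.sh`; `fresh_token` and the `CACHE_ROOT` override are hypothetical names added for illustration.

```bash
# Prints the cached token path if the file is younger than 55 minutes;
# returns non-zero when the file is missing or stale.
fresh_token() {
  tenant_id="$1"; tier="$2"
  cache="${CACHE_ROOT:-/tmp/remediation-tool}/$tenant_id/$tier.jwt"
  [ -n "$(find "$cache" -mmin -55 2>/dev/null)" ] && echo "$cache"
}

if jwt_path="$(fresh_token 00000000-0000-0000-0000-000000000000 investigator)"; then
  echo "reuse $jwt_path"
else
  echo "cache miss: request a new token with get-token.sh"
fi
```

The 55-minute window presumably leaves a margin before the token itself expires, so a cached token is never reused right at its limit.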
## Scope boundaries
- **Not a replacement for CIPP.** Use CIPP for bulk baseline configuration, templates, standards alerting. Use this tool for focused investigation and point-in-time remediation.
- **Entra app registrations stay manual in the portal** — don't create/modify the multi-tenant apps themselves via the tool.
- **Conditional Access policies CAN be managed programmatically** (Tenant Admin tier holds `Policy.ReadWrite.ConditionalAccess` + the Conditional Access Administrator role). MANDATORY discipline: (1) always create/modify in **report-only** (`state: enabledForReportingButNotEnforced`) first; (2) always **exclude the tenant's break-glass account** (`conditions.users.excludeUsers`); (3) verify impact in Entra sign-in logs before enforcing; (4) get explicit user confirmation before flipping any policy to `enabled` on a tenant with real users. (CA-manual boundary relaxed 2026-05-27 at Mike's direction — break-glass + report-only keep blast radius near zero.)
- **Not for Graph permissions the apps don't have.** If a call 403s and the scope isn't in the relevant app's manifest, stop and tell the user — don't try to work around it.
- **Defender tier requires MDE license.** If the tenant doesn't have MDE, the token request succeeds but API calls return AADSTS650052. Check before using.
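
For the Conditional Access discipline above, a report-only policy body might look like the output of this sketch. The MFA policy itself, `ca_policy_body`, and the `BREAK_GLASS_ID` placeholder are illustrative assumptions; only the `state` value and the `conditions.users.excludeUsers` exclusion come from the rules above.

```bash
# BREAK_GLASS_ID is a placeholder for the break-glass account's object ID.
BREAK_GLASS_ID="${BREAK_GLASS_ID:-00000000-0000-0000-0000-000000000000}"

# Prints a Conditional Access policy body that is report-only and
# excludes the break-glass account.
ca_policy_body() {
  cat <<EOF
{
  "displayName": "Require MFA for all users (report-only)",
  "state": "enabledForReportingButNotEnforced",
  "conditions": {
    "clientAppTypes": ["all"],
    "applications": { "includeApplications": ["All"] },
    "users": {
      "includeUsers": ["All"],
      "excludeUsers": ["$BREAK_GLASS_ID"]
    }
  },
  "grantControls": { "operator": "OR", "builtInControls": ["mfa"] }
}
EOF
}

ca_policy_body
```

Flipping `state` to `enabled` is the step that needs sign-in log review and explicit user confirmation first.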

@@ -1,161 +1,151 @@
---
name: self-check
description: >-
Self-diagnose a ClaudeTools session's machine: verify the harness is wired the
same way as every other instance while allowing for architectural / OS / hardware
differences. Checks that identity.json exists and is correct (the map of WHERE
things live on this box), required tooling is installed, env/paths resolve,
hooks are wired, the skill/command/script set matches the baseline, the vault
decrypts, coord/Gitea are reachable, and the machine's capability tier (e.g. no
local Ollama) resolves to the right fallback ruleset. Grades RED/AMBER/GREEN and
can publish a census to the coord API so the fleet baseline can be built/refined.
Invoke for: "self check", "self diagnosis", "self test", "doctor", "health check",
"am I configured right", "is my machine set up correctly", "harness conformance",
"fleet conformance", "check my environment", "is everything wired up".
---
# Self-Check — ClaudeTools Harness Self-Diagnosis
A top-to-bottom evaluation of how *this* machine's ClaudeTools harness is wired,
graded against a checked-in **baseline manifest** so every machine behaves the
same way — while explicitly allowing for architecture, OS, and hardware
differences via a **capability tier** model.
This is the skill the user asked for when a session needs to "make sure
everything is as it should be."
## The model in one paragraph
`identity.json` is the foundational, per-machine map of **where things live and
what this box can do** (vault path, repo root, platform, arch, python command,
Ollama endpoints). The **baseline manifest**
(`baseline/manifest.json`) declares what *every* machine must have — required
tools, identity fields, scripts, hook files, the wired `settings.json` hooks, the
canonical skill/command set, and the **capability rules** that say what to do when
a capability is absent (e.g. no local Ollama → use the remote endpoint, or if that
is also down, route Tier-0 work to haiku instead of blocking). The probe compares
the live machine against the manifest, resolves the machine's capability tier, and
grades RED/AMBER/GREEN. Required things missing = RED. Advisory drift = AMBER.
Capability differences are **never** failures — they select a ruleset.
## V1 is a CENSUS tool (read this)
There is no ratified fleet baseline yet. `baseline/manifest.json` is **provisional**,
generated from a single known-good machine (GURU-5070). So V1's job is to gather
ground truth from every machine and help Mike build the real baseline:
1. **Probe**: each machine runs the check and produces a structured census.
2. **Publish**: `--publish` PUTs the census to coord as component
`selfcheck_<host>` (state = grade, notes = full JSON). One row per machine =
a live fleet conformance view.
3. **Fan out**: `fanout` broadcasts a request to `ALL_SESSIONS` so every active
instance reports.
4. **Aggregate**: `aggregate` reads all censuses back and proposes a baseline
(tools/skills/commands present on *all* machines = "required everywhere";
present on *some* = "capability-gated"), and lists machines with FAILs.
Mike reviews the aggregate and ratifies `manifest.json`. From then on the same
probe enforces conformance. **V1 does not auto-fix anything** — it reports the
exact fix command for each finding (per the decision on record).
## Running it
The probe is `scripts/self-check.sh` (bash; runs on Git Bash/Windows, macOS,
Linux; deps: jq + curl). Always pass a real UTC timestamp:
```bash
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
bash .claude/skills/self-check/scripts/self-check.sh <mode>
```
| Mode | Purpose |
|------|---------|
| `report` (default) | Human RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. |
| `--json` | Structured census JSON to stdout (for piping). |
| `--publish` | Run + publish census to coord (component `selfcheck_<host>`). Softfails to `.claude/coord-queue.jsonl`. |
| `fanout` | Broadcast a census request to ALL_SESSIONS. |
| `aggregate` | Fleet table + proposed-baseline summary from published censuses. |
`/self-check` is the slash-command runner for the same script.
## What it checks
| Category | Checks |
|----------|--------|
| **identity** | identity.json exists + valid JSON; **all required fields present**; `claudetools_root` exists and equals the running repo; `vault_path` exists; `machine` == hostname; git user.name/email match identity. |
| **tooling** | required everywhere: bash, git, jq, curl, sops, age, ssh, a python. Missing = FAIL. |
| **capability** | ollama, cargo, node, gh, docker, op — presence is INFO, never a failure. Resolves the **Ollama tier** (local / remote / none) and prints the effective Tier-0 ruleset. |
| **files** | required scripts + hook files present and executable. |
| **hooks** | the three `settings.json` hooks are wired (block-backslash PreToolUse, check-messages UserPromptSubmit, sync-memory SessionStart); `current-mode` present. |
| **git** | origin points at ACG Gitea (internal IP preferred); main-repo post-commit hook installed (AMBER if not). |
| **skills/commands** | every skill dir and command file in the baseline is present; extras are reported as census candidates. |
| **duplicates** | command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored. |
| **memory** | `MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade. |
| **vault** | vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired). |
| **connectivity** | coord API (required), main API + internal Gitea (advisory; off-network is OK). |
## Rogue-memory contradiction — semantic pass (do this when asked, or on a full check)
The engine's memory check is deterministic and conservative (index + orphans +
declared patterns) so it never produces false alarms. A *true* contradiction
check — "does any memory directly contradict what this machine's settings say?"
— is a judgment task, so the model does it (route the prose/classification to
Ollama Tier-0 per the house rules; Claude reviews the result):
1. Read `identity.json` (where things live + this box's capabilities),
`settings.json` (wired hooks/permissions), and `baseline/manifest.json`.
2. Read the memory index `.claude/memory/MEMORY.md`, then open any memory whose
one-line hook touches: paths/roots, python launcher, endpoints/IPs, OS/arch
assumptions, tool choices, or model routing.
3. Flag memories that **directly contradict** this machine's reality, e.g.:
- prescribes `python3`/`python` when `identity.python.command` is `py` (or vice-versa),
- hardcodes a repo/vault path that isn't this machine's `claudetools_root`/`vault_path`,
- names an endpoint/IP that conflicts with `identity.coord_api` or the manifest,
- assumes a capability (local Ollama) this machine's tier says is absent.
4. Report each as: memory file, the contradicting claim, the setting it violates,
and a suggested correction. **Do not edit memories** — surface for the operator
(deletions/rewrites go through the human, mirroring memory-dream's posture).
Genuinely machine-specific guidance in a *shared* memory is the usual culprit —
the fix is to scope it ("on Windows…") or split it, not to globally flip it.
## Fleet self-remediation loop (machines fix themselves)
We never fix a remote machine. The flow is:
1. `fanout` — broadcast asks every instance to self-check + self-fix + re-publish.
2. Each operator runs `/self-check` locally, applies the printed fix commands on
their own box, re-runs to confirm GREEN, then `/self-check --publish`.
3. `aggregate` — shows who is still RED/AMBER and prints each machine's own fix
list. Relay it to that operator; do not run it for them.
4. Repeat until the fleet is consistently GREEN, then ratify the manifest.
## How to interpret a run
After running, summarize for the user:
- The **grade** and the PASS/WARN/FAIL/INFO tallies.
- Each **FAIL** and **WARN** with its exact fix command. Do not auto-apply.
- The **capability tier** line — confirm the machine knows its Tier-0 fallback.
- If publishing/aggregating, note how many machines have reported and which are RED.
Capability differences (no Ollama, no gh, ARM vs amd64, macOS vs Windows) are
expected and must never be reported as broken — they are the whole point of the
tier model.
## Files
```
.claude/skills/self-check/
SKILL.md this file
scripts/self-check.sh the probe engine (report / --json / --publish / fanout / aggregate)
baseline/manifest.json the provisional fleet baseline (single source of truth)
baseline/README.md the baseline model + how to refine/ratify it
.claude/commands/self-check.md the /self-check runner
```
## Extending the baseline
When a new tool/skill/command/hook becomes mandatory fleet-wide, edit
`baseline/manifest.json`, commit, and `/sync`. Every machine's next self-check
enforces it. Capability-only tools go in `capability_tools` with a matching entry
in `capability_rules` describing the fallback. See `baseline/README.md`.
---
name: self-check
description: "Self-diagnose this machine's harness conformance vs the fleet baseline: identity.json, tooling, env/paths, hooks, skill/command/script set, vault decrypt, coord/Gitea reachability, capability tier. Grades RED/AMBER/GREEN; can publish a census. Triggers: self check/test, doctor, health check, am I configured right, harness/fleet conformance."
---
# Self-Check — ClaudeTools Harness Self-Diagnosis
A top-to-bottom evaluation of how *this* machine's ClaudeTools harness is wired,
graded against a checked-in **baseline manifest** so every machine behaves the
same way — while explicitly allowing for architecture, OS, and hardware
differences via a **capability tier** model.
This is the skill the user asked for when a session needs to "make sure
everything is as it should be."
## The model in one paragraph
`identity.json` is the foundational, per-machine map of **where things live and
what this box can do** (vault path, repo root, platform, arch, python command,
Ollama endpoints). The **baseline manifest**
(`baseline/manifest.json`) declares what *every* machine must have — required
tools, identity fields, scripts, hook files, the wired `settings.json` hooks, the
canonical skill/command set, and the **capability rules** that say what to do when
a capability is absent (e.g. no local Ollama → use the remote endpoint, or if that
is also down, route Tier-0 work to haiku instead of blocking). The probe compares
the live machine against the manifest, resolves the machine's capability tier, and
grades RED/AMBER/GREEN. Required things missing = RED. Advisory drift = AMBER.
Capability differences are **never** failures — they select a ruleset.
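
The capability rule for Ollama can be sketched as a pure decision function. `resolve_tier`, `tier0_route`, and the route labels are hypothetical names; the real probe reads endpoints from `identity.json` and rules from the manifest.

```bash
# resolve_tier LOCAL_OK REMOTE_OK -> local | remote | none ("yes"/"no" arguments)
resolve_tier() {
  if [ "$1" = yes ]; then echo local
  elif [ "$2" = yes ]; then echo remote
  else echo none
  fi
}

# Map the tier to where Tier-0 work goes. No tier is an error:
# each one just selects a ruleset.
tier0_route() {
  case "$(resolve_tier "$1" "$2")" in
    local)  echo "ollama-local" ;;
    remote) echo "ollama-remote" ;;
    none)   echo "haiku" ;;
  esac
}

tier0_route no no
```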
## V1 is a CENSUS tool (read this)
There is no ratified fleet baseline yet. `baseline/manifest.json` is **provisional**,
generated from a single known-good machine (GURU-5070). So V1's job is to gather
ground truth from every machine and help Mike build the real baseline:
1. **Probe**: each machine runs the check and produces a structured census.
2. **Publish**: `--publish` PUTs the census to coord as component
`selfcheck_<host>` (state = grade, notes = full JSON). One row per machine =
a live fleet conformance view.
3. **Fan out**: `fanout` broadcasts a request to `ALL_SESSIONS` so every active
instance reports.
4. **Aggregate**: `aggregate` reads all censuses back and proposes a baseline
(tools/skills/commands present on *all* machines = "required everywhere";
present on *some* = "capability-gated"), and lists machines with FAILs.
Mike reviews the aggregate and ratifies `manifest.json`. From then on the same
probe enforces conformance. **V1 does not auto-fix anything** — it reports the
exact fix command for each finding (per the decision on record).
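
The aggregate rule (present on all machines means required, present on some means capability-gated) can be sketched with `awk`. The input format of one `host item` pair per line and the name `propose_baseline` are assumptions for illustration, and `MAC-01` is a made-up host; the real `aggregate` mode reads the published censuses from coord.

```bash
# stdin: "host item" pairs, one per line.
# stdout: each item labelled by whether every host reported it.
propose_baseline() {
  awk '
    {
      if (!(($1, $2) in seen)) { seen[$1, $2] = 1; count[$2]++ }
      if (!($1 in hosts))      { hosts[$1] = 1; total++ }
    }
    END {
      for (item in count)
        print item, (count[item] == total ? "required-everywhere" : "capability-gated")
    }
  ' | sort
}

propose_baseline <<'EOF'
GURU-5070 jq
GURU-5070 ollama
MAC-01 jq
EOF
```

Here `jq` appears on both hosts and `ollama` on only one, so only `jq` is proposed as required everywhere.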
## Running it
The probe is `scripts/self-check.sh` (bash; runs on Git Bash/Windows, macOS,
Linux; deps: jq + curl). Always pass a real UTC timestamp:
```bash
SELFCHECK_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
bash .claude/skills/self-check/scripts/self-check.sh <mode>
```
| Mode | Purpose |
|------|---------|
| `report` (default) | Human RED/AMBER/GREEN report. Exit 0/1/2 = GREEN/AMBER/RED. |
| `--json` | Structured census JSON to stdout (for piping). |
| `--publish` | Run + publish census to coord (component `selfcheck_<host>`). Softfails to `.claude/coord-queue.jsonl`. |
| `fanout` | Broadcast a census request to ALL_SESSIONS. |
| `aggregate` | Fleet table + proposed-baseline summary from published censuses. |
`/self-check` is the slash-command runner for the same script.
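
Because `report` encodes the grade in its exit status, a caller can branch on it. A minimal sketch, with `grade_from_exit` as a hypothetical helper:

```bash
# Translate the documented exit codes (0/1/2) into grade names.
grade_from_exit() {
  case "$1" in
    0) echo GREEN ;;
    1) echo AMBER ;;
    2) echo RED ;;
    *) echo "UNKNOWN($1)" ;;
  esac
}

# Typical use (not executed here):
#   bash .claude/skills/self-check/scripts/self-check.sh report; grade_from_exit $?
grade_from_exit 1
```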
## What it checks
| Category | Checks |
|----------|--------|
| **identity** | identity.json exists + valid JSON; **all required fields present**; `claudetools_root` exists and equals the running repo; `vault_path` exists; `machine` == hostname; git user.name/email match identity. |
| **tooling** | required everywhere: bash, git, jq, curl, sops, age, ssh, a python. Missing = FAIL. |
| **capability** | ollama, cargo, node, gh, docker, op — presence is INFO, never a failure. Resolves the **Ollama tier** (local / remote / none) and prints the effective Tier-0 ruleset. |
| **files** | required scripts + hook files present and executable. |
| **hooks** | the three `settings.json` hooks are wired (block-backslash PreToolUse, check-messages UserPromptSubmit, sync-memory SessionStart); `current-mode` present. |
| **git** | origin points at ACG Gitea (internal IP preferred); main-repo post-commit hook installed (AMBER if not). |
| **skills/commands** | every skill dir and command file in the baseline is present; extras are reported as census candidates. |
| **duplicates** | command/skill names present in BOTH the repo and `~/.claude`. Divergent content = WARN (the "same `/cmd`, different behaviour on the Mac" bug); identical = INFO (redundant, will drift). CRLF-only differences are ignored. |
| **memory** | `MEMORY.md` index exists; no orphaned memory files; manifest-declared contradiction patterns (see semantic pass below). Never FAILs the grade. |
| **vault** | vault repo exists; sops+age present; `vault.sh list` succeeds (decrypt wired). |
| **connectivity** | coord API (required), main API + internal Gitea (advisory; off-network is OK). |
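
The duplicates rule (divergent = WARN, identical = INFO, CRLF-only differences ignored) could be implemented along these lines. This is a sketch under assumptions, not the code in `self-check.sh`; `classify_duplicate` is a hypothetical name.

```bash
# Compare two copies of a command/skill file after stripping carriage returns,
# so files that differ only in line endings count as identical.
classify_duplicate() {
  a="$(tr -d '\r' < "$1" | cksum)"
  b="$(tr -d '\r' < "$2" | cksum)"
  if [ "$a" = "$b" ]; then
    echo "INFO identical"
  else
    echo "WARN divergent"
  fi
}

# Demo: a CRLF copy and an LF copy of the same content.
tmp="$(mktemp -d)"
printf 'echo hi\r\n' > "$tmp/repo.md"
printf 'echo hi\n'   > "$tmp/home.md"
classify_duplicate "$tmp/repo.md" "$tmp/home.md"
rm -rf "$tmp"
```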
## Rogue-memory contradiction — semantic pass (do this when asked, or on a full check)
The engine's memory check is deterministic and conservative (index + orphans +
declared patterns) so it never produces false alarms. A *true* contradiction
check — "does any memory directly contradict what this machine's settings say?"
— is a judgment task, so the model does it (route the prose/classification to
Ollama Tier-0 per the house rules; Claude reviews the result):
1. Read `identity.json` (where things live + this box's capabilities),
`settings.json` (wired hooks/permissions), and `baseline/manifest.json`.
2. Read the memory index `.claude/memory/MEMORY.md`, then open any memory whose
one-line hook touches: paths/roots, python launcher, endpoints/IPs, OS/arch
assumptions, tool choices, or model routing.
3. Flag memories that **directly contradict** this machine's reality, e.g.:
- prescribes `python3`/`python` when `identity.python.command` is `py` (or vice-versa),
- hardcodes a repo/vault path that isn't this machine's `claudetools_root`/`vault_path`,
- names an endpoint/IP that conflicts with `identity.coord_api` or the manifest,
- assumes a capability (local Ollama) this machine's tier says is absent.
4. Report each as: memory file, the contradicting claim, the setting it violates,
and a suggested correction. **Do not edit memories** — surface for the operator
(deletions/rewrites go through the human, mirroring memory-dream's posture).
Genuinely machine-specific guidance in a *shared* memory is the usual culprit —
the fix is to scope it ("on Windows…") or split it, not to globally flip it.
## Fleet self-remediation loop (machines fix themselves)
We never fix a remote machine. The flow is:
1. `fanout` — broadcast asks every instance to self-check + self-fix + re-publish.
2. Each operator runs `/self-check` locally, applies the printed fix commands on
their own box, re-runs to confirm GREEN, then `/self-check --publish`.
3. `aggregate` — shows who is still RED/AMBER and prints each machine's own fix
list. Relay it to that operator; do not run it for them.
4. Repeat until the fleet is consistently GREEN, then ratify the manifest.
## How to interpret a run
After running, summarize for the user:
- The **grade** and the PASS/WARN/FAIL/INFO tallies.
- Each **FAIL** and **WARN** with its exact fix command. Do not auto-apply.
- The **capability tier** line — confirm the machine knows its Tier-0 fallback.
- If publishing/aggregating, note how many machines have reported and which are RED.
Capability differences (no Ollama, no gh, ARM vs amd64, macOS vs Windows) are
expected and must never be reported as broken — they are the whole point of the
tier model.
## Files
```
.claude/skills/self-check/
SKILL.md this file
scripts/self-check.sh the probe engine (report / --json / --publish / fanout / aggregate)
baseline/manifest.json the provisional fleet baseline (single source of truth)
baseline/README.md the baseline model + how to refine/ratify it
.claude/commands/self-check.md the /self-check runner
```
## Extending the baseline
When a new tool/skill/command/hook becomes mandatory fleet-wide, edit
`baseline/manifest.json`, commit, and `/sync`. Every machine's next self-check
enforces it. Capability-only tools go in `capability_tools` with a matching entry
in `capability_rules` describing the fallback. See `baseline/README.md`.