claudetools/.agents/skills/impeccable/reference/critique.md at 93ae7dbaaa104f00aa2ffd8faa031824453aa93c

Files

Mike Swanson e80c36e6bf sync: auto-sync from DESKTOP-0O8A1RL at 2026-05-22 11:07:55

Author: Mike Swanson
Machine: DESKTOP-0O8A1RL
Timestamp: 2026-05-22 11:07:55

2026-05-22 11:07:59 -07:00

17 KiB

Raw Blame History

Purpose

Resolve one stable target, run two independent assessments, synthesize a design critique, persist a snapshot, and ask the user what to improve next. The chat response is the primary deliverable; the snapshot is an archive/backlog for future commands.

Hard Invariants

Assessment A (design review) and Assessment B (detector/browser evidence) are both required.
Assessment A must finish before detector findings enter the parent synthesis context. Detector output is deterministic, but it still anchors judgment.
If sub-agents are unavailable, fall back sequentially: finish and record Assessment A first, then run Assessment B, then synthesize.
A skipped detector is a failed critique run unless detect.mjs is missing or crashes after a real attempt.
Viewable targets require browser inspection when available.
Any local server started only for critique visualization must run in the background, have a recorded stop method, and be stopped before final reporting unless the user asks to keep it.
Do not claim a user-visible overlay exists unless script injection succeeded and the detector ran in the page.

Setup

Resolve the target to a concrete file path or URL. Prefer a source path over a dev-server URL when both identify the same surface; ports drift, paths do not.
- "the homepage" -> site/pages/index.astro or index.html
- "the settings modal" -> the primary component file
- "this page" -> the current URL or source file
Compute the slug:
```
node {{scripts_path}}/critique-storage.mjs slug "<resolved-path-or-url>"
```
Keep it. If the command exits non-zero, skip persistence and trend for this run, but continue the critique.
Read .impeccable/critique/ignore.md if it exists. Drop matching findings silently; it is the only prior-run input critique consumes.

Assessment Orchestration

Delegate Assessment A and Assessment B to separate sub-agents when possible. They must not see each other's output. Do not show findings to the user until synthesis.

Codex sub-agent gate: - If `spawn_agent` is exposed and the user explicitly allowed sub-agents, delegation, or parallel agent work, spawn A and B immediately. - If `spawn_agent` is exposed but the user did not explicitly allow sub-agents, ask exactly once: "Impeccable critique is designed to run two independent sub-agents for an unanchored assessment. May I use sub-agents for this critique?" Then stop until the user answers. - If allowed, spawn A and B. If declined, run sequentially and report `Assessment independence: degraded (sub-agents declined by user)`. - If `spawn_agent` is not exposed, do not ask; run sequentially and report `Assessment independence: degraded (spawn_agent unavailable in this session)`. - If spawning fails after permission, run sequentially and report `Assessment independence: degraded (sub-agent spawn failed: )`. Prefer `fork_context: false` with self-contained prompts containing cwd, target, live URL, references, product context, and output contract. If using `fork_context: true`, omit `agent_type`, `model`, and `reasoning_effort`.

If browser automation is available, each assessment creates its own new tab. Never reuse an existing tab, even if it is already at the right URL.

Assessment A: Design Review

Read relevant source files and visually inspect the live page when browser automation is available. Think like a design director.

Evaluate:

AI slop: Would someone believe "AI made this" immediately? Check all DON'T guidance from the parent Impeccable skill.
Holistic design: hierarchy, IA, emotional fit, discoverability, composition, typography, color, accessibility, states, copy, and edge cases.
Cognitive load: consult cognitive-load; report checklist failures and decision points with >4 visible options.
Emotional journey: peak-end rule, emotional valleys, reassurance at high-stakes moments.
Nielsen heuristics: consult heuristics-scoring; score all 10 heuristics 0-4.

Return: AI slop verdict, heuristic scores, cognitive load, emotional journey, 2-3 strengths, 3-5 priority issues, persona red flags, minor observations, and provocative questions.

Assessment B: Detector + Browser Evidence

Run the bundled detector and browser visualization evidence. Assessment B is mandatory and must remain isolated from Assessment A until both are complete.

CLI scan:

node {{scripts_path}}/detect.mjs --json [--fast] [target]

Pass markup files/directories as [target]; do not pass CSS-only files.
For URLs, skip CLI scan and use browser visualization.
For 200+ scannable files, use --fast; for 500+, narrow scope or ask.
Exit code 0 = clean; 2 = findings.
If the detector entrypoint is missing or fails to load, report deterministic scan unavailable and continue with browser/manual review.

Browser visualization is required for a viewable target when browser automation is available. Use a localhost dev/static URL for local files; avoid file:// unless the available browser explicitly supports this workflow. Overlay flow:

Create a fresh tab and navigate.
Preflight mutable injection by setting document.title and appending a <script> tag. Read-only evaluate APIs do not count.
If mutation is unavailable, skip live server, browser presentation, and injection; report fallback signal.
If mutation is available, start node {{scripts_path}}/live-server.mjs --background, present the browser if supported, label [Human], scroll top, inject http://localhost:PORT/detect.js, wait 2-3 seconds, read impeccable console messages, then stop the live server.
For multi-view targets, inject on 3-5 representative pages.

Codex Browser note: Use the Browser skill. Do not spend a Browser attempt on `file://`. Only call `visibility.set(true)` after mutable script injection is confirmed for the `[Human]` overlay path; verify with `get()`. Use `tab.dev.logs({ filter: "impeccable" })` for console results. Its Playwright `evaluate(...)` surface is read-only; do not rely on it for mutation.

Return: CLI findings JSON/counts, browser console findings if applicable, false positives, and skipped/failed browser steps with concrete reasons.

After Assessment B returns usable CLI findings, reuse them. Do not rerun detect.mjs in the parent unless Assessment B failed, was truncated, or omitted count, rule names, or file locations.

Codex failure accounting: final Run Notes must include target slug, ignore list, assessment independence, CLI detector, browser visibility, overlay injection, live-server cleanup, temp-file cleanup, and any fallback signal used. Do not run repo status checks, late API spelunking, or unrelated verification after the report is assembled.

Generate Combined Critique Report

Synthesize both assessments into a single report. Do NOT simply concatenate. Weave the findings together, noting where the LLM review and detector agree, where the detector caught issues the LLM missed, and where detector findings are false positives.

The chat response is the primary user-facing deliverable. Present the full structured critique below in chat; do not replace it with a summary and a link. The persisted snapshot is only an archive/backlog for later commands.

Codex final-answer note: `$impeccable critique` produces a report artifact, so the final chat response should intentionally exceed the usual concise close-out style. Do not title the final response "Critique Summary" unless the user explicitly asked for a summary.

Structure your feedback as a design director would:

Design Health Score

Consult heuristics-scoring

Present the Nielsen's 10 heuristics scores as a table:

#	Heuristic	Score	Key Issue
1	Visibility of System Status	?	[specific finding or "n/a" if solid]
2	Match System / Real World	?
3	User Control and Freedom	?
4	Consistency and Standards	?
5	Error Prevention	?
6	Recognition Rather Than Recall	?
7	Flexibility and Efficiency	?
8	Aesthetic and Minimalist Design	?
9	Error Recovery	?
10	Help and Documentation	?
Total		??/40	[Rating band]

Be honest with scores. A 4 means genuinely excellent. Most real interfaces score 20-32.

Anti-Patterns Verdict

Start here. Does this look AI-generated?

LLM assessment: Your own evaluation of AI slop tells. Cover overall aesthetic feel, layout sameness, generic composition, missed opportunities for personality.

Deterministic scan: Summarize what the automated detector found, with counts and file locations. Note any additional issues the detector caught that you missed, and flag any false positives.

Visual overlays (if injection succeeded): Tell the user that overlays are now visible in the [Human] tab in their browser, highlighting the detected issues. Summarize what the console output reported. If browser visualization was attempted but injection failed, say that no reliable user-visible overlay is available and report the fallback signal instead.

Overall Impression

A brief gut reaction: what works, what doesn't, and the single biggest opportunity.

What's Working

Highlight 2-3 things done well. Be specific about why they work.

Priority Issues

The 3-5 most impactful design problems, ordered by importance.

For each issue, tag with P0-P3 severity (consult heuristics-scoring for severity definitions):

[P?] What: Name the problem clearly
Why it matters: How this hurts users or undermines goals
Fix: What to do about it (be concrete)
Suggested command: Which command could address this (from: {{available_commands}})

Persona Red Flags

Consult personas

Auto-select 2-3 personas most relevant to this interface type (use the selection table in the reference). If {{config_file}} contains a ## Design Context section from impeccable teach, also generate 1-2 project-specific personas from the audience/brand info.

For each selected persona, walk through the primary user action and list specific red flags found:

Alex (Power User): No keyboard shortcuts detected. Form requires 8 clicks for primary action. Forced modal onboarding. High abandonment risk.

Jordan (First-Timer): Icon-only nav in sidebar. Technical jargon in error messages ("404 Not Found"). No visible help. Will abandon at step 2.

Be specific. Name the exact elements and interactions that fail each persona. Don't write generic persona descriptions; write what broke for them.

Minor Observations

Quick notes on smaller issues worth addressing.

Questions to Consider

Provocative questions that might unlock better solutions:

"What if the primary action were more prominent?"
"Does this need to feel this complex?"
"What would a confident version of this look like?"

#### Run Notes Keep this compact. Include status for target slug, ignore list, assessment independence, CLI detector, browser visibility, overlay injection, live server cleanup, and temp-file cleanup. For failed or skipped steps, give the concrete observed reason and the fallback signal used. In the final chat response, also include snapshot write and trend read status after persistence has run.

Codex Run Notes are final-chat only. Do not include this section in the persisted snapshot body, because persistence, trend read, and temp cleanup happen after the snapshot write and would otherwise archive stale status such as "pending after persistence."

Remember:

Be direct. Vague feedback wastes everyone's time.
Be specific. "The submit button," not "some elements."
Say what's wrong AND why it matters to users.
Give concrete suggestions. Cut "consider exploring..." entirely.
Prioritize ruthlessly. If everything is important, nothing is.
Don't soften criticism. Developers need honest feedback to ship great design.

Persist the Snapshot

Once the report above is finalized, write it to .impeccable/critique/ so the user can refer back, and so {{command_prefix}}impeccable polish can pick up the priority issues without a copy-paste.

Skip this step if the Setup slug was null (vague or root-level target).

Write the body to a temp file so you can pipe it to the helper. Use the full critique report (heuristic table, anti-patterns verdict, priority issues, persona red flags, minor observations, and questions), but stop before the "Ask the User" / "Recommended Actions" sections that come later.
Codex: exclude Run Notes from the temp body file; Run Notes are final-chat only because persistence, trend read, and temp cleanup happen after the snapshot write.

Pass the structured metadata through IMPECCABLE_CRITIQUE_META (JSON), then run the write command:

IMPECCABLE_CRITIQUE_META='{"target":"<user phrasing>","total_score":<n>,"p0_count":<n>,"p1_count":<n>}' \
  node {{scripts_path}}/critique-storage.mjs write <slug> <body-file>

The helper prints the absolute path it wrote.

Delete the temp body file after the write attempt completes, whether the write succeeded or failed. If deletion fails, mention temp-file cleanup failed: <reason> briefly in the final output, but do not block the critique.
Read the trend for context:
```
node {{scripts_path}}/critique-storage.mjs trend <slug> 5
```
This returns a JSON array of the last 5 frontmatter entries (including the one you just wrote).
Append a single line to the user-visible output, after the report and before the questions:

Trend for <slug> (last 5 runs): 24 → 28 → 32 → 29 → 32 Wrote .impeccable/critique/<filename>.

If this is the first run for the slug, the trend is just one score; say so: "First run for this target, no trend yet."

This is fire-and-forget. Do not show the user the helper's JSON output; only the human-readable trend line and the written path. Failures here should not block the rest of the flow; print the error and move on.

Ask the User

After presenting findings, use targeted questions based on what was actually found. {{ask_instruction}} These answers will shape the action plan.

Ask questions along these lines (adapt to the specific findings; do NOT ask generic questions):

Priority direction: Based on the issues found, ask which category matters most to the user right now. For example: "I found problems with visual hierarchy, color usage, and information overload. Which area should we tackle first?" Offer the top 2-3 issue categories as options.
Design intent: If the critique found a tonal mismatch, ask whether it was intentional. For example: "The interface feels clinical and corporate. Is that the intended tone, or should it feel warmer/bolder/more playful?" Offer 2-3 tonal directions as options based on what would fix the issues found.
Scope: Ask how much the user wants to take on. For example: "I found N issues. Want to address everything, or focus on the top 3?" Offer scope options like "Top 3 only", "All issues", "Critical issues only".
Constraints (optional; only ask if relevant): If the findings touch many areas, ask if anything is off-limits. For example: "Should any sections stay as-is?" This prevents the plan from touching things the user considers done.

Rules for questions:

Every question must reference specific findings from the report. Never ask generic "who is your audience?" questions.
Keep it to 2-4 questions maximum. Respect the user's time.
Offer concrete options, not open-ended prompts.
If findings are straightforward (e.g., only 1-2 clear issues), skip questions and go directly to Recommended Actions.

Codex final-question gate: The user-visible response must either include the targeted questions or explicitly say `Questions skipped: ` because the findings were straightforward. Each question must include 2-3 concrete answer options tied to the actual critique findings. Do not end with only open-ended questions.

Recommended Actions

After receiving the user's answers, present a prioritized action summary reflecting the user's priorities and scope from Ask the User.

Action Summary

List recommended commands in priority order, based on the user's answers:

{{command_prefix}}command-name: Brief description of what to fix (specific context from critique findings)
{{command_prefix}}command-name: Brief description (specific context) ...

Rules for recommendations:

Only recommend commands from: {{available_commands}}
Order by the user's stated priorities first, then by impact
Each item's description should carry enough context that the command knows what to focus on
Map each Priority Issue to the appropriate command
Skip commands that would address zero issues
If the user chose a limited scope, only include items within that scope
If the user marked areas as off-limits, exclude commands that would touch those areas
End with {{command_prefix}}impeccable polish as the final step if any fixes were recommended

After presenting the summary, tell the user:

You can ask me to run these one at a time, all at once, or in any order you prefer.

Re-run {{command_prefix}}impeccable critique after fixes to see your score improve.

17 KiB Raw Blame History