Author: Mike Swanson Machine: DESKTOP-0O8A1RL Timestamp: 2026-05-22 11:07:55
17 KiB
Purpose
Resolve one stable target, run two independent assessments, synthesize a design critique, persist a snapshot, and ask the user what to improve next. The chat response is the primary deliverable; the snapshot is an archive/backlog for future commands.
Hard Invariants
- Assessment A (design review) and Assessment B (detector/browser evidence) are both required.
- Assessment A must finish before detector findings enter the parent synthesis context. Detector output is deterministic, but it still anchors judgment.
- If sub-agents are unavailable, fall back sequentially: finish and record Assessment A first, then run Assessment B, then synthesize.
- A skipped detector is a failed critique run unless
detect.mjsis missing or crashes after a real attempt. - Viewable targets require browser inspection when available.
- Any local server started only for critique visualization must run in the background, have a recorded stop method, and be stopped before final reporting unless the user asks to keep it.
- Do not claim a user-visible overlay exists unless script injection succeeded and the detector ran in the page.
Setup
- Resolve the target to a concrete file path or URL. Prefer a source path over a dev-server URL when both identify the same surface; ports drift, paths do not.
- "the homepage" ->
site/pages/index.astroorindex.html - "the settings modal" -> the primary component file
- "this page" -> the current URL or source file
- "the homepage" ->
- Compute the slug:
Keep it. If the command exits non-zero, skip persistence and trend for this run, but continue the critique.
node {{scripts_path}}/critique-storage.mjs slug "<resolved-path-or-url>" - Read
.impeccable/critique/ignore.mdif it exists. Drop matching findings silently; it is the only prior-run input critique consumes.
Assessment Orchestration
Delegate Assessment A and Assessment B to separate sub-agents when possible. They must not see each other's output. Do not show findings to the user until synthesis.
Codex sub-agent gate: - If `spawn_agent` is exposed and the user explicitly allowed sub-agents, delegation, or parallel agent work, spawn A and B immediately. - If `spawn_agent` is exposed but the user did not explicitly allow sub-agents, ask exactly once: "Impeccable critique is designed to run two independent sub-agents for an unanchored assessment. May I use sub-agents for this critique?" Then stop until the user answers. - If allowed, spawn A and B. If declined, run sequentially and report `Assessment independence: degraded (sub-agents declined by user)`. - If `spawn_agent` is not exposed, do not ask; run sequentially and report `Assessment independence: degraded (spawn_agent unavailable in this session)`. - If spawning fails after permission, run sequentially and report `Assessment independence: degraded (sub-agent spawn failed: )`. Prefer `fork_context: false` with self-contained prompts containing cwd, target, live URL, references, product context, and output contract. If using `fork_context: true`, omit `agent_type`, `model`, and `reasoning_effort`.If browser automation is available, each assessment creates its own new tab. Never reuse an existing tab, even if it is already at the right URL.
Assessment A: Design Review
Read relevant source files and visually inspect the live page when browser automation is available. Think like a design director.
Evaluate:
- AI slop: Would someone believe "AI made this" immediately? Check all DON'T guidance from the parent Impeccable skill.
- Holistic design: hierarchy, IA, emotional fit, discoverability, composition, typography, color, accessibility, states, copy, and edge cases.
- Cognitive load: consult cognitive-load; report checklist failures and decision points with >4 visible options.
- Emotional journey: peak-end rule, emotional valleys, reassurance at high-stakes moments.
- Nielsen heuristics: consult heuristics-scoring; score all 10 heuristics 0-4.
Return: AI slop verdict, heuristic scores, cognitive load, emotional journey, 2-3 strengths, 3-5 priority issues, persona red flags, minor observations, and provocative questions.
Assessment B: Detector + Browser Evidence
Run the bundled detector and browser visualization evidence. Assessment B is mandatory and must remain isolated from Assessment A until both are complete.
CLI scan:
node {{scripts_path}}/detect.mjs --json [--fast] [target]
- Pass markup files/directories as
[target]; do not pass CSS-only files. - For URLs, skip CLI scan and use browser visualization.
- For 200+ scannable files, use
--fast; for 500+, narrow scope or ask. - Exit code 0 = clean; 2 = findings.
- If the detector entrypoint is missing or fails to load, report deterministic scan unavailable and continue with browser/manual review.
Browser visualization is required for a viewable target when browser automation is available. Use a localhost dev/static URL for local files; avoid file:// unless the available browser explicitly supports this workflow. Overlay flow:
- Create a fresh tab and navigate.
- Preflight mutable injection by setting
document.titleand appending a<script>tag. Read-only evaluate APIs do not count. - If mutation is unavailable, skip live server, browser presentation, and injection; report fallback signal.
- If mutation is available, start
node {{scripts_path}}/live-server.mjs --background, present the browser if supported, label[Human], scroll top, injecthttp://localhost:PORT/detect.js, wait 2-3 seconds, readimpeccableconsole messages, then stop the live server. - For multi-view targets, inject on 3-5 representative pages.
Return: CLI findings JSON/counts, browser console findings if applicable, false positives, and skipped/failed browser steps with concrete reasons.
After Assessment B returns usable CLI findings, reuse them. Do not rerun detect.mjs in the parent unless Assessment B failed, was truncated, or omitted count, rule names, or file locations.
Generate Combined Critique Report
Synthesize both assessments into a single report. Do NOT simply concatenate. Weave the findings together, noting where the LLM review and detector agree, where the detector caught issues the LLM missed, and where detector findings are false positives.
The chat response is the primary user-facing deliverable. Present the full structured critique below in chat; do not replace it with a summary and a link. The persisted snapshot is only an archive/backlog for later commands.
Codex final-answer note: `$impeccable critique` produces a report artifact, so the final chat response should intentionally exceed the usual concise close-out style. Do not title the final response "Critique Summary" unless the user explicitly asked for a summary.Structure your feedback as a design director would:
Design Health Score
Consult heuristics-scoring
Present the Nielsen's 10 heuristics scores as a table:
| # | Heuristic | Score | Key Issue |
|---|---|---|---|
| 1 | Visibility of System Status | ? | [specific finding or "n/a" if solid] |
| 2 | Match System / Real World | ? | |
| 3 | User Control and Freedom | ? | |
| 4 | Consistency and Standards | ? | |
| 5 | Error Prevention | ? | |
| 6 | Recognition Rather Than Recall | ? | |
| 7 | Flexibility and Efficiency | ? | |
| 8 | Aesthetic and Minimalist Design | ? | |
| 9 | Error Recovery | ? | |
| 10 | Help and Documentation | ? | |
| Total | ??/40 | [Rating band] |
Be honest with scores. A 4 means genuinely excellent. Most real interfaces score 20-32.
Anti-Patterns Verdict
Start here. Does this look AI-generated?
LLM assessment: Your own evaluation of AI slop tells. Cover overall aesthetic feel, layout sameness, generic composition, missed opportunities for personality.
Deterministic scan: Summarize what the automated detector found, with counts and file locations. Note any additional issues the detector caught that you missed, and flag any false positives.
Visual overlays (if injection succeeded): Tell the user that overlays are now visible in the [Human] tab in their browser, highlighting the detected issues. Summarize what the console output reported. If browser visualization was attempted but injection failed, say that no reliable user-visible overlay is available and report the fallback signal instead.
Overall Impression
A brief gut reaction: what works, what doesn't, and the single biggest opportunity.
What's Working
Highlight 2-3 things done well. Be specific about why they work.
Priority Issues
The 3-5 most impactful design problems, ordered by importance.
For each issue, tag with P0-P3 severity (consult heuristics-scoring for severity definitions):
- [P?] What: Name the problem clearly
- Why it matters: How this hurts users or undermines goals
- Fix: What to do about it (be concrete)
- Suggested command: Which command could address this (from: {{available_commands}})
Persona Red Flags
Consult personas
Auto-select 2-3 personas most relevant to this interface type (use the selection table in the reference). If {{config_file}} contains a ## Design Context section from impeccable teach, also generate 1-2 project-specific personas from the audience/brand info.
For each selected persona, walk through the primary user action and list specific red flags found:
Alex (Power User): No keyboard shortcuts detected. Form requires 8 clicks for primary action. Forced modal onboarding. High abandonment risk.
Jordan (First-Timer): Icon-only nav in sidebar. Technical jargon in error messages ("404 Not Found"). No visible help. Will abandon at step 2.
Be specific. Name the exact elements and interactions that fail each persona. Don't write generic persona descriptions; write what broke for them.
Minor Observations
Quick notes on smaller issues worth addressing.
Questions to Consider
Provocative questions that might unlock better solutions:
- "What if the primary action were more prominent?"
- "Does this need to feel this complex?"
- "What would a confident version of this look like?"
Codex Run Notes are final-chat only. Do not include this section in the persisted snapshot body, because persistence, trend read, and temp cleanup happen after the snapshot write and would otherwise archive stale status such as "pending after persistence."
Remember:
- Be direct. Vague feedback wastes everyone's time.
- Be specific. "The submit button," not "some elements."
- Say what's wrong AND why it matters to users.
- Give concrete suggestions. Cut "consider exploring..." entirely.
- Prioritize ruthlessly. If everything is important, nothing is.
- Don't soften criticism. Developers need honest feedback to ship great design.
Persist the Snapshot
Once the report above is finalized, write it to .impeccable/critique/ so the user can refer back, and so {{command_prefix}}impeccable polish can pick up the priority issues without a copy-paste.
Skip this step if the Setup slug was null (vague or root-level target).
-
Write the body to a temp file so you can pipe it to the helper. Use the full critique report (heuristic table, anti-patterns verdict, priority issues, persona red flags, minor observations, and questions), but stop before the "Ask the User" / "Recommended Actions" sections that come later.
Codex: exclude Run Notes from the temp body file; Run Notes are final-chat only because persistence, trend read, and temp cleanup happen after the snapshot write. -
Pass the structured metadata through
IMPECCABLE_CRITIQUE_META(JSON), then run the write command:IMPECCABLE_CRITIQUE_META='{"target":"<user phrasing>","total_score":<n>,"p0_count":<n>,"p1_count":<n>}' \ node {{scripts_path}}/critique-storage.mjs write <slug> <body-file>The helper prints the absolute path it wrote.
-
Delete the temp body file after the write attempt completes, whether the write succeeded or failed. If deletion fails, mention
temp-file cleanup failed: <reason>briefly in the final output, but do not block the critique. -
Read the trend for context:
node {{scripts_path}}/critique-storage.mjs trend <slug> 5This returns a JSON array of the last 5 frontmatter entries (including the one you just wrote).
-
Append a single line to the user-visible output, after the report and before the questions:
Trend for
<slug>(last 5 runs): 24 → 28 → 32 → 29 → 32 Wrote.impeccable/critique/<filename>.If this is the first run for the slug, the trend is just one score; say so: "First run for this target, no trend yet."
This is fire-and-forget. Do not show the user the helper's JSON output; only the human-readable trend line and the written path. Failures here should not block the rest of the flow; print the error and move on.
Ask the User
After presenting findings, use targeted questions based on what was actually found. {{ask_instruction}} These answers will shape the action plan.
Ask questions along these lines (adapt to the specific findings; do NOT ask generic questions):
-
Priority direction: Based on the issues found, ask which category matters most to the user right now. For example: "I found problems with visual hierarchy, color usage, and information overload. Which area should we tackle first?" Offer the top 2-3 issue categories as options.
-
Design intent: If the critique found a tonal mismatch, ask whether it was intentional. For example: "The interface feels clinical and corporate. Is that the intended tone, or should it feel warmer/bolder/more playful?" Offer 2-3 tonal directions as options based on what would fix the issues found.
-
Scope: Ask how much the user wants to take on. For example: "I found N issues. Want to address everything, or focus on the top 3?" Offer scope options like "Top 3 only", "All issues", "Critical issues only".
-
Constraints (optional; only ask if relevant): If the findings touch many areas, ask if anything is off-limits. For example: "Should any sections stay as-is?" This prevents the plan from touching things the user considers done.
Rules for questions:
- Every question must reference specific findings from the report. Never ask generic "who is your audience?" questions.
- Keep it to 2-4 questions maximum. Respect the user's time.
- Offer concrete options, not open-ended prompts.
- If findings are straightforward (e.g., only 1-2 clear issues), skip questions and go directly to Recommended Actions.
Recommended Actions
After receiving the user's answers, present a prioritized action summary reflecting the user's priorities and scope from Ask the User.
Action Summary
List recommended commands in priority order, based on the user's answers:
{{command_prefix}}command-name: Brief description of what to fix (specific context from critique findings){{command_prefix}}command-name: Brief description (specific context) ...
Rules for recommendations:
- Only recommend commands from: {{available_commands}}
- Order by the user's stated priorities first, then by impact
- Each item's description should carry enough context that the command knows what to focus on
- Map each Priority Issue to the appropriate command
- Skip commands that would address zero issues
- If the user chose a limited scope, only include items within that scope
- If the user marked areas as off-limits, exclude commands that would touch those areas
- End with
{{command_prefix}}impeccable polishas the final step if any fixes were recommended
After presenting the summary, tell the user:
You can ask me to run these one at a time, all at once, or in any order you prefer.
Re-run
{{command_prefix}}impeccable critiqueafter fixes to see your score improve.