diff --git a/projects/discord-bot/DISCORD_CLAUDE.md b/projects/discord-bot/DISCORD_CLAUDE.md
index bd9d96a..d6895a8 100644
--- a/projects/discord-bot/DISCORD_CLAUDE.md
+++ b/projects/discord-bot/DISCORD_CLAUDE.md
@@ -33,7 +33,7 @@ You run as a background Windows service. There is no human at the BEAST console.
 that opens a window and waits for someone to click or type into it will hang forever.
 
 NEVER attempt:
-- Launching Chrome or any browser (including for OAuth or interactive sign-in)
+- Launching a VISIBLE / interactive browser window, or any browser-based OAuth / interactive sign-in flow (no one is at the console to complete it). NOTE: headless Chrome for web research IS allowed — see "Web Research / Bot-Blocked Sites" below.
 - Opening a Windows credential prompt, UAC dialog, or any GUI authentication window
 - 1Password / SOPS GUI unlock, or any desktop app that needs interactive input
 - Any command that blocks on a console prompt no one can answer
@@ -47,6 +47,32 @@ Instead:
 
 ---
 
+## Web Research / Bot-Blocked Sites
+
+When you need to look something up (vendor pricing, repair/parts estimates, spec sheets, etc.):
+
+1. Try `WebFetch` / `WebSearch` first — fastest, no browser.
+2. If the site is bot-blocked — HTTP 403/429, a CAPTCHA / "verify you are human" wall, a "please
+   enable JavaScript" stub, or an empty/garbage body — fall back to real Chrome.
+
+**Real-Chrome fetch** — headless, drives the installed Chrome via Playwright (`channel="chrome"`),
+runs JavaScript, presents a normal Chrome user-agent, and uses an isolated profile so it never
+touches a human's open Chrome session on BEAST. Run it with the bot venv's Python:
+
+```bash
+projects/discord-bot/.venv/Scripts/python.exe projects/discord-bot/scripts/web-fetch-chrome.py ""
+```
+
+Useful flags: `--selector ""` (extract just one element, e.g. a price), `--html` (raw markup
+instead of readable text), `--max-chars N` (default 8000; `0` = no limit), `--wait-until networkidle`
+(for slow / heavily-scripted pages). Page content prints to stdout; errors (timeout, blocked, DNS)
+go to stderr with a non-zero exit code.
+
+This headless fetch is the ONLY sanctioned browser use — do NOT open a visible Chrome window or
+drive the human's interactive session.
+
+---
+
 ## Task Loop
 
 For every request, work this loop:
diff --git a/projects/discord-bot/requirements.txt b/projects/discord-bot/requirements.txt
index df43f52..e08ccd9 100644
--- a/projects/discord-bot/requirements.txt
+++ b/projects/discord-bot/requirements.txt
@@ -11,3 +11,8 @@ pydantic-settings>=2.5.2
 aiofiles>=23.2.1
 python-dotenv>=1.0.0
 structlog>=24.1.0
+
+# Browser automation for bot-blocked web research (scripts/web-fetch-chrome.py).
+# Drives the system-installed Chrome via channel="chrome" — no `playwright install`
+# (no bundled Chromium download) needed.
+playwright>=1.60.0
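The diff references `scripts/web-fetch-chrome.py` but does not include it. For reviewers, here is a minimal, hypothetical sketch of how the documented behaviour could map onto Playwright's sync API. It is not the actual script. It assumes the `playwright` package pinned in requirements.txt and an installed Google Chrome. The function names (`build_parser`, `truncate`, `fetch`, `main`), the `load` default for `--wait-until`, the HTTP-status check, and the user-agent rewrite are illustrative assumptions and do not come from the source.

```python
"""Hypothetical sketch of a headless real-Chrome fetcher.

Illustrative only, NOT the actual scripts/web-fetch-chrome.py. Assumes the
`playwright` package (see requirements.txt) and an installed Google Chrome.
"""
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # Flags mirror the ones documented in DISCORD_CLAUDE.md.
    p = argparse.ArgumentParser(description="Fetch a page with headless Chrome.")
    p.add_argument("url")
    p.add_argument("--selector", help="CSS selector; print only the first match")
    p.add_argument("--html", action="store_true", help="raw markup instead of text")
    p.add_argument("--max-chars", type=int, default=8000, help="0 = no limit")
    p.add_argument(
        "--wait-until",
        default="load",  # assumption: same as Playwright's own goto() default
        choices=["commit", "domcontentloaded", "load", "networkidle"],
    )
    return p


def truncate(text: str, max_chars: int) -> str:
    # 0 (or any non-positive value) means "no limit".
    return text if max_chars <= 0 else text[:max_chars]


def fetch(args: argparse.Namespace) -> str:
    # Imported lazily so --help and the helpers above work without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        # launch() (unlike launch_persistent_context) runs Chrome on a throwaway
        # temporary profile, so no human's Chrome profile or session is touched.
        browser = pw.chromium.launch(channel="chrome", headless=True)
        try:
            # Headless Chrome may advertise "HeadlessChrome" in its UA string;
            # swap in the plain "Chrome" token (a no-op if it is not present).
            probe = browser.new_page()
            ua = probe.evaluate("navigator.userAgent").replace("HeadlessChrome", "Chrome")
            probe.close()

            page = browser.new_page(user_agent=ua)
            resp = page.goto(args.url, wait_until=args.wait_until)
            # Assumption: an HTTP error status means the site is still blocking us.
            if resp is not None and resp.status >= 400:
                raise RuntimeError(f"HTTP {resp.status} from {args.url}")

            if args.selector:
                el = page.locator(args.selector).first
                content = el.inner_html() if args.html else el.inner_text()
            else:
                content = page.content() if args.html else page.locator("body").inner_text()
        finally:
            browser.close()
    return truncate(content, args.max_chars)


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    try:
        print(fetch(args))
    except Exception as exc:  # timeouts, DNS failures, HTTP errors, missing Chrome
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

To use it as a script, the file would end with `if __name__ == "__main__": raise SystemExit(main())`, so `main`'s return value becomes the exit code and any failure lands on stderr with status 1. Using `launch()` rather than `launch_persistent_context()` is one way to get the isolated-profile behaviour. A persistent context pointed at a dedicated directory would also work if cookies need to survive between runs.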