discord-bot: real-Chrome fallback for bot-blocked web research

Add scripts/web-fetch-chrome.py, which drives the installed Chrome 148 headlessly
via Playwright (channel="chrome", so no Chromium download), runs JS, strips the
HeadlessChrome user-agent tell, and uses an isolated profile so it never touches
a human's open Chrome session. Wire it into DISCORD_CLAUDE.md (new "Web Research /
Bot-Blocked Sites" section: WebFetch first, real-Chrome fallback) and refine the
headless rule to permit headless fetching while still forbidding visible or
interactive browser windows. Add playwright to requirements.txt
(no `playwright install` needed). Restarted the bot.

Tested: static and JS-rendered pages render; the UA reports Chrome/148 (not HeadlessChrome).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 13:14:55 -07:00
parent 51d55566bf
commit ee865426c7
2 changed files with 32 additions and 1 deletion
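The script itself is not part of this diff. As a rough sketch, assuming Playwright's sync API, the launch technique the commit message describes might look like the following. `clean_user_agent` and `fetch_page_text` are hypothetical names, and the real script may isolate its profile differently (for example with a dedicated user-data directory rather than a throwaway context).

```python
"""Hypothetical sketch of a headless real-Chrome fetch via Playwright.

Not the actual scripts/web-fetch-chrome.py, which this commit does not show.
"""


def clean_user_agent(ua: str) -> str:
    """Replace the 'HeadlessChrome/' token with 'Chrome/' so the UA looks normal."""
    return ua.replace("HeadlessChrome/", "Chrome/")


def fetch_page_text(url: str, timeout_ms: int = 30_000) -> str:
    # Imported lazily so clean_user_agent() works even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # channel="chrome" launches the system-installed Google Chrome,
        # so no bundled Chromium download (`playwright install`) is needed.
        browser = p.chromium.launch(channel="chrome", headless=True)
        try:
            # Read the real UA once (it contains "HeadlessChrome/...").
            probe = browser.new_page()
            real_ua = probe.evaluate("navigator.userAgent")
            probe.close()
            # A fresh context has its own temporary profile, separate from
            # any Chrome session a human has open.
            context = browser.new_context(user_agent=clean_user_agent(real_ua))
            page = context.new_page()
            page.goto(url, wait_until="load", timeout=timeout_ms)
            return page.inner_text("body")
        finally:
            browser.close()
```

Calling `fetch_page_text("https://example.com")` requires Chrome to be installed locally; `clean_user_agent` is plain string handling and can be checked on its own. Reading the real UA first, rather than hard-coding one, keeps the reported version in step with whatever Chrome build is installed.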

@@ -33,7 +33,7 @@ You run as a background Windows service. There is no human at the BEAST console.
that opens a window and waits for someone to click or type into it will hang forever.
NEVER attempt:
-- Launching Chrome or any browser (including for OAuth or interactive sign-in)
+- Launching a VISIBLE / interactive browser window, or any browser-based OAuth / interactive sign-in flow (no one is at the console to complete it). NOTE: headless Chrome for web research IS allowed — see "Web Research / Bot-Blocked Sites" below.
- Opening a Windows credential prompt, UAC dialog, or any GUI authentication window
- 1Password / SOPS GUI unlock, or any desktop app that needs interactive input
- Any command that blocks on a console prompt no one can answer
@@ -47,6 +47,32 @@ Instead:
---
## Web Research / Bot-Blocked Sites
When you need to look something up (vendor pricing, repair/parts estimates, spec sheets, etc.):
1. Try `WebFetch` / `WebSearch` first — fastest, no browser.
2. If the site is bot-blocked — HTTP 403/429, a CAPTCHA / "verify you are human" wall, a "please
enable JavaScript" stub, or an empty/garbage body — fall back to real Chrome.
**Real-Chrome fetch** — headless, drives the installed Chrome via Playwright (`channel="chrome"`),
runs JavaScript, presents a normal Chrome user-agent, and uses an isolated profile so it never
touches a human's open Chrome session on BEAST. Run it with the bot venv's Python:
```bash
projects/discord-bot/.venv/Scripts/python.exe projects/discord-bot/scripts/web-fetch-chrome.py "<url>"
```
Useful flags: `--selector "<css>"` (extract just one element, e.g. a price), `--html` (raw markup
instead of readable text), `--max-chars N` (default 8000; `0` = no limit), `--wait-until networkidle`
(for slow / heavily-scripted pages). Page content prints to stdout; errors (timeout, blocked, DNS)
go to stderr with a non-zero exit code.
This headless fetch is the ONLY sanctioned browser use — do NOT open a visible Chrome window or
drive the human's interactive session.
---
## Task Loop
For every request, work this loop:

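The fallback rule in steps 1 and 2 of the new section can be expressed as a small predicate. This is a hypothetical sketch: `looks_bot_blocked`, the marker phrases, and the 200-character "near-empty" threshold are assumptions, and spotting a "garbage" body in general takes more than a length check.

```python
"""Hypothetical heuristic for the 'is this site bot-blocked?' decision."""
from typing import Optional

# Phrases that commonly appear on challenge pages or JS-only shells.
# Illustrative, not exhaustive.
BLOCK_MARKERS = (
    "captcha",
    "verify you are human",
    "please enable javascript",
    "enable javascript and cookies",
)
BLOCK_STATUSES = {403, 429}


def looks_bot_blocked(status: Optional[int], body: str, min_chars: int = 200) -> bool:
    """Return True when a WebFetch-style result should trigger the real-Chrome fallback."""
    if status in BLOCK_STATUSES:
        return True
    text = body.strip()
    if len(text) < min_chars:  # empty or near-empty body
        return True
    lowered = text.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

A status check alone is not enough because many challenge pages return HTTP 200 with a CAPTCHA or a "please enable JavaScript" stub as the body.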
@@ -11,3 +11,8 @@ pydantic-settings>=2.5.2
aiofiles>=23.2.1
python-dotenv>=1.0.0
structlog>=24.1.0
# Browser automation for bot-blocked web research (scripts/web-fetch-chrome.py).
# Drives the system-installed Chrome via channel="chrome" — no `playwright install`
# (no bundled Chromium download) needed.
playwright>=1.60.0
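The DISCORD_CLAUDE.md section specifies the script's command-line contract: `--selector`, `--html`, `--max-chars` (default 8000, `0` = no limit), `--wait-until`, page content on stdout, and errors on stderr with a non-zero exit code. A hypothetical `argparse` sketch of that contract follows, assuming the fetch itself is passed in as a callable so the parsing and exit-code logic can run without launching Chrome. The `load` default for `--wait-until` and the exit code `1` are assumptions, not taken from the script.

```python
"""Hypothetical sketch of the web-fetch-chrome.py CLI contract."""
import argparse
import sys
from typing import Callable, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Headless real-Chrome fetch")
    parser.add_argument("url")
    parser.add_argument("--selector", help="CSS selector; extract only this element")
    parser.add_argument("--html", action="store_true", help="print raw markup instead of text")
    parser.add_argument("--max-chars", type=int, default=8000, help="0 = no limit")
    # These are the values Playwright's page.goto() accepts for wait_until.
    parser.add_argument(
        "--wait-until",
        default="load",
        choices=["load", "domcontentloaded", "networkidle", "commit"],
    )
    return parser


def truncate(text: str, max_chars: int) -> str:
    """Apply --max-chars; 0 (or less) means no truncation."""
    return text if max_chars <= 0 else text[:max_chars]


def main(argv: Sequence[str], fetch: Callable[[argparse.Namespace], str]) -> int:
    """Content goes to stdout (exit 0); any fetch error goes to stderr (exit 1)."""
    args = build_parser().parse_args(argv)
    try:
        content = fetch(args)
    except Exception as exc:  # e.g. timeout, blocked page, DNS failure
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(truncate(content, args.max_chars))
    return 0
```

In a real script the `fetch` argument would be the Playwright-backed function, reading `args.selector`, `args.html`, and `args.wait_until` from the namespace, and the entry point would end with `sys.exit(main(sys.argv[1:], fetch))`.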