discord-bot: real-Chrome fallback for bot-blocked web research
Add scripts/web-fetch-chrome.py, which drives the installed Chrome 148 headlessly
via Playwright (channel="chrome", no Chromium download), runs JS, strips the
HeadlessChrome UA tell, and uses an isolated profile so it never touches a
human's open Chrome. Wire it into DISCORD_CLAUDE.md ("Web Research / Bot-Blocked
Sites": WebFetch first, real-Chrome fallback) and refine the headless rule to
permit headless fetching while still forbidding visible/interactive browser
windows. Add playwright to requirements.txt (no `playwright install` needed).
Restarted bot.
Tested: static + JS-rendered pages render; UA reports Chrome/148 (not Headless).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -33,7 +33,7 @@ You run as a background Windows service. There is no human at the BEAST console.
 that opens a window and waits for someone to click or type into it will hang forever.

 NEVER attempt:
-- Launching Chrome or any browser (including for OAuth or interactive sign-in)
+- Launching a VISIBLE / interactive browser window, or any browser-based OAuth / interactive sign-in flow (no one is at the console to complete it). NOTE: headless Chrome for web research IS allowed; see "Web Research / Bot-Blocked Sites" below.
 - Opening a Windows credential prompt, UAC dialog, or any GUI authentication window
 - 1Password / SOPS GUI unlock, or any desktop app that needs interactive input
 - Any command that blocks on a console prompt no one can answer
@@ -47,6 +47,32 @@ Instead:

 ---

+## Web Research / Bot-Blocked Sites
+
+When you need to look something up (vendor pricing, repair/parts estimates, spec sheets, etc.):
+
+1. Try `WebFetch` / `WebSearch` first: fastest, no browser.
+2. If the site is bot-blocked (HTTP 403/429, a CAPTCHA / "verify you are human" wall, a "please
+enable JavaScript" stub, or an empty/garbage body), fall back to real Chrome.
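The fallback rule in step 2 could be expressed as a small heuristic. The sketch below is illustrative only and is not part of this commit: the function name `looks_bot_blocked`, the marker strings, and the 200-character threshold for an "empty/garbage" body are all assumptions that would need tuning against real sites.

```python
import re

# Hypothetical marker phrases for challenge or stub pages (assumed, not from the commit).
BLOCK_MARKERS = (
    "captcha",
    "verify you are human",
    "enable javascript",
)


def looks_bot_blocked(status: int, body: str, min_chars: int = 200) -> bool:
    """Return True when a plain HTTP fetch looks bot-blocked.

    Mirrors the signals listed in step 2: HTTP 403/429, a CAPTCHA or
    "verify you are human" wall, a "please enable JavaScript" stub, or a
    body with too little visible text to be useful.
    """
    if status in (403, 429):
        return True
    # Crude tag strip and whitespace collapse; good enough for a heuristic.
    text = re.sub(r"<[^>]+>", " ", body)
    text = re.sub(r"\s+", " ", text).strip().lower()
    if any(marker in text for marker in BLOCK_MARKERS):
        return True
    # Treat a near-empty page as blocked (min_chars is an assumed cutoff).
    return len(text) < min_chars
```

A substring check like this will produce false positives on legitimate pages that merely mention CAPTCHAs or carry a `<noscript>` notice, so it is best treated as a hint to retry with real Chrome rather than a verdict.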
+
+**Real-Chrome fetch**: headless, drives the installed Chrome via Playwright (`channel="chrome"`),
+runs JavaScript, presents a normal Chrome user-agent, and uses an isolated profile so it never
+touches a human's open Chrome session on BEAST. Run it with the bot venv's Python:
+
+```bash
+projects/discord-bot/.venv/Scripts/python.exe projects/discord-bot/scripts/web-fetch-chrome.py "<url>"
+```
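The script itself is not shown in this diff, so the following is only a guess at the general approach described above, written against Playwright's synchronous Python API. The names `strip_headless_token` and `fetch_text` are hypothetical, and the real `web-fetch-chrome.py` may differ in every detail.

```python
def strip_headless_token(user_agent: str) -> str:
    """Turn 'HeadlessChrome/<version>' into 'Chrome/<version>' in a UA string."""
    return user_agent.replace("HeadlessChrome/", "Chrome/")


def fetch_text(url: str, wait_until: str = "load", timeout_ms: int = 30_000) -> str:
    """Fetch a page with the system Chrome and return its visible text."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # channel="chrome" uses the installed Chrome instead of a bundled Chromium.
        browser = p.chromium.launch(channel="chrome", headless=True)
        try:
            # Read the default UA once, then reuse it minus the headless token.
            probe = browser.new_page()
            default_ua = probe.evaluate("() => navigator.userAgent")
            probe.close()
            # A new context has its own cookies and storage, separate from
            # any profile a person is using in their own Chrome window.
            context = browser.new_context(
                user_agent=strip_headless_token(default_ua)
            )
            page = context.new_page()
            page.goto(url, wait_until=wait_until, timeout=timeout_ms)
            return page.inner_text("body")
        finally:
            browser.close()
```

Reading the browser's own user-agent and editing it, rather than hard-coding a string, keeps the reported version in step with whatever Chrome is installed.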
+
+Useful flags: `--selector "<css>"` (extract just one element, e.g. a price), `--html` (raw markup
+instead of readable text), `--max-chars N` (default 8000; `0` = no limit), `--wait-until networkidle`
+(for slow / heavily-scripted pages). Page content prints to stdout; errors (timeout, blocked, DNS)
+go to stderr with a non-zero exit code.
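Because content goes to stdout and failures are signalled by stderr plus a non-zero exit code, a caller can wrap the script with `subprocess`. This is a minimal sketch under that contract; `build_fetch_cmd` and `run_fetch` are hypothetical helper names, while the paths and flags are the ones listed above.

```python
import subprocess
from typing import List, Optional

# Paths as given in the instructions above (relative to the repo root).
PYTHON = "projects/discord-bot/.venv/Scripts/python.exe"
SCRIPT = "projects/discord-bot/scripts/web-fetch-chrome.py"


def build_fetch_cmd(
    url: str,
    selector: Optional[str] = None,
    max_chars: Optional[int] = None,
    wait_until: Optional[str] = None,
    raw_html: bool = False,
) -> List[str]:
    """Assemble the argument list using only the documented flags."""
    cmd = [PYTHON, SCRIPT, url]
    if selector is not None:
        cmd += ["--selector", selector]
    if raw_html:
        cmd.append("--html")
    if max_chars is not None:
        cmd += ["--max-chars", str(max_chars)]
    if wait_until is not None:
        cmd += ["--wait-until", wait_until]
    return cmd


def run_fetch(cmd: List[str], timeout_s: float = 60.0) -> str:
    """Run the command; return stdout, or raise RuntimeError carrying stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        raise RuntimeError(
            f"fetch failed (exit {proc.returncode}): {proc.stderr.strip()}"
        )
    return proc.stdout
```

For example, `run_fetch(build_fetch_cmd("<url>", selector=".price", max_chars=0))` would request a single element with no length limit. The 60-second timeout is an assumed value; if it is exceeded, `subprocess.run` raises `subprocess.TimeoutExpired` rather than returning.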
+
+This headless fetch is the ONLY sanctioned browser use; do NOT open a visible Chrome window or
+drive the human's interactive session.
+
+---
+
 ## Task Loop

 For every request, work this loop:
@@ -11,3 +11,8 @@ pydantic-settings>=2.5.2
 aiofiles>=23.2.1
 python-dotenv>=1.0.0
 structlog>=24.1.0
+
+# Browser automation for bot-blocked web research (scripts/web-fetch-chrome.py).
+# Drives the system-installed Chrome via channel="chrome"; no `playwright install`
+# (no bundled Chromium download) needed.
+playwright>=1.60.0