Human in the Loop
For sites with aggressive bot detection (Cloudflare, CAPTCHAs), implement a hybrid recovery flow that switches from a headless browser to a visible one when a block is detected.
The Strategy
- Normal mode - scrape headlessly for speed
- Detection - check for a block selector (e.g. #captcha-container) in override_fetch
- Transition - close the headless browser and launch a visible Chrome instance
- Notification - use notify() to alert a human operator
- Intervention - poll for a “success” selector that appears after the human solves the puzzle
- Handback - capture HTML, close the visual browser, return to headless mode
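The steps above amount to a small state machine. The following is a minimal sketch of it in plain Lua, assuming the three recovery_state values that appear in this page's example (NORMAL, RECOVERING, SOLVED). The event names and the next_state helper are hypothetical, introduced only for illustration; they are not part of the scraper's API.

```lua
-- Hypothetical transition table for the recovery flow.
-- State names follow the example's recovery_state values;
-- the event names are assumptions made for this sketch.
local TRANSITIONS = {
  NORMAL     = { block_detected = "RECOVERING" }, -- Detection -> Transition
  RECOVERING = { success_seen   = "SOLVED" },     -- Intervention finished
  SOLVED     = { html_captured  = "NORMAL" },     -- Handback
}

-- Return the next state for (state, event).
-- Events that do not apply in the current state leave it unchanged.
function next_state(state, event)
  local row = TRANSITIONS[state]
  if row == nil then
    error("unknown recovery state: " .. tostring(state))
  end
  return row[event] or state
end
```

Keeping the transitions in one table makes it easier to see that every state has a way back to NORMAL, which matters because the state is persisted between fetches.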
Example
See Human in the Loop for a complete implementation.
-- Simplified flow
function override_fetch(request, ctx)
  local state = store_get("recovery_state") or "NORMAL"

  -- Just recovered - capture data
  if state == "SOLVED" then
    local page = visual_browser:attach({ reuse = true })
    local html = page:content()
    visual_browser:close()
    store_set("recovery_state", "NORMAL")
    return { status = 200, body = html, url = request.url }
  end

  -- Normal headless run
  local page = browser:attach()
  defer(function() page:close() end)
  page:open(request.url)

  -- Check for data
  local found = page:wait_for_selector(".data", 8000)
  if found then
    return { status = 200, body = page:content(), url = request.url }
  end

  -- Check for bot block
  local blocked = page:evaluate("document.querySelector('#captcha') !== null")
  if blocked then
    -- Switch to visual browser
    browser:close()
    visual_browser = cdp.launch({ headless = false })
    visual_browser:attach():open(request.url)
    store_set("recovery_state", "RECOVERING")
    notify("Blocked", "Please solve the CAPTCHA")
    return nil
  end

  return { error = "Data not found" }
end
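The simplified flow stops at the notification and does not show the Intervention step, where the scraper polls until the human has solved the puzzle. One way to sketch that polling loop in plain Lua is shown below. The poll_until helper and its parameters are hypothetical. The clock (now) and pause (sleep) functions are passed in because this page does not say which timing functions the host runtime provides.

```lua
-- Hypothetical helper: call check() repeatedly until it returns a truthy
-- value or timeout_ms has elapsed.
--   check       : function returning truthy once the success condition holds
--   timeout_ms  : total time to keep polling, in milliseconds
--   interval_ms : pause between attempts, in milliseconds
--   now         : function returning the current time in milliseconds (assumed)
--   sleep       : function that pauses for the given milliseconds (assumed)
-- Returns the truthy result of check(), or nil on timeout.
function poll_until(check, timeout_ms, interval_ms, now, sleep)
  local deadline = now() + timeout_ms
  while true do
    local result = check()
    if result then
      return result
    end
    if now() >= deadline then
      return nil
    end
    sleep(interval_ms)
  end
end

-- Demonstration with a fake clock, so the sketch runs without a browser.
local clock = 0
local attempts = 0
local solved = poll_until(
  function()
    attempts = attempts + 1
    return attempts >= 3 -- pretend the success selector appears on try 3
  end,
  10000, 500,
  function() return clock end,
  function(ms) clock = clock + ms end
)
```

In a real hook, check would presumably wrap a call such as page:wait_for_selector(".data", ...) on the visible browser's page, and a truthy result would be the point at which recovery_state is set to "SOLVED".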