Overview

Every scrape run flows through a series of pipeline hooks that you can define in your Lua script. Each hook receives the data for its stage (a request, fetch result, response, list of items, single item, or webhook payload) plus the shared ctx table, and can modify that data, skip later stages, or trigger side effects.

  • Return the value (modified or not) to continue the pipeline
  • Return nil or false to drop/skip (behavior varies per hook)
  • Hook errors are logged and the hook is skipped; they never crash the job
  • Only define the hooks you actually need. SpyWeb pre-detects which functions exist at startup and skips the processing logic entirely for any that are missing.
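The return rules above can be modeled with a small standalone driver. This is a toy sketch, not SpyWeb's implementation: the `run_hook` helper, the use of `pcall` for error isolation, and continuing with the unmodified value after an error are assumptions made for illustration. It also treats nil/false uniformly as "stop", whereas the real effect varies per hook.

```lua
-- Toy model of the hook return rules. Hypothetical sketch, not SpyWeb's code.
-- run_hook returns the value to pass on and whether the pipeline continues.
local function run_hook(name, value, ctx)
  local hook = _G[name]
  -- Undefined hooks are skipped: the value passes through untouched.
  if type(hook) ~= "function" then
    return value, true
  end
  -- pcall catches errors so a failing hook cannot crash the job.
  local ok, result = pcall(hook, value, ctx)
  if not ok then
    print(("hook %s failed: %s"):format(name, tostring(result)))
    return value, true -- assumption: continue with the unmodified value
  end
  -- nil or false means drop/skip.
  if result == nil or result == false then
    return nil, false
  end
  return result, true
end

-- A script that defines only the one hook it needs.
function before_store(items, ctx)
  if #items == 0 then
    return nil -- nothing to store: skip
  end
  return items
end

local kept, go_on = run_hook("before_store", { "new item" }, {})
print(go_on, #kept)
```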

Every hook receives a per-cycle ctx table as its second argument. See Context for the full reference.

The hooks execute in this exact order during each scrape cycle. Bracketed stages are built in and run automatically, and each entry notes the thread it runs on:

before_fetch(request, ctx) · executor thread

Modify the URL, add headers, or return nil to skip this run entirely.
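A minimal before_fetch sketch, assuming the request table exposes `url` and `headers` fields (hypothetical names; check the request reference for the real ones). It skips the run when no URL is set, appends an illustrative `sort=newest` query parameter, and adds a header.

```lua
-- Hypothetical before_fetch hook. `url` and `headers` are assumed field names.
function before_fetch(request, ctx)
  -- Returning nil skips this run entirely.
  if not request.url or request.url == "" then
    return nil
  end

  -- Append an illustrative query parameter, respecting any existing query string.
  -- The 4th argument (true) makes find treat "?" literally, not as a pattern.
  local sep = request.url:find("?", 1, true) and "&" or "?"
  request.url = request.url .. sep .. "sort=newest"

  -- Add a header, creating the headers table if it is missing.
  request.headers = request.headers or {}
  request.headers["Accept-Language"] = "en-US"

  return request -- continue the pipeline with the modified request
end
```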

override_fetch(request, ctx) · executor thread

Custom fetch phase. If defined, it replaces the built-in HTTP client.

[HTTP fetch] · background thread

Automatic - uses the request config from the previous stage.

after_fetch(fetch_result, ctx) · executor thread

Inspect success or failure, mutate the body on success, or return a synthetic response to recover from a fetch error.
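A sketch of after_fetch, assuming fetch_result carries `ok`, `body`, and `error` fields and that a returned table with `ok`, `status`, and `body` is accepted as a synthetic response. All of these field names are hypothetical. On success it strips `<script>` blocks from the body; on failure it substitutes an empty page so the cycle can continue instead of failing.

```lua
-- Hypothetical after_fetch hook. The fields ok, body, error, and status are
-- assumed names, as is the shape of the synthetic response.
function after_fetch(fetch_result, ctx)
  if fetch_result.ok then
    -- Success: remove <script>...</script> blocks (".-" is a lazy match).
    -- Wrapping in parentheses keeps only gsub's first return value.
    fetch_result.body = ((fetch_result.body or ""):gsub("<script.-</script>", ""))
    return fetch_result
  end

  -- Failure: log the error and recover with an empty synthetic page.
  print("fetch failed: " .. tostring(fetch_result.error))
  return { ok = true, status = 200, body = "<html><body></body></html>" }
end
```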

override_extract(response, ctx) · executor thread

Custom extraction phase for JSON/XML responses. If defined, it replaces the built-in CSS extraction.
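A naive override_extract sketch for a JSON body, assuming `response.body` holds the raw text and that items are tables with `title` and `url` fields (hypothetical names). Plain Lua has no JSON decoder, so this uses string patterns; it only handles flat objects where `title` precedes `url` and neither value contains escaped quotes. If your runtime provides a real JSON decoder, prefer it.

```lua
-- Hypothetical override_extract hook for a JSON array body.
-- Naive: relies on Lua string patterns, so it only handles flat objects where
-- "title" precedes "url" and neither value contains escaped quotes.
local ITEM_PATTERN =
  '"title"%s*:%s*"([^"]*)"%s*,%s*"url"%s*:%s*"([^"]*)"'

function override_extract(response, ctx)
  local items = {}
  local body = response.body or ""
  -- gmatch yields both captures for every match in the body.
  for title, url in body:gmatch(ITEM_PATTERN) do
    items[#items + 1] = { title = title, url = url }
  end
  return items
end
```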

[CSS extraction] · background thread

Automatic - raw parser with DOM fallback.

after_extract(items, ctx) · executor thread

Filter or modify all extracted items as one batch. Returning nil or an empty table leaves no items.
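An after_extract sketch that works on the whole batch: it drops items without a URL, trims titles, and caps the batch at an illustrative `MAX_ITEMS`. The `title` and `url` field names are assumptions.

```lua
-- Hypothetical after_extract hook; `title` and `url` are assumed field names.
local MAX_ITEMS = 20 -- illustrative cap, not a SpyWeb setting

function after_extract(items, ctx)
  local out = {}
  for _, item in ipairs(items) do
    if item.url and item.url ~= "" then
      -- Trim leading and trailing whitespace from the title.
      item.title = (item.title or ""):match("^%s*(.-)%s*$")
      out[#out + 1] = item
      if #out >= MAX_ITEMS then
        break
      end
    end
  end
  return out -- an empty table means no items continue
end
```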

filter_item(item, ctx) · executor thread

Per-item filter: return the item to keep it or nil to drop it. If defined, it replaces the built-in keyword filter.
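A filter_item sketch that reimplements keyword filtering with include and exclude lists. The `title` field and the `INCLUDE`/`EXCLUDE` lists are illustrative assumptions; because this hook replaces the built-in keyword filter, any keyword logic you still want has to live here.

```lua
-- Hypothetical filter_item hook replacing the built-in keyword filter.
-- The `title` field and both keyword lists are illustrative assumptions.
local INCLUDE = { "laptop", "monitor" }
local EXCLUDE = { "refurbished" }

local function contains_any(text, words)
  for _, w in ipairs(words) do
    -- Plain find (4th argument true) treats the keyword literally, not as a pattern.
    if text:find(w, 1, true) then
      return true
    end
  end
  return false
end

function filter_item(item, ctx)
  local title = (item.title or ""):lower()
  if contains_any(title, INCLUDE) and not contains_any(title, EXCLUDE) then
    return item -- keep
  end
  return nil -- drop
end
```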

before_store(items, ctx) · executor thread

Last chance to modify items before the database insert. Returning nil skips both storing and notifying.
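A before_store sketch that normalizes a price string into a number before the insert, and returns nil (skipping store and notify) when an illustrative `DRY_RUN` flag is set. The `price` field and `DRY_RUN` are hypothetical.

```lua
-- Hypothetical before_store hook. The `price` field and the DRY_RUN flag are
-- illustrative assumptions, not part of SpyWeb.
local DRY_RUN = false

function before_store(items, ctx)
  if DRY_RUN then
    return nil -- skip both the DB insert and the notification
  end
  for _, item in ipairs(items) do
    if type(item.price) == "string" then
      -- "$1,299.00" -> 1299.0: keep only digits and the decimal point.
      -- tonumber yields nil if nothing numeric is left.
      item.price = tonumber((item.price:gsub("[^%d%.]", "")))
    end
  end
  return items
end
```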

[dedup + insert] · background thread

Automatic - atomic check-and-insert.

before_notify(items, ctx) · executor thread

Reshape or silence notifications. The items have already been stored at this point.
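A before_notify sketch that silences notifications during quiet hours and shows at most three items. It assumes that returning nil silences the notification (following the general nil rule) and that a shorter items list is an acceptable reshape; `QUIET_START`, `QUIET_END`, and `MAX_SHOWN` are illustrative. Since the items are already stored, trimming here does not affect what was saved.

```lua
-- Hypothetical before_notify hook. QUIET_START, QUIET_END, and MAX_SHOWN are
-- illustrative values, not SpyWeb settings.
local QUIET_START, QUIET_END = 22, 7 -- 22:00 to 06:59 local time
local MAX_SHOWN = 3

local function is_quiet(hour)
  if QUIET_START > QUIET_END then -- window wraps past midnight
    return hour >= QUIET_START or hour < QUIET_END
  end
  return hour >= QUIET_START and hour < QUIET_END
end

function before_notify(items, ctx)
  if is_quiet(os.date("*t").hour) then
    return nil -- silence; the items are already stored
  end
  -- Reshape: keep only the first MAX_SHOWN items for the notification.
  local shown = {}
  for i = 1, math.min(#items, MAX_SHOWN) do
    shown[i] = items[i]
  end
  return shown
end
```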

before_webhook(payload, ctx) · executor thread

Reshape or silence webhook POSTs. Receives the full JSON payload.
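A before_webhook sketch that replaces the payload with a single `text` field, the shape Slack-style incoming webhooks accept, and returns nil to silence the POST when there is nothing to report. It assumes the payload has an `items` array whose entries carry `title` and `url`; those names are hypothetical, so check the real payload before relying on them.

```lua
-- Hypothetical before_webhook hook. Assumes payload.items is an array of
-- tables with `title` and `url` fields; these names are not confirmed.
function before_webhook(payload, ctx)
  local items = payload.items or {}
  if #items == 0 then
    return nil -- silence the POST when there is nothing to report
  end
  local lines = {}
  for i, item in ipairs(items) do
    lines[i] = ("- %s <%s>"):format(item.title or "(untitled)", item.url or "")
  end
  -- Replace the whole payload with a single text field.
  return { text = table.concat(lines, "\n") }
end
```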

[notify + webhook] · background thread

Automatic - desktop notification + webhook POST.