TOML Config
Jobs in SpyWeb are configured in TOML. You can define them in a single `jobs.toml` in the root directory, or use a modular `jobs/my-job/config.toml` structure with one directory per job.
SpyWeb validates the configuration statically before startup and before every reload.
Job Config
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `name` | string | Yes | - | Job name |
| `url` | string | Yes | - | Target URL |
| `selector` | string | Yes | - | CSS selector for item containers |
| `fields` | array | Yes | - | Fields to extract |
| `urls` | string[] | No | - | Multiple entry URLs (overrides `url`) |
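As a sketch, a job that sets only the required keys might look like this. The URL, selectors, and field names are hypothetical placeholders, and whether these keys sit at the top level of a modular `config.toml` or inside a job entry of `jobs.toml` depends on the layout you use.

```toml
# Hypothetical job: the URL, selectors, and field names are placeholders.
name = "example-listings"
url = "https://example.com/listings"
selector = "div.listing"                # CSS selector matching each item container
fields = ["title:h2", "link:a@href"]    # fields to extract for each matched item

# Optional: several entry URLs; when set, these override `url`.
# urls = ["https://example.com/listings?page=1", "https://example.com/listings?page=2"]
```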
Behavior
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `enabled` | bool | No | `true` | Enable or disable the job |
| `interval` | integer | No | `600` | Run interval in seconds |
| `workers` | integer | No | `1` | Per-job worker concurrency |
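For example, these hypothetical settings would run a job every five minutes with two workers. Setting `enabled = false` turns the job off while keeping its configuration in place.

```toml
enabled = true   # set to false to disable this job without deleting its config
interval = 300   # seconds between runs (5 minutes); must be > 0
workers = 2      # per-job worker concurrency; must be > 0
```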
Filtering & Dedup
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `keywords` | string[] | No | - | Filter items by keywords |
| `search_fields` | string[] | No | - | Limit the keyword search to specific fields |
| `hash_fields` | string[] | No | all fields | Fields used for the dedup hash |
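A sketch of keyword filtering, assuming the job extracts `title`, `description`, and `link` fields (hypothetical names and selectors). Every name in `search_fields` must match an extracted field, or validation fails.

```toml
fields = ["title:h2", "description:p.summary", "link:a@href"]

keywords = ["thinkpad", "x1 carbon"]
# Restrict the keyword search to these extracted fields.
search_fields = ["title", "description"]
```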
Network & Output
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `headers` | table | No | - | Custom HTTP headers |
| `proxy` | table | No | - | Proxy configuration |
| `webhook` | table | No | - | Webhook configuration |
| `notification` | table | No | enabled | Desktop notification settings |
| `debug` | bool | No | `false` | Save raw HTML and extracted JSON |
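The `headers` table and the `debug` flag are the simplest of these to set; `proxy`, `webhook`, and `notification` have their own sections. The header names and values here are hypothetical, and the inline-table form mirrors the one used for webhook headers.

```toml
# Custom HTTP headers for this job's requests (values are placeholders).
headers = { "User-Agent" = "Mozilla/5.0 (compatible; example-bot/1.0)", "Accept-Language" = "en-US" }

# Save raw HTML and the extracted JSON, handy while tuning selectors.
debug = true
```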
Validation Rules
- `name`, `url`, `selector`, and `fields` are required and cannot be blank
- `fields` must contain at least one entry
- `interval` must be > 0
- `workers` must be > 0 if set
- All URLs must be absolute (including `webhook.url` and `proxy.urls`)
- `urls` must contain at least one valid URL if set
- Field names must be unique within a job
- `search_fields` and `hash_fields` must reference existing extracted field names
- Job names and normalized IDs must be unique across all jobs
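To make the rules concrete, here is a deliberately invalid, hypothetical job. Each commented line breaks one of the rules listed above.

```toml
name = "broken-example"
url = "/listings"                  # invalid: URLs must be absolute
selector = "div.item"
interval = 0                       # invalid: interval must be > 0
fields = ["title:h2", "title:h3"]  # invalid: field names must be unique within a job
hash_fields = ["price"]            # invalid: "price" is not an extracted field name
```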
Field Syntax
```toml
# Shorthand: "name:selector" extracts text content; append "@attr" to read an attribute
fields = ["title:h2", "link:a@href"]

# Full form: explicit selector and attribute
# (an alternative to the shorthand; a job assigns `fields` only once)
fields = [
  { name = "title", selector = "h2", att = "text" },
  { name = "link", selector = "a", att = "href" },
]
```
Proxy Config
```toml
[proxy]
enabled = true
rotate = "RoundRobin"  # or "Sticky", "Random"
urls = [
  "socks5://proxy1:1080",
  "http://proxy2:8080",
]
```
Deduplication (hash_fields)
By default, SpyWeb hashes all extracted fields to decide whether an item is new. This catches every update, such as a price change or an edited description.
When some fields change independently of the item itself (e.g. `date_posted`, `view_count`), use `hash_fields` to list only the fields that uniquely identify the item:
```toml
hash_fields = ["link"]
```
If all of the listed `hash_fields` are empty for an item, SpyWeb falls back to hashing all fields.
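As an illustration, the hypothetical job below also extracts a view counter that changes frequently. Hashing `link` together with `price` means a price change still counts as a new item, while a changing view count alone does not. The field names and selectors are placeholders.

```toml
fields = ["title:h2", "link:a@href", "price:span.price", "views:span.views"]

# Dedup on link + price: a new price re-triggers the item, new view counts do not.
hash_fields = ["link", "price"]
```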
Webhook Config
```toml
[webhook]
enabled = true
url = "https://your-webhook.example.com/endpoint"
headers = { "Authorization" = "Bearer your-token" }
```
To reshape the webhook payload (e.g. for Discord embeds or Slack blocks), use `before_webhook` in your hooks. See Webhook Payloads for details.
Notification Templates
```toml
[notification]
enabled = true
timeout = 5000
title = "Found {item_count} new items from {job_name}"
body = """
Title: {title}
Link: {link}
Keywords: {matches}"""
```
Available tags: `{job_name}`, `{url}`, `{item_count}`, `{timestamp}`, `{matches}`, `{match_count}`, and any extracted field name (e.g. `{title}`).
Most desktop operating systems limit notification bodies to about four lines, so a template that prints several fields per item will typically fit only about one record per pop-up.
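A more compact template can make better use of that limit. This sketch assumes, as the note above implies, that the body is repeated for each item; `title` is a hypothetical extracted field name.

```toml
[notification]
enabled = true
title = "{job_name}: {item_count} new items"
# One line per item (assuming per-item rendering) leaves room for more records.
body = "{title} ({match_count} keyword matches)"
```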