
Scrape API

Extract structured JSON from a fully rendered page using CSS selectors. Each request must include a url and an elements array that lists the selectors you want to capture.

Endpoint

  • Method: POST
  • Path: /scrape
  • Auth: token query parameter (?token=)
  • Content-Type: application/json
  • Response: application/json

See the OpenAPI reference for complete details.

Quickstart

curl --request POST \
  --url 'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://browserless.io/",
    "elements": [
      { "selector": "h1" }
    ]
  }'
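If you call the API from Node.js rather than the shell, the Quickstart request maps directly onto the built-in fetch (Node.js 18+). This is a minimal sketch: the helper names buildScrapeUrl and scrape and the BROWSERLESS_TOKEN environment variable are illustrative assumptions, not part of the Browserless API. Building the URL with URLSearchParams ensures a token containing reserved characters is encoded correctly in the ?token= query parameter.

```javascript
// Minimal sketch of the Quickstart request using Node.js 18+ built-in fetch.
// buildScrapeUrl, scrape, and BROWSERLESS_TOKEN are illustrative names.
const SCRAPE_ENDPOINT = "https://production-sfo.browserless.io/scrape";

// The token travels as the ?token= query parameter; searchParams takes care
// of percent-encoding any reserved characters.
function buildScrapeUrl(token, endpoint = SCRAPE_ENDPOINT) {
  const url = new URL(endpoint);
  url.searchParams.set("token", token);
  return url.toString();
}

// POST the JSON body and return the parsed JSON response.
async function scrape(token, body) {
  const res = await fetch(buildScrapeUrl(token), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    // Surface the status and raw body instead of failing on JSON parsing.
    throw new Error(`POST /scrape failed with ${res.status}: ${await res.text()}`);
  }
  return res.json();
}

// Only send a real request when a token is available in the environment.
if (typeof process !== "undefined" && process.env.BROWSERLESS_TOKEN) {
  scrape(process.env.BROWSERLESS_TOKEN, {
    url: "https://browserless.io/",
    elements: [{ selector: "h1" }],
  })
    .then((json) => console.log(JSON.stringify(json, null, 2)))
    .catch((err) => console.error(err));
}
```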

Response

{
  "data": [
    {
      "results": [
        {
          "attributes": [
            { "name": "class", "value": "..." }
          ],
          "height": 120,
          "html": "Headless browser automation, without the hosting headaches",
          "left": 32,
          "text": "Headless browser automation, without the hosting headaches",
          "top": 196,
          "width": 736
        }
      ],
      "selector": "h1"
    }
  ]
}
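Each entry in data corresponds to one selector from your elements array, and its results array holds one object per matching element. A common next step is to keep only the text. The function below is a minimal sketch (textsBySelector is an illustrative name) that assumes the response shape shown above.

```javascript
// Minimal sketch: turn a /scrape response into { selector: [text, ...] }.
// Assumes the shape shown above: data[].selector and data[].results[].text.
function textsBySelector(response) {
  const out = {};
  for (const entry of response.data ?? []) {
    const texts = (entry.results ?? []).map((r) => (r.text ?? "").trim());
    // Append rather than overwrite in case the same selector was requested twice.
    out[entry.selector] = (out[entry.selector] ?? []).concat(texts);
  }
  return out;
}

// Usage with the example response above (attributes abbreviated).
const exampleResponse = {
  data: [
    {
      results: [
        {
          attributes: [{ name: "class", value: "..." }],
          height: 120,
          html: "Headless browser automation, without the hosting headaches",
          left: 32,
          text: "Headless browser automation, without the hosting headaches",
          top: 196,
          width: 736,
        },
      ],
      selector: "h1",
    },
  ],
};

console.log(textsBySelector(exampleResponse));
```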

How scraping works

BrowserQL

We recommend using BrowserQL, Browserless' first-class browser automation API, to scrape content from any website.

The API uses document.querySelectorAll under the hood. Browserless loads the page, runs client-side JavaScript, and then waits (up to 30 seconds by default) for your selectors before scraping. Use more specific selectors to narrow down results.
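To build intuition for the response fields, the function below approximates what happens for each selector: run querySelectorAll, then read standard DOM properties from every match. This is an assumption-based sketch, not Browserless' source. It assumes html and text come from innerHTML and innerText and that the position fields come from getBoundingClientRect(); the real service may use different properties or round the numbers. collectResults is a hypothetical name. You can paste it into a browser console and call collectResults(document, "h1") to preview how many elements a selector matches.

```javascript
// Approximation (not Browserless' actual implementation) of how one entry in
// the response's data array could be assembled inside the page.
function collectResults(root, selector) {
  // querySelectorAll returns every match, so broad selectors yield many results.
  const matches = Array.from(root.querySelectorAll(selector));
  return {
    selector,
    results: matches.map((el) => {
      // Assumption: position and size come from the element's bounding box.
      const rect = el.getBoundingClientRect();
      return {
        // NamedNodeMap is array-like, so Array.from can map it directly.
        attributes: Array.from(el.attributes, (a) => ({ name: a.name, value: a.value })),
        height: rect.height,
        html: el.innerHTML,
        left: rect.left,
        text: el.innerText,
        top: rect.top,
        width: rect.width,
      };
    }),
  };
}
```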

Specifying page-load behavior

You can control how the page loads by setting a gotoOptions object in the JSON body. Browserless passes it directly to Puppeteer's page.goto() method.

The example below sets waitUntil to networkidle2 and a navigation timeout of 10 seconds (10000 ms).

curl --request POST \
  --url 'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://example.com/",
    "elements": [
      { "selector": "h1" }
    ],
    "gotoOptions": {
      "timeout": 10000,
      "waitUntil": "networkidle2"
    }
  }'
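Because gotoOptions is handed to Puppeteer's page.goto(), its values follow Puppeteer's rules: waitUntil is one of the lifecycle events load, domcontentloaded, networkidle0, or networkidle2 (or an array of them), and timeout is in milliseconds, where 0 disables the timeout. A small client-side check, sketched below with the hypothetical helper makeGotoOptions, catches a misspelled event name before the request is sent. The defaults in the helper mirror Puppeteer's own defaults (load, 30000 ms).

```javascript
// Lifecycle events accepted by Puppeteer's page.goto() waitUntil option.
const LIFECYCLE_EVENTS = new Set(["load", "domcontentloaded", "networkidle0", "networkidle2"]);

// Hypothetical helper: validate gotoOptions locally before sending them to /scrape.
// Defaults mirror Puppeteer's: waitUntil "load", timeout 30000 ms.
function makeGotoOptions({ waitUntil = "load", timeout = 30000 } = {}) {
  const events = Array.isArray(waitUntil) ? waitUntil : [waitUntil];
  for (const event of events) {
    if (!LIFECYCLE_EVENTS.has(event)) {
      throw new RangeError(`Unsupported waitUntil value: ${event}`);
    }
  }
  // Timeouts are in milliseconds; 0 disables the navigation timeout in Puppeteer.
  if (!Number.isInteger(timeout) || timeout < 0) {
    throw new RangeError("timeout must be a non-negative integer number of milliseconds");
  }
  return { timeout, waitUntil };
}

// Same options as the curl example above.
// networkidle2: no more than 2 network connections for at least 500 ms.
const body = {
  url: "https://example.com/",
  elements: [{ selector: "h1" }],
  gotoOptions: makeGotoOptions({ waitUntil: "networkidle2", timeout: 10000 }),
};
```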

Custom behavior with waitFor options

Sometimes you need to perform further actions, or wait for custom events on the page, before collecting data. The waitFor properties support this.

waitForTimeout

Use waitForTimeout to pause for a fixed number of milliseconds before scraping.

curl --request POST \
  --url 'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://example.com/",
    "elements": [
      { "selector": "h1" }
    ],
    "waitForTimeout": 1000
  }'

waitForSelector

Use waitForSelector to wait for an element to appear before scraping. If a matching element already exists, waiting ends immediately. If no match appears within the timeout (in milliseconds), the request fails with an error.

Example

curl --request POST \
  --url 'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://example.com/",
    "elements": [
      { "selector": "h1" }
    ],
    "waitForSelector": {
      "selector": "h1",
      "timeout": 5000
    }
  }'
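The contract described above (return at once if the element is already present, otherwise keep waiting and fail after the timeout) can be modeled as a polling loop. The sketch below is a conceptual model only, built around a hypothetical waitForCondition helper; Puppeteer's own waitForSelector is implemented differently. In a browser console, waitForCondition(() => document.querySelector("h1"), 5000) approximates the waitForSelector settings in the example request.

```javascript
// Conceptual model of waitForSelector semantics (not Puppeteer's implementation):
// resolve immediately if check() is already truthy, otherwise poll every
// intervalMs and reject once timeoutMs has elapsed.
function waitForCondition(check, timeoutMs, intervalMs = 100) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const tick = () => {
      let value;
      try {
        value = check();
      } catch (err) {
        reject(err); // a throwing check ends the wait with that error
        return;
      }
      if (value) {
        resolve(value); // found: stop waiting
      } else if (Date.now() >= deadline) {
        reject(new Error(`Condition not met within ${timeoutMs} ms`));
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}
```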

waitForFunction

Use waitForFunction to run custom JavaScript on the page and wait until it finishes before scraping. The function can be any valid JS function, including async functions.

Example

JS function

async () => {
  const res = await fetch('https://jsonplaceholder.typicode.com/todos/1');
  const json = await res.json();

  document.querySelector("h1").innerText = json.title;
}
Pass the same function to waitForFunction.fn as a string. Inside the single-quoted --data argument, each single quote in the function must be written as '\'' so the shell does not end the string early:

curl --request POST \
  --url 'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://example.com/",
    "elements": [
      { "selector": "h1" }
    ],
    "waitForFunction": {
      "fn": "async()=>{let t=await fetch('\''https://jsonplaceholder.typicode.com/todos/1'\''),e=await t.json();document.querySelector('\''h1'\'').innerText=e.title}",
      "timeout": 5000
    }
  }'
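Hand-escaping quotes for the shell (the '\'' sequences above) is error-prone. If you build the request in JavaScript instead, Function.prototype.toString() turns the "JS function" example above into the source string that fn carries, and JSON.stringify handles the escaping. Because only the function's source text travels in the request, the function cannot use variables from your local scope; anything it needs must be defined inside it. The name updateHeading is illustrative.

```javascript
// The same function as the "JS function" example above. It is only
// serialized here, never called locally, so `document` need not exist in Node.
const updateHeading = async () => {
  const res = await fetch("https://jsonplaceholder.typicode.com/todos/1");
  const json = await res.json();
  document.querySelector("h1").innerText = json.title;
};

const body = {
  url: "https://example.com/",
  elements: [{ selector: "h1" }],
  waitForFunction: {
    // toString() returns the function's source text, which is what fn carries.
    fn: updateHeading.toString(),
    timeout: 5000,
  },
};

// JSON.stringify escapes the quotes inside the function source; send this
// string as the request body, for example with fetch.
const payload = JSON.stringify(body);
console.log(payload);
```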

waitForEvent

Use waitForEvent to wait for a custom event that your application dispatches before scraping. This is useful for Single Page Applications (SPAs) that signal when they're ready.

Example

curl --request POST \
  --url 'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://example.com",
    "elements": [
      { "selector": "a" }
    ],
    "addScriptTag": [{
      "content": "setTimeout(() => document.dispatchEvent(new CustomEvent('\''app:ready'\'', { detail: { status: '\''loaded'\'' } })), 250);"
    }],
    "waitForEvent": {
      "event": "app:ready",
      "timeout": 1000
    }
  }'
warning

waitForEvent only works with custom events, not lifecycle events like load or DOMContentLoaded. Use gotoOptions.waitUntil for lifecycle events.
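On the page side, waitForEvent depends on your application dispatching the event once it is ready. The example request simulates this with addScriptTag and a setTimeout; in a real SPA you would dispatch the event from your own code after the data has rendered. A minimal sketch, where signalReady is a hypothetical helper and the event name app:ready matches the example:

```javascript
// CustomEvent is built into browsers and recent Node.js versions; the fallback
// only keeps this sketch runnable on older Node.js releases.
const CustomEventImpl =
  typeof CustomEvent === "function"
    ? CustomEvent
    : class extends Event {
        constructor(type, options = {}) {
          super(type, options);
          this.detail = options.detail ?? null;
        }
      };

// Hypothetical helper: call this once your app has finished rendering.
// In the browser, pass `document`, the target used in the example request.
function signalReady(target, detail) {
  target.dispatchEvent(new CustomEventImpl("app:ready", { detail }));
}

// In your app code, after the data has loaded:
// signalReady(document, { status: "loaded" });
```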

Configuration options

The /scrape API supports shared request configuration options that apply across REST endpoints. In addition to elements and selectors, you can:

  • Control navigation with gotoOptions (for example, waitUntil and timeout)
  • Wait for conditions using waitForTimeout, waitForSelector, waitForFunction, and waitForEvent
  • Block unneeded requests with rejectResourceTypes and rejectRequestPattern
  • Continue on error with bestAttempt when async steps fail or time out
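These options can be combined in one request body. The sketch below is illustrative rather than authoritative: it assumes rejectResourceTypes takes an array of Puppeteer resource-type names (such as image or font), rejectRequestPattern takes an array of URL pattern strings, and bestAttempt is a boolean; check the OpenAPI reference for the exact types. buildScrapeBody and its parameters are hypothetical.

```javascript
// Hypothetical helper combining the shared options into one /scrape body.
// Assumed types (verify against the OpenAPI reference):
//   rejectResourceTypes: string[] of Puppeteer resource types
//   rejectRequestPattern: string[] of URL patterns
//   bestAttempt: boolean
function buildScrapeBody(url, selectors, options = {}) {
  const {
    gotoOptions,
    waitForSelector,
    rejectResourceTypes = [],
    rejectRequestPattern = [],
    bestAttempt = false,
  } = options;

  const body = {
    url,
    elements: selectors.map((selector) => ({ selector })),
    bestAttempt,
  };
  // Only include optional keys that were actually set.
  if (gotoOptions) body.gotoOptions = gotoOptions;
  if (waitForSelector) body.waitForSelector = waitForSelector;
  if (rejectResourceTypes.length) body.rejectResourceTypes = rejectResourceTypes;
  if (rejectRequestPattern.length) body.rejectRequestPattern = rejectRequestPattern;
  return body;
}

// Skip images and fonts, wait for the heading, and keep going
// (bestAttempt) if a wait step fails or times out.
const body = buildScrapeBody("https://example.com/", ["h1", "a"], {
  waitForSelector: { selector: "h1", timeout: 5000 },
  rejectResourceTypes: ["image", "font"],
  bestAttempt: true,
});

console.log(JSON.stringify(body, null, 2));
```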