Scrape API

Extract structured JSON from a fully rendered page using CSS selectors. Each request must include a url and an elements array that lists the selectors you want to capture.

Endpoint

  • Method: POST
  • Path: /scrape
  • Auth: token query parameter (?token=)
  • Content-Type: application/json
  • Response: application/json

See the OpenAPI reference for complete details.

Quickstart

curl --request POST \
  --url 'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://browserless.io/",
    "elements": [
      { "selector": "h1" }
    ]
  }'
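
For readers calling the endpoint from a script rather than the shell, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names build_scrape_request and send_scrape_request are hypothetical, and the 60-second client timeout is an assumption chosen only to be comfortably longer than a slow page load.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the curl example; the token is passed as a query parameter.
BASE_URL = "https://production-sfo.browserless.io/scrape"


def build_scrape_request(token: str, url: str, selectors: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST /scrape request. Hypothetical helper."""
    query = urllib.parse.urlencode({"token": token})
    # The body needs a url and an elements array with one object per selector.
    payload = {"url": url, "elements": [{"selector": s} for s in selectors]}
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_scrape_request(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response. The timeout is an assumed value."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling send_scrape_request(build_scrape_request("YOUR_API_TOKEN_HERE", "https://browserless.io/", ["h1"])) would issue the same request as the curl command.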

Response

{
  "data": [
    {
      "results": [
        {
          "attributes": [
            { "name": "class", "value": "..." }
          ],
          "height": 120,
          "html": "Headless browser automation, without the hosting headaches",
          "left": 32,
          "text": "Headless browser automation, without the hosting headaches",
          "top": 196,
          "width": 736
        }
      ],
      "selector": "h1"
    }
  ]
}
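
The response nests one entry per requested selector under data, and each entry holds a results list with one object per matched element. As a rough illustration of walking that shape, the sketch below collects the text of every match keyed by selector. The function name texts_by_selector is hypothetical, and the code assumes only the fields shown in the sample response.

```python
def texts_by_selector(response: dict) -> dict[str, list[str]]:
    """Map each selector to the text of every element it matched. Hypothetical helper."""
    collected: dict[str, list[str]] = {}
    for entry in response.get("data", []):
        # Each entry pairs a selector with the list of elements it matched.
        matches = entry.get("results", [])
        collected[entry["selector"]] = [match.get("text", "") for match in matches]
    return collected
```

The other fields of a match (html, attributes, and the top, left, width, and height values) can be read from the same objects in the same way.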

How scraping works

The API uses document.querySelectorAll under the hood. Browserless loads the page, runs client-side JavaScript, and then waits (up to 30 seconds by default) for your selectors to appear before scraping. Because querySelectorAll returns every element that matches, use more specific selectors to narrow down the results.

Bot detection troubleshooting

If scraped results are empty or missing expected elements, the site may be blocking automation. Signs include:

  • Empty or missing results arrays in the response
  • Partial page content missing key elements
  • Different content compared to a regular browser
  • Blocked requests or access denied messages
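
The first sign in the list above can be checked mechanically. The sketch below is one possible approach, assuming the response shape shown in the Response section: it reports which requested selectors came back with no matches or were absent from the response altogether. The function name selectors_without_matches is hypothetical, and an empty result is only a hint, since a selector may simply not exist on the page.

```python
def selectors_without_matches(response: dict, requested: list[str]) -> list[str]:
    """Return the requested selectors that have no matched elements. Hypothetical helper."""
    # A selector counts as matched only if its results list is non-empty.
    matched = {
        entry.get("selector")
        for entry in response.get("data", [])
        if entry.get("results")
    }
    return [selector for selector in requested if selector not in matched]
```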

The /unblock API is specifically designed to bypass bot detection mechanisms like Datadome and passive CAPTCHAs.

Configuration options

The /scrape API supports shared request configuration options that apply across REST endpoints. In addition to elements and selectors, you can: