Scrape API
Extract structured JSON from a fully rendered page using CSS selectors. Each request must include a `url` and an `elements` array listing the selectors you want to capture.
Endpoint
- Method: `POST`
- Path: `/scrape`
- Auth: `token` query parameter (`?token=`)
- Content-Type: `application/json`
- Response: `application/json`
See the OpenAPI reference for complete details.
Quickstart
cURL

```shell
curl --request POST \
  --url 'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://browserless.io/",
    "elements": [
      {
        "selector": "h1"
      }
    ]
  }'
```
JavaScript

```javascript
const TOKEN = "YOUR_API_TOKEN_HERE";
const url = `https://production-sfo.browserless.io/scrape?token=${TOKEN}`;
const headers = {
  "Cache-Control": "no-cache",
  "Content-Type": "application/json"
};

const data = {
  url: "https://browserless.io/",
  elements: [
    { selector: "h1" }
  ]
};

const scrapeContent = async () => {
  const response = await fetch(url, {
    method: "POST",
    headers,
    body: JSON.stringify(data)
  });
  const result = await response.json();
  console.log(result);
};

scrapeContent();
```
Python

```python
import requests

TOKEN = "YOUR_API_TOKEN_HERE"
url = f"https://production-sfo.browserless.io/scrape?token={TOKEN}"
headers = {
    "Cache-Control": "no-cache",
    "Content-Type": "application/json",
}
data = {
    "url": "https://browserless.io/",
    "elements": [
        {"selector": "h1"}
    ],
}

response = requests.post(url, headers=headers, json=data)
result = response.json()
print(result)
```
Response
```json
{
  "data": [
    {
      "results": [
        {
          "attributes": [
            { "name": "class", "value": "..." }
          ],
          "height": 120,
          "html": "Headless browser automation, without the hosting headaches",
          "left": 32,
          "text": "Headless browser automation, without the hosting headaches",
          "top": 196,
          "width": 736
        }
      ],
      "selector": "h1"
    }
  ]
}
```
How scraping works
The API uses `document.querySelectorAll` under the hood. Browserless loads the page, runs client-side JavaScript, and then waits (up to 30 seconds by default) for your selectors to appear before scraping. Use more specific selectors to narrow down the results.
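Because every entry in `elements` maps to a `document.querySelectorAll` call, a single request can capture several parts of a page at once, and narrower selectors return fewer matches. A minimal sketch of such a request body (the selectors themselves are illustrative, not taken from a real page):

```python
import json

# Sketch: one /scrape body with several selectors of increasing specificity.
# Each selector is run through document.querySelectorAll, so broad selectors
# return every match on the page while narrow ones return only a few.
payload = {
    "url": "https://browserless.io/",
    "elements": [
        {"selector": "h1"},                        # every <h1> on the page
        {"selector": "nav a"},                     # only links inside <nav>
        {"selector": "meta[name='description']"},  # a single metadata tag
    ],
}

# Serialize the body exactly as it would be POSTed to /scrape.
body = json.dumps(payload)
print(body)
```

Each selector produces its own entry in the response `data` array, so results stay grouped per selector.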
Bot detection troubleshooting
If scraped results are empty or missing expected elements, the site may be blocking automation. Signs include:
- Empty or missing `results` arrays in the response
- Partial page content missing key elements
- Different content compared to a regular browser
- Blocked requests or access denied messages
The `/unblock` API is specifically designed to bypass bot-detection mechanisms such as DataDome and passive CAPTCHAs.
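The first sign above, empty `results` arrays, is easy to check for programmatically before deciding to retry through `/unblock`. A small sketch (the `looks_blocked` helper is hypothetical, not part of the API) based on the response shape shown earlier:

```python
def looks_blocked(scrape_response: dict) -> bool:
    """Hypothetical heuristic: flag a /scrape response whose selectors
    all came back empty, a common symptom of bot detection."""
    data = scrape_response.get("data", [])
    if not data:
        return True
    # Every selector entry has a "results" list; all-empty is suspicious.
    return all(not entry.get("results") for entry in data)

# A response with no matches for any selector:
blocked = {"data": [{"results": [], "selector": "h1"}]}
# A healthy response with at least one match:
ok = {"data": [{"results": [{"text": "Hi", "html": "Hi"}], "selector": "h1"}]}

print(looks_blocked(blocked))  # True
print(looks_blocked(ok))       # False
```

An all-empty response is only a hint, not proof of blocking; the selector may simply not exist on the page, so verify in a regular browser before switching endpoints.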
Configuration options
The `/scrape` API supports shared request configuration options that apply across REST endpoints. In addition to `elements` and selectors, you can:

- Waiting for things: Wait for events, functions, selectors, or timeouts before scraping
- Navigation options: Customize navigation behavior with `gotoOptions`
- Rejecting undesired requests: Block resources with `rejectResourceTypes` and `rejectRequestPattern`
- Continue on error: Use `bestAttempt` to continue when async events fail or time out
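These options can be combined in a single request body. The sketch below is an assumption-laden illustration: the option names come from the list above, but the specific values (the `waitUntil` setting, the blocked resource types, and the reject pattern) are examples, so check the shared request configuration reference for the exact accepted shapes.

```python
import json

# Illustrative /scrape body combining the shared configuration options.
# Values are assumptions for the sketch, not documented defaults.
payload = {
    "url": "https://browserless.io/",
    "elements": [{"selector": "h1"}],
    "gotoOptions": {"waitUntil": "networkidle2", "timeout": 30000},
    "rejectResourceTypes": ["image", "font"],  # skip heavy assets
    "rejectRequestPattern": [".*\\.css"],      # block URLs matching a regex
    "bestAttempt": True,                       # keep going if a wait times out
}

print(json.dumps(payload, indent=2))
```

Blocking images, fonts, and stylesheets can speed up scrapes considerably, but only when the elements you capture do not depend on the blocked resources being loaded.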