Scrape API
Extract structured JSON from a fully rendered page using CSS selectors. Each request must include a url and an elements array that lists the selectors you want to capture.
Endpoint
- Method: POST
- Path: /scrape
- Auth: token query parameter (?token=)
- Content-Type: application/json
- Response: application/json
See the OpenAPI reference for complete details.
Quickstart
cURL
curl --request POST \
  --url 'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://browserless.io/",
    "elements": [
      {
        "selector": "h1"
      }
    ]
  }'
JavaScript
const TOKEN = "YOUR_API_TOKEN_HERE";
const url = `https://production-sfo.browserless.io/scrape?token=${TOKEN}`;
const headers = {
  "Cache-Control": "no-cache",
  "Content-Type": "application/json"
};
const data = {
  url: "https://browserless.io/",
  elements: [
    { selector: "h1" }
  ]
};
const scrapeContent = async () => {
  const response = await fetch(url, {
    method: 'POST',
    headers: headers,
    body: JSON.stringify(data)
  });
  const result = await response.json();
  console.log(result);
};
scrapeContent();
Python
import requests

TOKEN = "YOUR_API_TOKEN_HERE"
url = f"https://production-sfo.browserless.io/scrape?token={TOKEN}"
headers = {
    "Cache-Control": "no-cache",
    "Content-Type": "application/json"
}
data = {
    "url": "https://browserless.io/",
    "elements": [
        { "selector": "h1" }
    ]
}

response = requests.post(url, headers=headers, json=data)
result = response.json()
print(result)
Response
{
  "data": [
    {
      "results": [
        {
          "attributes": [
            { "name": "class", "value": "..." }
          ],
          "height": 120,
          "html": "Headless browser automation, without the hosting headaches",
          "left": 32,
          "text": "Headless browser automation, without the hosting headaches",
          "top": 196,
          "width": 736
        }
      ],
      "selector": "h1"
    }
  ]
}
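The response nests one entry per selector under data, and each entry carries a results array with one object per matched element. As a minimal sketch, assuming the response always follows the shape shown above, the helper below flattens it into a mapping from selector to matched text. The function name texts_by_selector is hypothetical and not part of the API.

```python
from typing import Any, Dict, List


def texts_by_selector(payload: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each selector in a /scrape response to the text of its matches."""
    out: Dict[str, List[str]] = {}
    for entry in payload.get("data", []):
        # "results" holds one object per element the selector matched.
        matches = entry.get("results") or []
        out[entry["selector"]] = [match.get("text", "") for match in matches]
    return out


# Sample payload copied from the response shown above (attributes trimmed).
sample = {
    "data": [
        {
            "results": [
                {
                    "html": "Headless browser automation, without the hosting headaches",
                    "text": "Headless browser automation, without the hosting headaches",
                    "top": 196,
                    "left": 32,
                    "width": 736,
                    "height": 120,
                }
            ],
            "selector": "h1",
        }
    ]
}

print(texts_by_selector(sample))
```

The same loop can collect html, attributes, or the position fields (top, left, width, height) instead of text.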
How scraping works
The API evaluates each selector with document.querySelectorAll, so every element that matches is returned. Browserless loads the page, runs client-side JavaScript, and then waits (up to 30 seconds by default) for your selectors to appear before scraping. To narrow the results, use more specific selectors.
Bot detection troubleshooting
If scraped results are empty or missing expected elements, the site may be blocking automation. Signs include:
- Empty or missing results arrays in the response
- Partial page content missing key elements
- Different content compared to a regular browser
- Blocked requests or access denied messages
If you see these signs, try the /unblock API, which is specifically designed to bypass bot detection mechanisms like DataDome and passive CAPTCHAs.
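The first sign in the list, empty or missing results arrays, can be checked programmatically. The sketch below assumes the response shape documented in the Response section. The helper name selectors_without_results is hypothetical. An empty result is only a hint: it can also mean the selector does not match anything on the page.

```python
from typing import Any, Dict, List


def selectors_without_results(payload: Dict[str, Any], requested: List[str]) -> List[str]:
    """Return the requested selectors that came back empty or not at all."""
    found = {
        entry.get("selector"): entry.get("results") or []
        for entry in payload.get("data", [])
    }
    # A selector counts as missing if it is absent from "data" or matched nothing.
    return [selector for selector in requested if not found.get(selector)]


# Hypothetical response in which "h1" matched nothing and "nav a" is absent.
suspicious = {"data": [{"selector": "h1", "results": []}]}
print(selectors_without_results(suspicious, ["h1", "nav a"]))  # ['h1', 'nav a']
```

A non-empty return value is a reasonable trigger for comparing the page against a regular browser before switching endpoints.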
Configuration options
The /scrape API supports the shared request configuration options that apply across REST endpoints. In addition to url and elements, a request can set:
- Waiting for things: Wait for events, functions, selectors, or timeouts before scraping
- Navigation options: Customize navigation behavior with gotoOptions
- Rejecting undesired requests: Block resources with rejectResourceTypes and rejectRequestPattern
- Continue on error: Use bestAttempt to continue when async events fail or time out
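The options above are sent as extra top-level keys in the request body. The sketch below builds such a body without sending it. The key names gotoOptions, rejectResourceTypes, and bestAttempt come from the list above. The values shown (a waitUntil of "networkidle2", and the "image" and "stylesheet" resource types) are assumptions borrowed from Puppeteer conventions and should be checked against the OpenAPI reference. The helper build_scrape_payload is hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def build_scrape_payload(
    url: str,
    selectors: List[str],
    *,
    goto_options: Optional[Dict[str, Any]] = None,
    reject_resource_types: Optional[List[str]] = None,
    best_attempt: bool = False,
) -> Dict[str, Any]:
    """Assemble a /scrape request body, adding only the options that are set."""
    payload: Dict[str, Any] = {
        "url": url,
        "elements": [{"selector": selector} for selector in selectors],
    }
    if goto_options:
        payload["gotoOptions"] = goto_options
    if reject_resource_types:
        payload["rejectResourceTypes"] = list(reject_resource_types)
    if best_attempt:
        payload["bestAttempt"] = True
    return payload


payload = build_scrape_payload(
    "https://browserless.io/",
    ["h1"],
    goto_options={"waitUntil": "networkidle2"},     # assumed value
    reject_resource_types=["image", "stylesheet"],  # assumed values
    best_attempt=True,
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as the json argument of requests.post, as in the Quickstart.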
Frequently Asked Questions
How does the Browserless /scrape API work?
Send a POST request with a URL and an elements array defining CSS selectors and what to extract (text, HTML, attributes). The API returns structured JSON matching your selectors.
Can I scrape multiple elements from a single page?
Yes. The elements array accepts multiple selector definitions. Each entry can target a different element and extract different properties, all returned in a single API response.
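As an illustration, the sketch below prepares a single request that targets several selectors at once, using only the Python standard library. The endpoint and token parameter are taken from the Quickstart. The selectors "h1", "h2", and "a" are arbitrary examples, and build_request is a hypothetical helper. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request
from typing import List

ENDPOINT = "https://production-sfo.browserless.io/scrape"


def build_request(token: str, page_url: str, selectors: List[str]) -> urllib.request.Request:
    """Prepare one POST /scrape request covering every selector in the list."""
    body = {
        "url": page_url,
        "elements": [{"selector": selector} for selector in selectors],
    }
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("YOUR_API_TOKEN_HERE", "https://browserless.io/", ["h1", "h2", "a"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it with a real token:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Each selector then appears as its own entry in the data array of the response, identified by its selector field.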
Does the /scrape API handle JavaScript-rendered content?
Yes. The API renders the page in a real headless browser, waits for JavaScript to execute, and then applies your selectors. You can also add custom wait conditions to handle lazy-loaded content.