Scrape Structured Data
Extract structured data from fully rendered JavaScript pages using CSS selectors, BrowserQL, or your preferred framework.
Prerequisites
- A Browserless API token from your account dashboard
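Steps
Choose one of three approaches:
- REST API: send a single HTTP request to the /scrape endpoint
- Frameworks: connect Puppeteer or Playwright to a remote browser and run your own extraction logic
- BQL: describe the navigation and extraction as a BrowserQL mutation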
REST API
Use the /scrape REST endpoint to extract structured data from a page. No WebSocket connection is needed. Examples are provided for:
- cURL
- JavaScript
- Python
- Java
- C#
cURL
1. Build the request
Append your API token to the /scrape endpoint as the token query parameter. The selectors to extract go in the JSON request body in step 2:
https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE
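If you build this request from a script instead of by hand, you can keep the token out of source code and generate the elements array from a plain list of selectors. Here is a minimal Python sketch, assuming the token lives in an environment variable. The name BROWSERLESS_TOKEN and the helper names scrape_endpoint and scrape_body are hypothetical, not part of Browserless. urlencode handles the escaping if the token ever contains reserved URL characters.

```python
import json
import os
from urllib.parse import urlencode

# Base /scrape endpoint from the step above.
SCRAPE_URL = "https://production-sfo.browserless.io/scrape"


def scrape_endpoint(token=None):
    """Return the /scrape URL with the token added as a URL-encoded query parameter."""
    # BROWSERLESS_TOKEN is a hypothetical environment variable name, not a Browserless setting.
    if token is None:
        token = os.environ["BROWSERLESS_TOKEN"]
    return f"{SCRAPE_URL}?{urlencode({'token': token})}"


def scrape_body(url, selectors):
    """Build the request body: the page URL plus one {"selector": ...} entry per CSS selector."""
    return {"url": url, "elements": [{"selector": s} for s in selectors]}


print(scrape_endpoint("YOUR_API_TOKEN_HERE"))
# https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE
print(json.dumps(scrape_body("https://example.com", ["h1", "p"])))
# {"url": "https://example.com", "elements": [{"selector": "h1"}, {"selector": "p"}]}
```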
2. Send the request
curl -X POST \
  "https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "elements": [
      { "selector": "h1" },
      { "selector": "p" }
    ]
  }'
3. Check the output
The response is JSON with a data array containing one item per selector. Each item's results array holds one entry per matched element, with the element's attributes, text, HTML, dimensions (width, height), and position (top, left):
{
  "data": [
    {
      "selector": "h1",
      "results": [
        {
          "attributes": [],
          "height": 38,
          "html": "Example Domain",
          "left": 28,
          "text": "Example Domain",
          "top": 160,
          "width": 716
        }
      ]
    },
    {
      "selector": "p",
      "results": [
        {
          "attributes": [],
          "height": 48,
          "html": "This domain is for use in illustrative examples...",
          "left": 28,
          "text": "This domain is for use in illustrative examples...",
          "top": 220,
          "width": 716
        }
      ]
    }
  ]
}
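Often you only need the text, not the geometry. This minimal Python sketch collapses the data array into a mapping from each selector to the texts it matched. It reads only the selector, results, and text fields shown in the example response. The helper name texts_by_selector is hypothetical.

```python
def texts_by_selector(payload):
    """Map each selector in a /scrape response to the text of every element it matched."""
    # If the same selector is requested twice, the later entry overwrites the earlier one.
    return {
        item["selector"]: [result["text"] for result in item["results"]]
        for item in payload["data"]
    }


# Trimmed copy of the example response above (only the fields this helper reads).
sample = {
    "data": [
        {"selector": "h1", "results": [{"text": "Example Domain"}]},
        {"selector": "p", "results": [{"text": "This domain is for use in illustrative examples..."}]},
    ]
}
print(texts_by_selector(sample))
# {'h1': ['Example Domain'], 'p': ['This domain is for use in illustrative examples...']}
```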
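JavaScript
1. Send the request
const response = await fetch(
  'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url: 'https://example.com',
      elements: [
        { selector: 'h1' },
        { selector: 'p' },
      ],
    }),
  }
);
// Surface HTTP errors (for example an invalid token) instead of parsing an error body.
if (!response.ok) {
  throw new Error(`Scrape failed with ${response.status}: ${await response.text()}`);
}
const { data } = await response.json();
console.log(data);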
2. Check the output
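Save the script as scrape.mjs and run it with node scrape.mjs. This requires Node.js 18 or later, which provides fetch globally; the .mjs extension enables top-level await. The extracted data is logged to the console as an array of selector results.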
Python
1. Install dependencies
pip install requests
2. Send the request
import requests
response = requests.post(
    'https://production-sfo.browserless.io/scrape?token=YOUR_API_TOKEN_HERE',
    json={
        'url': 'https://example.com',
        'elements': [
            {'selector': 'h1'},
            {'selector': 'p'},
        ],
    },
    timeout=60,  # fail instead of hanging if the service does not respond
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = response.json()['data']
print(data)
3. Check the output
Run the script with python scrape.py. The extracted data is printed as a list of selector results.
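A selector that matches nothing is easy to miss when you only print the data. This minimal sketch flags such selectors in the data list from the script above. It assumes, without confirmation from this page, that an unmatched selector comes back with an empty results list. The .price selector and the helper name unmatched_selectors are hypothetical.

```python
def unmatched_selectors(data):
    """Return the selectors whose results list is empty or missing."""
    # Assumption: a selector that matches nothing comes back with an empty "results" list.
    return [item["selector"] for item in data if not item.get("results")]


data = [
    {"selector": "h1", "results": [{"text": "Example Domain"}]},
    {"selector": ".price", "results": []},  # hypothetical selector with no match
]
missing = unmatched_selectors(data)
if missing:
    print(f"No elements matched: {', '.join(missing)}")
# No elements matched: .price
```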
Java
1. Add dependencies
<!-- https://kong.github.io/unirest-java/ -->
<dependency>
  <groupId>com.konghq</groupId>
  <artifactId>unirest-java</artifactId>
  <version>3.14.5</version>
</dependency>
2. Send the request
import kong.unirest.HttpResponse;
import kong.unirest.Unirest;
public class Scrape {
    public static void main(String[] args) {
        String url = "https://production-sfo.browserless.io/scrape";
        String token = "YOUR_API_TOKEN_HERE";
        String endpoint = String.format("%s?token=%s", url, token);
        // Request body: the page to load and the CSS selectors to extract.
        String body = "{\"url\": \"https://example.com\", \"elements\": [{\"selector\": \"h1\"}, {\"selector\": \"p\"}]}";
        HttpResponse<String> response = Unirest.post(endpoint)
            .header("Content-Type", "application/json")
            .body(body)
            .asString();
        System.out.println(response.getBody());
        // Release Unirest's connection pool so the JVM can exit.
        Unirest.shutDown();
    }
}
3. Check the output
Run the class. The response body is structured JSON with matched elements for each selector.
C#
1. Send the request
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
string url = "https://production-sfo.browserless.io/scrape";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";
// Request body: the page to load and the CSS selectors to extract.
var payload = new
{
    url = "https://example.com",
    elements = new[] { new { selector = "h1" }, new { selector = "p" } },
};
using (HttpClient httpClient = new HttpClient())
{
    var jsonPayload = JsonSerializer.Serialize(payload);
    var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");
    var response = await httpClient.PostAsync(endpoint, content);
    string responseBody = await response.Content.ReadAsStringAsync();
    Console.WriteLine(responseBody);
}
2. Check the output
Run the program. The response body is structured JSON with matched elements for each selector.
Frameworks
Use a browser connection to evaluate the fully rendered DOM and extract data with custom logic. Examples are provided for:
- Puppeteer
- Playwright
Puppeteer
1. Install dependencies
npm install puppeteer-core
2. Connect and extract
import puppeteer from 'puppeteer-core';
const browser = await puppeteer.connect({
  browserWSEndpoint: 'wss://production-sfo.browserless.io?token=YOUR_API_TOKEN_HERE',
});
try {
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });
  // The callback runs inside the page, so it sees the DOM after JavaScript has rendered it.
  const data = await page.evaluate(() => ({
    heading: document.querySelector('h1')?.textContent,
    paragraphs: [...document.querySelectorAll('p')].map((el) => el.textContent),
  }));
  console.log(data);
} finally {
  // Always close to release the session even on error.
  await browser.close();
}
3. Check the output
Run the script with node scrape.mjs. The extracted data is logged to the console.
Playwright
1. Install dependencies
npm install playwright-core
2. Connect and extract
import { chromium } from 'playwright-core';
const browser = await chromium.connect(
  'wss://production-sfo.browserless.io/chromium/playwright?token=YOUR_API_TOKEN_HERE'
);
try {
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle' });
  // The callback runs inside the page, so it sees the DOM after JavaScript has rendered it.
  const data = await page.evaluate(() => ({
    heading: document.querySelector('h1')?.textContent,
    paragraphs: [...document.querySelectorAll('p')].map((el) => el.textContent),
  }));
  console.log(data);
} finally {
  // Always close to release the session even on error.
  await browser.close();
}
3. Check the output
Run the script with node scrape.mjs. The extracted data is logged to the console.
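Puppeteer and Playwright can both return the rendered HTML with page.content(). If you save that HTML to a file, you can prototype the same heading and paragraph extraction offline without holding a browser session open. This sketch is a hypothetical helper built on Python's standard html.parser. Unlike page.evaluate, it runs no JavaScript and only sees the HTML you give it. It also assumes every h1 and p element has an explicit closing tag.

```python
from html.parser import HTMLParser


class HeadingParagraphParser(HTMLParser):
    """Collect the text of every <h1> and <p> element in document order."""

    TARGETS = ("h1", "p")

    def __init__(self):
        super().__init__()
        self.found = {tag: [] for tag in self.TARGETS}
        # Stack of (tag, text fragments) for target elements that are currently open.
        self._open = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TARGETS:
            self._open.append((tag, []))

    def handle_endtag(self, tag):
        # Only close the innermost open target; assumes explicit closing tags.
        if self._open and self._open[-1][0] == tag:
            name, fragments = self._open.pop()
            self.found[name].append("".join(fragments))

    def handle_data(self, data):
        # Like textContent, text in nested children (e.g. <a>) counts toward each open target.
        for _, fragments in self._open:
            fragments.append(data)


def extract(html):
    """Return the same shape as the page.evaluate callback: first h1 text plus all p texts."""
    parser = HeadingParagraphParser()
    parser.feed(html)
    parser.close()
    headings = parser.found["h1"]
    return {"heading": headings[0] if headings else None, "paragraphs": parser.found["p"]}


print(extract("<h1>Example Domain</h1><p>First <a href='#'>link</a>.</p><p>Second.</p>"))
# {'heading': 'Example Domain', 'paragraphs': ['First link.', 'Second.']}
```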
BQL
1. Write the mutation
Navigate to the page, then use mapSelector to collect the inner text of every element that matches each CSS selector. The heading and paragraphs aliases name the fields in the response:
mutation Scrape {
  goto(url: "https://example.com", waitUntil: domContentLoaded) {
    status
  }
  heading: mapSelector(selector: "h1") {
    innerText
  }
  paragraphs: mapSelector(selector: "p") {
    innerText
  }
}
2. Run it
Paste into the BQL IDE and click Run.
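Outside the IDE, a mutation like this can be sent over HTTP in a standard GraphQL POST body with query and operationName fields. This sketch only builds the request with Python's urllib and does not send it. The /chromium/bql endpoint path is an assumption, so confirm it against your Browserless account before use. The helper name build_bql_request is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the BQL HTTP endpoint path; confirm it for your account before sending requests.
BQL_ENDPOINT = "https://production-sfo.browserless.io/chromium/bql"

MUTATION = """
mutation Scrape {
  goto(url: "https://example.com", waitUntil: domContentLoaded) { status }
  heading: mapSelector(selector: "h1") { innerText }
  paragraphs: mapSelector(selector: "p") { innerText }
}
"""


def build_bql_request(token, query, operation_name="Scrape"):
    """Wrap a BQL mutation in a GraphQL-over-HTTP POST request (built, not sent)."""
    url = f"{BQL_ENDPOINT}?{urllib.parse.urlencode({'token': token})}"
    body = json.dumps({"query": query, "operationName": operation_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_bql_request("YOUR_API_TOKEN_HERE", MUTATION)
# To send it (needs a valid token and network access):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```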
3. Check the output
The response is JSON keyed by the aliases from the mutation. Each alias holds an array with one innerText entry per matched element:
{
  "data": {
    "goto": { "status": 200 },
    "heading": [{ "innerText": "Example Domain" }],
    "paragraphs": [
      { "innerText": "This domain is for use in illustrative examples..." }
    ]
  }
}
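To use this result alongside the REST or framework output, you can flatten it into plain strings. This minimal Python sketch works against the example response above. The helper name flatten_bql is hypothetical. Treating a missing or null alias as "no matches" is an assumption about edge cases that this page does not show.

```python
def flatten_bql(response):
    """Turn the BQL response into {status, heading, paragraphs} with plain strings."""
    data = response["data"]
    # Assumption: a missing or null alias is treated as "no matches".
    headings = [item["innerText"] for item in data.get("heading") or []]
    return {
        "status": data["goto"]["status"],
        "heading": headings[0] if headings else None,
        "paragraphs": [item["innerText"] for item in data.get("paragraphs") or []],
    }


sample = {
    "data": {
        "goto": {"status": 200},
        "heading": [{"innerText": "Example Domain"}],
        "paragraphs": [{"innerText": "This domain is for use in illustrative examples..."}],
    }
}
print(flatten_bql(sample))
# {'status': 200, 'heading': 'Example Domain', 'paragraphs': ['This domain is for use in illustrative examples...']}
```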
Next steps
- Take a Screenshot — capture the page visually
- Fill and Submit a Form — automate form interactions before scraping
- Authenticated Sessions — scrape pages that require login