/unblock API
Currently, Browserless V2 is available in production via two domains: production-sfo.browserless.io and production-lon.browserless.io
The /unblock API is designed to bypass bot detection mechanisms such as Cloudflare, Datadome and other passive CAPTCHAs. There are two main ways to use the API:
- Grab the HTML or screenshot of a page with content or screenshot set to true.
- Generate a WebSocket endpoint to perform automations with Playwright, Puppeteer or another CDP library.
Using the /unblock API is charged at 10 units per page. It works best when combined with our residential proxy.
This API is particularly useful for developers who need to automate web interactions on sites that employ sophisticated bot detection and blocking techniques. It offers four different ways to wait for preconditions to be met before returning a response.
See the full OpenAPI schema for all properties and documentation.
Retrieving HTML
If you'd like to retrieve the HTML of a page for scraping, you can set the content field to true in the JSON payload or cURL request. With a proxy enabled, this would be:
curl --request POST \
--url 'https://production-sfo.browserless.io/unblock?token=GOES-HERE&proxy=residential' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://example.com/",
"browserWSEndpoint": false,
"cookies": false,
"content": true,
"screenshot": false
}'
Which will result in a response containing the unblocked page's HTML:
{
"browserWSEndpoint": "wss://production-sfo.browserless.io/e/53616c7465645f5fa57aca44763bd816bb1aa1f1210ed871a908fd60235848ce6e4bcc0a8fcfe08c6d96eff8d68d556e/devtools/browser/646b292c-bd2a-4964-af18-9c1a6081c32e",
"content": "<!DOCTYPE html><html>...</html>",
"cookies": [],
"screenshot": null,
"ttl": 60000
}
You can then process this HTML with libraries such as Scrapy or Beautiful Soup.
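For scripts that run in Node.js rather than a shell, the same request can be sketched with the built-in fetch (Node 18 or later is assumed). The helper names buildUnblockUrl, buildContentPayload and fetchUnblockedHtml are illustrative and not part of the Browserless API; the endpoint, query parameters and payload fields mirror the cURL example above.

```javascript
// Hypothetical helpers; only the URL, query parameters and payload fields
// are taken from the cURL example above.
const BASE_URL = "https://production-sfo.browserless.io";

// Build the /unblock URL with the token and an optional proxy setting.
function buildUnblockUrl(token, { proxy } = {}) {
  const url = new URL("/unblock", BASE_URL);
  url.searchParams.set("token", token);
  if (proxy) url.searchParams.set("proxy", proxy);
  return url.toString();
}

// Request only the HTML of the target page.
function buildContentPayload(targetUrl) {
  return {
    url: targetUrl,
    browserWSEndpoint: false,
    cookies: false,
    content: true,
    screenshot: false,
  };
}

// POST the payload and return the "content" field of the JSON response.
async function fetchUnblockedHtml(token, targetUrl) {
  const response = await fetch(buildUnblockUrl(token, { proxy: "residential" }), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildContentPayload(targetUrl)),
  });
  if (response.status !== 200) {
    throw new Error(`Unblock request failed with status ${response.status}`);
  }
  const { content } = await response.json();
  return content;
}
```

Inside an async function or an ES module, this could be called as `const html = await fetchUnblockedHtml("GOES-HERE", "https://example.com/");`.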
Creating an endpoint
The /unblock API can get past a bot detector, then give you the cookies and a connection to the browser instance to use in your automations.
JSON Payload
This is a JSON object containing the URL of the site you wish to unblock, along with the parameters you want in the response. If you're reconnecting to the browser, you always want to set ttl, browserWSEndpoint and cookies.
{
"url": "https://example.com",
"browserWSEndpoint": true,
"cookies": true,
"content": false,
"screenshot": false,
"ttl": 30000
}
cURL Request
We recommend using /unblock with a residential proxy, as in this example:
curl --request POST \
--url 'https://production-sfo.browserless.io/unblock?token=GOES-HERE&proxy=residential' \
--header 'content-type: application/json' \
--data '{
"url": "https://example.com",
"browserWSEndpoint": true,
"cookies": true,
"content": false,
"screenshot": false,
"ttl": 30000
}'
Which will return a JSON response like this:
{
"browserWSEndpoint": "wss://production-sfo.browserless.io/p/53616c7465645f5f0d2e4012516859fdda7cc1ae0b16c6c5ec739d5d9f19a3d3c9b49c8a814b0fd1beae934b2e8050a0/devtools/browser/102ea3e9-74d7-42c9-a856-1bf254649b9a",
"content": null,
"cookies": [
{
"name": "session_id",
"value": "XYZ123",
"domain": "example.com",
"path": "/",
"secure": true,
"httpOnly": true
}
],
"screenshot": null,
"ttl": 30000
}
After receiving the response with the browserWSEndpoint and cookies, you can use Puppeteer, Playwright or another CDP library to connect to the browser instance and inject the cookies to continue your scraping process:
Puppeteer Connection
import puppeteer from "puppeteer-core";

const TOKEN = "GOES-HERE";

// Ask /unblock for a WebSocket endpoint and the cookies set while unblocking
const unblock = async (url) => {
  const opts = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: url,
      browserWSEndpoint: true,
      cookies: true,
      content: false,
      screenshot: false,
      ttl: 10000,
    }),
  };
  const response = await fetch(
    `https://production-sfo.browserless.io/chromium/unblock?token=${TOKEN}`,
    opts,
  );
  return await response.json();
};

// Reconnect to the browser instance using the returned endpoint
const { browserWSEndpoint, cookies } = await unblock("https://browserless.io/");
const browser = await puppeteer.connect({
  browserWSEndpoint: browserWSEndpoint + `?token=${TOKEN}`,
});
const page = (await browser.pages())[0];

// Or inject cookies into the page
// await page.setCookie(...cookies);
// await page.goto("https://browserless.io/");
// await page.screenshot({ path: "screenshot.png" });

await page.screenshot({ path: `screenshot-${Date.now()}.png` });
await browser.close();
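The example above appends the token by string concatenation, which assumes the returned endpoint carries no query string of its own. A slightly more defensive sketch uses the standard URL class instead; withToken is a hypothetical helper name, not something the API provides.

```javascript
// Hypothetical helper: add the API token to a browserWSEndpoint returned by
// /unblock, preserving any query string the endpoint already carries.
function withToken(browserWSEndpoint, token) {
  const endpoint = new URL(browserWSEndpoint);
  endpoint.searchParams.set("token", token);
  return endpoint.toString();
}
```

It would slot into the connection call as `puppeteer.connect({ browserWSEndpoint: withToken(browserWSEndpoint, TOKEN) })`.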
Waiting for Things
Browserless offers 4 different ways to wait for preconditions to be met on the page before returning the response. These are events, functions, selectors and timeouts.
waitForEvent
Waits for an event to happen on the page before continuing:
Example
JSON payload
This request will fail because the event never fires on this page; it is shown for demonstration purposes only.
{
"url": "https://example.com/",
"waitForEvent": {
"event": "fullscreenchange",
"timeout": 5000
}
}
cURL request
curl -X POST \
'https://production-sfo.browserless.io/unblock?token=MY_API_TOKEN' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"waitForEvent": {
"event": "fullscreenchange",
"timeout": 5000
}
}'
waitForFunction
Waits for the provided function to return before continuing. The function can be any valid JavaScript function, including async functions.
Example
JS function
async () => {
const res = await fetch("https://jsonplaceholder.typicode.com/todos/1");
const json = await res.json();
document.querySelector("h1").innerText = json.title;
};
JSON payload
{
"url": "https://example.com/",
"waitForFunction": {
"fn": "async()=>{let t=await fetch('https://jsonplaceholder.typicode.com/todos/1'),e=await t.json();document.querySelector('h1').innerText=e.title}",
"timeout": 5000
}
}
cURL request
curl -X POST \
'https://production-sfo.browserless.io/unblock?token=MY_API_TOKEN' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"waitForFunction": {
"fn": "async()=>{let t=await fetch(\"https://jsonplaceholder.typicode.com/todos/1\"),e=await t.json();document.querySelector(\"h1\").innerText=e.title}",
"timeout": 5000
}
}'
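Hand-minifying the function and escaping its quotes is easy to get wrong. One possible approach, sketched below, is to write the function as ordinary JavaScript and let Function.prototype.toString and JSON.stringify produce the fn string. The helper name buildWaitForFunctionBody is hypothetical, and the sketch assumes the API accepts a multi-line function source in fn.

```javascript
// The function to run in the page, written as ordinary JavaScript.
// It is never called here; only its source text is sent to the API.
const updateHeading = async () => {
  const res = await fetch("https://jsonplaceholder.typicode.com/todos/1");
  const json = await res.json();
  document.querySelector("h1").innerText = json.title;
};

// Hypothetical helper: serialize the function and wrap it in the
// waitForFunction payload shown above. JSON.stringify handles the
// escaping of quotes and newlines in the function source.
function buildWaitForFunctionBody(url, fn, timeout = 5000) {
  return JSON.stringify({
    url,
    waitForFunction: { fn: fn.toString(), timeout },
  });
}

const body = buildWaitForFunctionBody("https://example.com/", updateHeading);
```

The resulting body string can be passed as the body of a POST request to /unblock with a Content-Type of application/json.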
waitForSelector
Waits for a selector to appear on the page. If the selector already exists when the API is called, the method returns immediately. If the selector doesn't appear within timeout milliseconds, the API returns a non-200 response code with an error message as the body of the response.
The object can have any of these values:
- selector: String, required. A valid CSS selector.
- hidden: Boolean, optional. Wait for the selected element to not be found in the DOM or to be hidden, i.e. have display: none or visibility: hidden CSS properties.
- timeout: Number, optional. Maximum number of milliseconds to wait for the selector before failing.
- visible: Boolean, optional. Wait for the selected element to be present in the DOM and to be visible, i.e. to not have display: none or visibility: hidden CSS properties.
Example
JSON payload
{
"url": "https://example.com/",
"waitForSelector": {
"selector": "h1",
"timeout": 5000
}
}
cURL request
curl -X POST \
'https://production-sfo.browserless.io/unblock?token=MY_API_TOKEN' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"waitForSelector": {
"selector": "h1",
"timeout": 5000
}
}'
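Because a selector timeout surfaces as a non-200 response with the error message in the body, client code may want to check the status before parsing JSON. The sketch below shows one way to do that in JavaScript; buildWaitForSelectorPayload and parseUnblockResponse are hypothetical helper names, and the exact status code and error text returned by the API are not assumed.

```javascript
// Hypothetical helper: build a waitForSelector payload from the fields
// described above, rejecting a missing selector before any request is made.
function buildWaitForSelectorPayload(url, { selector, timeout, hidden, visible } = {}) {
  if (typeof selector !== "string" || selector.length === 0) {
    throw new TypeError("selector is required and must be a non-empty string");
  }
  const waitForSelector = { selector };
  if (timeout !== undefined) waitForSelector.timeout = timeout;
  if (hidden !== undefined) waitForSelector.hidden = hidden;
  if (visible !== undefined) waitForSelector.visible = visible;
  return { url, waitForSelector };
}

// Hypothetical helper: turn a non-200 response into an Error that carries
// the response body, otherwise return the parsed JSON.
async function parseUnblockResponse(response) {
  if (response.status !== 200) {
    const message = await response.text();
    throw new Error(`Unblock failed (${response.status}): ${message}`);
  }
  return response.json();
}
```

A caller would pass the result of fetch to parseUnblockResponse and wrap the call in try/catch to handle the timeout case.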