Version: v2

/unblock API

BrowserQL

We recommend using BrowserQL, Browserless' first-class browser automation API, to bypass bot detection mechanisms.

The /unblock API is designed to bypass bot detection mechanisms such as Cloudflare, DataDome and other passive CAPTCHAs. There are two main ways to use the API:

  • Grab the HTML or a screenshot of a page by setting content or screenshot to true in the request body.
  • Generate a WebSocket endpoint to perform automations with Playwright, Puppeteer or another CDP library.

Each page requested through the /unblock API is charged at 10 units. The API works best when combined with our residential proxy.

This API is particularly useful for developers who need to automate web interactions on sites that employ sophisticated bot detection and blocking techniques. It offers four different ways to wait for preconditions to be met before returning a response.

See the full OpenAPI schema for all properties and documentation.

Retrieving HTML

If you'd like to retrieve the HTML of a page for scraping, you can set the content field to true in the JSON payload. With a proxy enabled, this would be:

curl --request POST \
  --url 'https://production-sfo.browserless.io/unblock?token=YOUR_API_TOKEN_HERE&proxy=residential' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com/",
    "browserWSEndpoint": false,
    "cookies": false,
    "content": true,
    "screenshot": false
  }'

The response contains the unblocked page's HTML in the content field:

{
  "browserWSEndpoint": "wss://production-sfo.browserless.io/e/53616c7465645f5fa57aca44763bd816bb1aa1f1210ed871a908fd60235848ce6e4bcc0a8fcfe08c6d96eff8d68d556e/devtools/browser/646b292c-bd2a-4964-af18-9c1a6081c32e",
  "content": "<!DOCTYPE html><html>...</html>",
  "cookies": [],
  "screenshot": null,
  "ttl": 60000
}

You can then process this HTML with libraries such as Scrapy or Beautiful Soup.
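If you are calling the API from JavaScript rather than cURL, the same request can be sketched with fetch. This is a minimal sketch, not an official client: the helper names buildUnblockUrl and fetchUnblockedHtml are hypothetical, and it assumes a runtime with a global fetch (for example Node 18 or later).

```javascript
// Endpoint copied from the cURL example above.
const UNBLOCK_URL = "https://production-sfo.browserless.io/unblock";

// Hypothetical helper: build the request URL with the token and an
// optional proxy query parameter, letting URL handle the encoding.
function buildUnblockUrl(token, proxy) {
  const url = new URL(UNBLOCK_URL);
  url.searchParams.set("token", token);
  if (proxy) url.searchParams.set("proxy", proxy);
  return url.toString();
}

// Hypothetical helper: request only the page HTML and return it as a string.
async function fetchUnblockedHtml(token, targetUrl) {
  const response = await fetch(buildUnblockUrl(token, "residential"), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      url: targetUrl,
      browserWSEndpoint: false,
      cookies: false,
      content: true,
      screenshot: false,
    }),
  });
  if (!response.ok) {
    throw new Error(`unblock request failed with status ${response.status}`);
  }
  const { content } = await response.json();
  return content;
}
```

Calling fetchUnblockedHtml("YOUR_API_TOKEN_HERE", "https://example.com/") resolves to the HTML string from the content field, which you can then hand to the parser of your choice.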

Creating an endpoint

The /unblock API can get past a bot detector, then give you the cookies and a connection to the browser instance to use in your automations.

JSON Payload

This is a JSON object containing the URL of the site you wish to unblock, along with the parameters you want in the response. If you plan to reconnect to the browser, always set ttl, browserWSEndpoint and cookies.

{
  "url": "https://example.com",
  "browserWSEndpoint": true,
  "cookies": true,
  "content": false,
  "screenshot": false,
  "ttl": 30000
}

We recommend using /unblock with a residential proxy, as in this example:

curl --request POST \
  --url 'https://production-sfo.browserless.io/unblock?token=YOUR_API_TOKEN_HERE&proxy=residential' \
  --header 'content-type: application/json' \
  --data '{
    "url": "https://example.com",
    "browserWSEndpoint": true,
    "cookies": true,
    "content": false,
    "screenshot": false,
    "ttl": 30000
  }'

This returns a JSON response like the following:

{
  "browserWSEndpoint": "wss://production-sfo.browserless.io/p/53616c7465645f5f0d2e4012516859fdda7cc1ae0b16c6c5ec739d5d9f19a3d3c9b49c8a814b0fd1beae934b2e8050a0/devtools/browser/102ea3e9-74d7-42c9-a856-1bf254649b9a",
  "content": null,
  "cookies": [
    {
      "name": "session_id",
      "value": "XYZ123",
      "domain": "example.com",
      "path": "/",
      "secure": true,
      "httpOnly": true
    }
  ],
  "screenshot": null,
  "ttl": 30000
}
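The returned cookies can also be reused outside a browser, for example in a plain HTTP client. The sketch below is an illustration only: toCookieHeader is a hypothetical helper, and it assumes each cookie object has the name and value fields shown in the response above. It ignores domain, path and secure, so it is up to the caller to send the header only to a matching origin.

```javascript
// Hypothetical helper: serialize the `cookies` array from an /unblock
// response into the value of an HTTP `Cookie` request header.
function toCookieHeader(cookies) {
  return cookies.map(({ name, value }) => `${name}=${value}`).join("; ");
}

// Sample data copied from the response above.
const sampleCookies = [
  {
    name: "session_id",
    value: "XYZ123",
    domain: "example.com",
    path: "/",
    secure: true,
    httpOnly: true,
  },
];

console.log(toCookieHeader(sampleCookies)); // session_id=XYZ123
```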

After receiving the response with the browserWSEndpoint and cookies, you can use Puppeteer, Playwright or another CDP library to connect to the browser instance and inject the cookies to continue your scraping process:

import puppeteer from "puppeteer-core";

const TOKEN = "YOUR_API_TOKEN_HERE";

// Ask /unblock for a live browser endpoint and the cookies it collected.
const unblock = async (url) => {
  const opts = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url,
      browserWSEndpoint: true,
      cookies: true, // must be true for the response to include cookies
      content: false,
      screenshot: false,
      ttl: 10000,
    }),
  };

  const response = await fetch(
    `https://production-sfo.browserless.io/chromium/unblock?token=${TOKEN}`,
    opts,
  );

  return await response.json();
};

// Reconnect to the unblocked browser
const { browserWSEndpoint, cookies } = await unblock("https://browserless.io/");

const browser = await puppeteer.connect({
  browserWSEndpoint: `${browserWSEndpoint}?token=${TOKEN}`,
});
const page = (await browser.pages())[0];

// Or inject the cookies into the page and navigate yourself
// await page.setCookie(...cookies);
// await page.goto("https://browserless.io/");
// await page.screenshot({ path: "screenshot.png" });

await page.screenshot({ path: `screenshot-${Date.now()}.png` });
await browser.close();
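The example above appends the token to browserWSEndpoint by string concatenation, which only works while the returned endpoint has no query string of its own. A slightly more defensive variant, sketched here with a hypothetical withToken helper, lets the standard URL class add the parameter:

```javascript
// Hypothetical helper: add (or overwrite) the `token` query parameter on a
// WebSocket endpoint without assuming anything about its existing query.
function withToken(browserWSEndpoint, token) {
  const url = new URL(browserWSEndpoint);
  url.searchParams.set("token", token);
  return url.toString();
}

// Usage with the Puppeteer example above:
// const browser = await puppeteer.connect({
//   browserWSEndpoint: withToken(browserWSEndpoint, TOKEN),
// });
```

The helper also percent-encodes the token, which matters only if the token contains characters that are not URL-safe.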

Waiting for Things

Browserless offers four different ways to wait for preconditions to be met on the page before returning the response: events, functions, selectors and timeouts.

waitForEvent

Waits for an event to happen on the page before continuing:

Example

curl -X POST \
  'https://production-sfo.browserless.io/unblock?token=YOUR_API_TOKEN_HERE' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "waitForEvent": {
      "event": "fullscreenchange",
      "timeout": 5000
    }
  }'

waitForFunction

Waits for the provided function to return before continuing. The function can be any valid JavaScript function including async functions.

Example

JS function

async () => {
  const res = await fetch("https://jsonplaceholder.typicode.com/todos/1");
  const json = await res.json();

  document.querySelector("h1").innerText = json.title;
};

cURL request

curl -X POST \
  'https://production-sfo.browserless.io/unblock?token=YOUR_API_TOKEN_HERE' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "waitForFunction": {
      "fn": "async()=>{let t=await fetch(\"https://jsonplaceholder.typicode.com/todos/1\"),e=await t.json();document.querySelector(\"h1\").innerText=e.title}",
      "timeout": 5000
    }
  }'
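Minifying and escaping the function by hand is easy to get wrong. When the request is built in JavaScript, one option is to write the function normally and serialize it with Function.prototype.toString, leaving the escaping to JSON.stringify. This is a sketch under the assumption that fn accepts any function source text, as the example above suggests; waitForFunctionPayload is a hypothetical name.

```javascript
// Hypothetical helper: build an /unblock request body whose waitForFunction.fn
// is the source text of `fn`. The function is sent as text, so it must not
// reference variables from the surrounding scope.
function waitForFunctionPayload(url, fn, timeout) {
  return {
    url,
    waitForFunction: { fn: fn.toString(), timeout },
  };
}

const payload = waitForFunctionPayload(
  "https://example.com/",
  async () => {
    const res = await fetch("https://jsonplaceholder.typicode.com/todos/1");
    const json = await res.json();
    document.querySelector("h1").innerText = json.title;
  },
  5000,
);

// JSON.stringify escapes the quotes and newlines inside the function source.
const body = JSON.stringify(payload);
```

The resulting body string can be passed directly as the body of a POST request to /unblock.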

waitForSelector

Waits for a selector to appear on the page. If the selector already exists when the API is called, the method returns immediately. If the selector doesn't appear within timeout milliseconds, the API returns a non-200 response code with an error message as the response body.

The object can have any of these values:

  • selector: String, required. A valid CSS selector.
  • hidden: Boolean, optional. Wait for the selected element to not be found in the DOM or to be hidden, i.e. to have display: none or visibility: hidden CSS properties.
  • timeout: Number, optional. Maximum number of milliseconds to wait for the selector before failing.
  • visible: Boolean, optional. Wait for the selected element to be present in the DOM and to be visible, i.e. to not have display: none or visibility: hidden CSS properties.

Example

curl -X POST \
  'https://production-sfo.browserless.io/unblock?token=YOUR_API_TOKEN_HERE' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "waitForSelector": {
      "selector": "h1",
      "timeout": 5000
    }
  }'
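Because a selector that never appears produces a non-200 response with the error message in the body, callers usually want to check the status before parsing JSON. The sketch below shows one way to do that from JavaScript. Both function names are hypothetical, the client-side validation simply mirrors the option list above, and a runtime with a global fetch is assumed.

```javascript
// Hypothetical helper: validate and normalize waitForSelector options
// according to the option list above.
function waitForSelectorOptions({ selector, hidden, timeout, visible } = {}) {
  if (typeof selector !== "string" || selector.length === 0) {
    throw new TypeError("selector is required and must be a non-empty string");
  }
  const options = { selector };
  if (hidden !== undefined) options.hidden = Boolean(hidden);
  if (timeout !== undefined) {
    if (!Number.isFinite(timeout) || timeout < 0) {
      throw new TypeError("timeout must be a non-negative number of milliseconds");
    }
    options.timeout = timeout;
  }
  if (visible !== undefined) options.visible = Boolean(visible);
  return options;
}

// Hypothetical helper: call /unblock and surface the error body on failure.
async function unblockWhenSelectorAppears(token, url, selectorOptions) {
  const response = await fetch(
    `https://production-sfo.browserless.io/unblock?token=${encodeURIComponent(token)}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        url,
        waitForSelector: waitForSelectorOptions(selectorOptions),
      }),
    },
  );
  if (!response.ok) {
    // On timeout the body carries the error message, not JSON results.
    throw new Error(`unblock failed (${response.status}): ${await response.text()}`);
  }
  return response.json();
}
```

For example, unblockWhenSelectorAppears(TOKEN, "https://example.com/", { selector: "h1", timeout: 5000 }) sends the same request as the cURL example and rejects with the server's error message if the h1 never appears.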