Version: v2

How to use the Unblock API


When you use a library like puppeteer or playwright, many sites can detect it. These libraries leave traces of their existence in many ways (web workers, background pages, etc.). With the /unblock API, Browserless takes a minimalistic approach to getting past bot blocks. It uses a variety of tools and strategies that we've seen work well in the past, and includes many adapters to get past more notorious mechanisms:

  • We don't use any library for this API, and work directly with the browser's native remote interfaces.
  • We launch and run the browser just like end-users.
  • We listen for common bot blockers and correct for them when they are detected.
  • The API returns the underlying browserWSEndpoint to connect to, or just content and cookies.

This JSON API is flexible in how it operates. When you set certain values to false, Browserless optimizes the execution of the request to return only the fields you asked for, and does not spend time producing the other fields.

Here's a list of common examples:

Returning cookies only

{
  "url": "https://www.example.com/",
  "cookies": true,
  "content": false,
  "browserWSEndpoint": false,
  "screenshot": false,
  "ttl": 0
}
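Once cookies come back, a common next step is to reuse them in a plain HTTP request. The following is a minimal sketch, not part of the Browserless API: `toCookieHeader` is a hypothetical helper, and it assumes each entry in the returned `cookies` array is an object with at least `name` and `value` properties. Check the actual response shape before relying on it.

```javascript
// Hypothetical helper: turn a `cookies` array into a Cookie request header.
// Assumption: each cookie is an object with string `name` and `value` fields.
function toCookieHeader(cookies) {
  return cookies
    .filter((c) => c && typeof c.name === 'string')
    .map((c) => `${c.name}=${c.value}`)
    .join('; ');
}

// Illustrative data only, not real output from the API.
const sampleCookies = [
  { name: 'session', value: 'abc123', domain: 'www.example.com' },
  { name: 'theme', value: 'dark', domain: 'www.example.com' },
];

const cookieHeader = toCookieHeader(sampleCookies);
console.log(cookieHeader);
```

The resulting string can be passed as the `cookie` header of a later request to the same site.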

Returning content only

{
  "url": "https://www.example.com/",
  "cookies": false,
  "content": true,
  "browserWSEndpoint": false,
  "screenshot": false,
  "ttl": 0
}

Returning screenshots only

{
  "url": "https://www.example.com/",
  "cookies": false,
  "content": false,
  "browserWSEndpoint": false,
  "screenshot": true,
  "ttl": 0
}
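The three payloads above differ only in which field is set to true. As a rough sketch, a small helper can build such a body from a list of wanted fields. `buildUnblockBody` and `FIELDS` are hypothetical names introduced for illustration; the field names themselves are the ones shown in the examples above.

```javascript
// The response fields that appear in the example payloads above.
const FIELDS = ['cookies', 'content', 'browserWSEndpoint', 'screenshot'];

// Hypothetical helper: every field defaults to false unless listed in `wanted`.
function buildUnblockBody(url, wanted = [], ttl = 0) {
  for (const name of wanted) {
    if (!FIELDS.includes(name)) {
      throw new Error(`Unknown field: ${name}`);
    }
  }
  const body = { url, ttl };
  for (const name of FIELDS) {
    body[name] = wanted.includes(name);
  }
  return body;
}

const cookiesOnly = buildUnblockBody('https://www.example.com/', ['cookies']);
console.log(JSON.stringify(cookiesOnly, null, 2));
```

The object can then be passed through `JSON.stringify` as the body of the POST request.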

Full example with a library

This API is also intended to work with libraries that run on the Chrome DevTools Protocol (like puppeteer and certain APIs in playwright). This means you can unblock a site, then connect your library of choice to the browser and kick off your automation. It's a powerful way to work and keeps your code focused on your needs, rather than on management-related tasks.

import puppeteer from 'puppeteer-core';

// The underlying site you wish to automate
const url = 'https://www.example.com/';

// Your API token retrievable from the dashboard.
const token = 'YOUR-API-KEY';

// Set a threshold, in milliseconds, to unblock
const timeout = 5 * 60 * 1000;

// What proxy type (residential), or remove for none.
const proxy = 'residential';

// Where you want to proxy from (GB === Great Britain), or remove for none.
const proxyCountry = 'gb';

// If you want to use the same proxy IP for every network request
const proxySticky = true;

const queryParams = new URLSearchParams({
  timeout,
  proxy,
  proxyCountry,
  proxySticky,
  token,
}).toString();

const unblockURL =
`https://production-sfo.browserless.io/chromium/unblock?${queryParams}`;

const options = {
  method: 'POST',
  headers: {
    'content-type': 'application/json'
  },
  body: JSON.stringify({
    url: url,
    browserWSEndpoint: true,
    cookies: true,
    content: true,
    screenshot: true,
    ttl: 5000,
  }),
};

try {
  console.log(`Unblocking ${url}`);

  const response = await fetch(unblockURL, options);

  if (!response.ok) {
    throw new Error(`Got non-ok response:\n` + (await response.text()));
  }

  const { browserWSEndpoint } = await response.json();

  console.log(`Got OK response! Connecting puppeteer to "${browserWSEndpoint}"...`);
  const browser = await puppeteer.connect({
    browserWSEndpoint: `${browserWSEndpoint}?${queryParams}`
  });

  // List the open pages, then find ours by inspecting the URL and matching it
  const pages = await browser.pages();
  const page = pages.find((p) => p.url().includes(url));
  if (!page) {
    throw new Error(`Could not find an open page matching ${url}`);
  }

  // Log any response that is not in the 2xx range
  page.on('response', (res) => {
    if (!res.ok()) {
      console.log(`${res.status()}: ${res.url()}`);
    }
  });

  console.log('Reloading page with networkidle0...');
  await page.reload({
    waitUntil: 'networkidle0',
    timeout,
  });

  console.log('Taking page screenshot...');
  await page.screenshot({
    path: 'temp.png',
    fullPage: true,
  });

  console.log('Done!');
  await browser.close();
} catch (error) {
  console.error(error);
}
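The comments in the example note that the proxy options can be removed when no proxy is wanted. One way to handle that, sketched here with a hypothetical `buildUnblockURL` helper, is to skip any option left undefined so it never reaches the query string. The base URL is the one used in the example above; the helper itself is an illustration, not part of the API.

```javascript
// Hypothetical helper: build the request URL, leaving out unset options.
function buildUnblockURL(base, params) {
  const query = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    // Skip options that were removed or never set.
    if (value === undefined || value === null) continue;
    query.set(key, String(value));
  }
  return `${base}?${query.toString()}`;
}

const base = 'https://production-sfo.browserless.io/chromium/unblock';

// No proxy settings: only the token and timeout are sent.
const noProxyURL = buildUnblockURL(base, {
  token: 'YOUR-API-KEY',
  timeout: 5 * 60 * 1000,
  proxy: undefined,
  proxyCountry: undefined,
});
console.log(noProxyURL);
```

Without this filtering, `URLSearchParams` would serialize an undefined value as the literal string "undefined".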

With this powerful API you can get past a lot of common bot detection mechanisms and worry less about your automation.