How to use the Unblock API
When you use a library like puppeteer or playwright, many sites can detect that the library is in use. These libraries leave traces of their existence in many ways (web workers, background pages, etc.). With the /unblock API, Browserless takes a minimalistic approach to getting past bot blocks. It uses a variety of tools and strategies that we've seen work well in the past, and includes many adapters for getting past the more notorious detection mechanisms:
- We don't use any library for this API, and work directly with the browser's native remote interfaces.
- We launch and run the browser just like end-users.
- Common bot-blocker checks are listened for and corrected.
- It returns the underlying browserWSEndpoint to connect to, or only the outputs you need (content, cookies, or a screenshot).
This JSON API is flexible in how it operates. By setting the fields you don't need to false, you let Browserless optimize the request: it returns only the fields you asked for and doesn't spend time producing the others.
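This pattern is easy to wrap in a small helper. The sketch below is our own illustration: buildUnblockBody and OUTPUT_FIELDS are hypothetical names, not part of Browserless, and the only assumption is the set of field names used in the request examples on this page. It enables the outputs you list and sets every other output field to false.

```javascript
// Output fields used in the request examples on this page.
const OUTPUT_FIELDS = ['cookies', 'content', 'browserWSEndpoint', 'screenshot'];

// Hypothetical helper (not a Browserless API): builds an /unblock request body
// that enables only the requested outputs and sets every other output field
// to false, so Browserless can skip producing it.
function buildUnblockBody(url, wanted, ttl = 0) {
  // Reject typos early instead of silently sending an unknown field.
  for (const field of wanted) {
    if (!OUTPUT_FIELDS.includes(field)) {
      throw new Error(`Unknown output field: ${field}`);
    }
  }
  const body = { url };
  for (const field of OUTPUT_FIELDS) {
    body[field] = wanted.includes(field);
  }
  body.ttl = ttl;
  return body;
}

// Produces the same body as the "Returning cookies only" example below.
console.log(JSON.stringify(buildUnblockBody('https://www.example.com/', ['cookies']), null, 2));
```

Passing several names, such as ['content', 'cookies'], enables both outputs while the rest stay false.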
Here's a list of common examples:
Returning cookies only
{
  "url": "https://www.example.com/",
  "cookies": true,
  "content": false,
  "browserWSEndpoint": false,
  "screenshot": false,
  "ttl": 0
}
Returning content only
{
  "url": "https://www.example.com/",
  "cookies": false,
  "content": true,
  "browserWSEndpoint": false,
  "screenshot": false,
  "ttl": 0
}
Returning screenshots only
{
  "url": "https://www.example.com/",
  "cookies": false,
  "content": false,
  "browserWSEndpoint": false,
  "screenshot": true,
  "ttl": 0
}
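Any of these bodies can be sent with a plain HTTP POST. The minimal sketch below assumes Node.js 18+ (for the global fetch), an API token in a BROWSERLESS_TOKEN environment variable (our own convention), and a JSON response that returns each requested output under the same name as its request field, as browserWSEndpoint does in the full example further down. The unblock and buildUnblockURL helpers are hypothetical names for illustration.

```javascript
// Assumptions: Node.js 18+ (global fetch), a token in BROWSERLESS_TOKEN, and a
// JSON response that uses the same field names as the request body.
const UNBLOCK_ENDPOINT = 'https://production-sfo.browserless.io/chromium/unblock';

// Adds the API token as a query parameter, as the full example on this page does.
function buildUnblockURL(token) {
  return `${UNBLOCK_ENDPOINT}?${new URLSearchParams({ token })}`;
}

// Hypothetical helper: POSTs one request body and returns the parsed JSON response.
async function unblock(body, token) {
  const response = await fetch(buildUnblockURL(token), {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Got non-ok response:\n${await response.text()}`);
  }
  return response.json();
}

// The "Returning content only" request from above.
const contentOnly = {
  url: 'https://www.example.com/',
  cookies: false,
  content: true,
  browserWSEndpoint: false,
  screenshot: false,
  ttl: 0,
};

// Only send the request when a token is configured.
const token = process.env.BROWSERLESS_TOKEN;
if (token) {
  unblock(contentOnly, token)
    .then((result) => console.log(`Received ${result.content.length} characters of content`))
    .catch((error) => console.error(error));
} else {
  console.log('Set BROWSERLESS_TOKEN to send the request.');
}
```

Under the same assumption, sending the cookies-only or screenshot-only body and reading result.cookies or result.screenshot works the same way; inspect a real response before relying on the exact format of those fields.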
Full example with a library
This API is also intended to work with libraries built on the Chrome DevTools Protocol (like puppeteer and certain APIs in playwright). This means you can unblock a site, then connect your library of choice to the same browser and continue your automation there. Working this way keeps your code focused on your own task rather than on browser-management chores.
// Run as an ES module (e.g. an .mjs file) on Node.js 18+ so import, top-level await, and the global fetch are available.
import puppeteer from 'puppeteer-core';
// The underlying site you wish to automate
const url = 'https://www.example.com/';
// Your API token, retrievable from the dashboard.
const token = 'YOUR-API-KEY';
// Time limit, in milliseconds, for unblocking (also reused below as the page reload timeout)
const timeout = 5 * 60 * 1000;
// Proxy type (residential), or remove for none.
const proxy = 'residential';
// Where you want to proxy from (GB === Great Britain), or remove for none.
const proxyCountry = 'gb';
// Whether to use the same proxy IP for every network request
const proxySticky = true;
const queryParams = new URLSearchParams({
  timeout,
  proxy,
  proxyCountry,
  proxySticky,
  token,
}).toString();
const unblockURL =
  `https://production-sfo.browserless.io/chromium/unblock?${queryParams}`;
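const options = {
  method: 'POST',
  headers: {
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    url: url,
    // Only browserWSEndpoint is used below, so the other outputs are turned
    // off to let Browserless skip producing them.
    browserWSEndpoint: true,
    cookies: false,
    content: false,
    screenshot: false,
    // A non-zero ttl is used because we connect to the browser afterwards;
    // the output-only examples above use 0.
    ttl: 5000,
  }),
};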
let browser;
try {
  console.log(`Unblocking ${url}`);
  const response = await fetch(unblockURL, options);
  if (!response.ok) {
    throw new Error(`Got non-ok response:\n` + (await response.text()));
  }
  const { browserWSEndpoint } = await response.json();
  console.log(`Got OK response! Connecting puppeteer to "${browserWSEndpoint}"...`);
  // Reuse the same query parameters (including the token) for the WebSocket connection.
  browser = await puppeteer.connect({
    browserWSEndpoint: `${browserWSEndpoint}?${queryParams}`,
  });
  // The unblocked page is already open, so grab it instead of creating a new one.
  const [page] = await browser.pages();
  // Log unsuccessful responses. In puppeteer, ok() is a method, not a property.
  page.on('response', (res) => {
    if (!res.ok()) {
      console.log(`${res.status()}: ${res.url()}`);
    }
  });
  console.log('Reloading page with networkidle0...');
  await page.reload({
    waitUntil: 'networkidle0',
    timeout,
  });
  console.log('Taking page screenshot...');
  await page.screenshot({
    path: 'temp.png',
    fullPage: true,
  });
  console.log('Done!');
} catch (error) {
  console.error(error);
} finally {
  // Close the browser even if one of the steps above failed.
  await browser?.close();
}
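The full example joins browserWSEndpoint and the query string with a literal "?", which assumes the returned endpoint carries no query string of its own. If you'd rather not depend on that, the minimal sketch below merges the parameters with the standard URL API instead. withQueryParams is our own hypothetical helper, not a Browserless or puppeteer function, and the endpoints in it are made up for illustration.

```javascript
// Hypothetical helper: adds query parameters (such as the API token) to a
// WebSocket endpoint, merging them with any query string it already has.
function withQueryParams(endpoint, params) {
  const url = new URL(endpoint);
  // URLSearchParams accepts either a query string or a plain object
  // (object values such as numbers and booleans are stringified).
  for (const [key, value] of new URLSearchParams(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Made-up endpoints, for illustration only.
console.log(withQueryParams('wss://example.com/devtools/browser/abc', 'token=YOUR-API-KEY'));
// prints "wss://example.com/devtools/browser/abc?token=YOUR-API-KEY"
console.log(withQueryParams('wss://example.com/devtools/browser/abc?foo=1', { token: 'YOUR-API-KEY', proxySticky: true }));
// prints "wss://example.com/devtools/browser/abc?foo=1&token=YOUR-API-KEY&proxySticky=true"
```

In the full example, you would pass withQueryParams(browserWSEndpoint, queryParams) as the browserWSEndpoint option to puppeteer.connect.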
With this API you can get past many common bot-detection mechanisms and spend less time working around blocks in your automation.