Version: v2

How to bypass bot detection

Websites deploy different levels of anti-bot protection depending on the sensitivity of their data and their budget. If your automation is being blocked, try the steps below.

The /unblock API

You can use our /unblock API, which is designed to bypass sophisticated bot detection mechanisms. You specify a target URL, and the API returns the HTML content, a .png screenshot, or an unblocked browser session to use with Playwright or Puppeteer.

A simple cURL request that specifies your target website returns, once the bot detection has been bypassed, the data you need to scrape it:

curl --request POST \
--url 'https://production-sfo.browserless.io/unblock?token=GOES-HERE' \
--header 'content-type: application/json' \
--data '{
"url": "https://example.com",
"browserWSEndpoint": true,
"cookies": true,
"content": false,
"screenshot": false,
"ttl": 30000
}'

You can use the content or screenshot directly, or use the endpoint to run further actions with a library:

import puppeteer from 'puppeteer';

// Calls the /unblock API and resolves with the parsed JSON response.
const unblock = async (url, params) =>
  fetch(
    'https://production-sfo.browserless.io/chromium/unblock?token=GOES-HERE',
    {
      method: 'POST',
      // Same header as the cURL example, so the body is parsed as JSON
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        url,
        ...params,
      }),
    },
  ).then((r) => {
    if (r.ok) {
      return r.json();
    }
    throw new Error(`Unblock request failed with status ${r.status}`);
  });

// Request an unblocked browser session and cookies for the target URL
const { browserWSEndpoint, cookies } = await unblock('https://example.com', {
  browserWSEndpoint: true,
  cookies: true,
  content: false,
  screenshot: false,
  ttl: 30000,
});

const browser = await puppeteer.connect({ browserWSEndpoint });
// The first page is the one unblock uses to get past bot blocks:
const [page] = await browser.pages();
await page.screenshot({ path: 'screenshot.png' });
await browser.close();
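If you only need the HTML or a screenshot, the same request can be sent with different flags. The sketch below is one way to assemble the fetch() arguments; buildUnblockRequest is a hypothetical helper, not part of any Browserless SDK, and the assumption that the HTML comes back under a "content" key in the response is based only on the request flag names used above.

```javascript
// Hypothetical helper: builds the arguments for a fetch() call to /unblock.
// Every output flag defaults to false; pass only the ones you want enabled.
function buildUnblockRequest(targetUrl, token, wanted = {}) {
  const endpoint = new URL('https://production-sfo.browserless.io/unblock');
  endpoint.searchParams.set('token', token);

  const body = {
    url: targetUrl,
    browserWSEndpoint: false,
    cookies: false,
    content: false,
    screenshot: false,
    ...wanted,
  };

  return {
    endpoint: endpoint.toString(),
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Usage (not run here, it needs a valid token):
// const { endpoint, init } = buildUnblockRequest('https://example.com', 'GOES-HERE', { content: true });
// const response = await fetch(endpoint, init);
// const { content } = await response.json(); // assumed response key
```

Keeping the request construction separate from the network call makes it easy to log or inspect exactly what is sent before spending a request on it.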

Additional strategies

If none of these do the trick, get in touch with us at support@browserless.io.

We have more tricks up our sleeves that we can show you, such as captcha solving and changing viewport sizes, especially with our enterprise features.

Try out the Stealth routes

info

The stealth routes below are only available on paid cloud-unit or Enterprise plans.

We have native support for things like puppeteer-stealth, but we also offer our own stealth routes that cover more stealthy behaviors. These are selected through the route path, and today they only support libraries that work over the Chrome DevTools Protocol.

// Chromium:
await puppeteer.connect({
browserWSEndpoint: 'wss://production-sfo.browserless.io/chromium/stealth?token=YOUR-API-TOKEN',
});

// Chrome:
await puppeteer.connect({
browserWSEndpoint: 'wss://production-sfo.browserless.io/chrome/stealth?token=YOUR-API-TOKEN',
});

These routes incorporate many of the anti-detection mechanisms below, which you're free to try as well.
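Because the browser is chosen through the route path, the endpoint can be built from the browser name. The following is a minimal sketch; stealthEndpoint is a hypothetical helper name, and it only covers the two routes shown above.

```javascript
// Hypothetical helper: returns the stealth WebSocket endpoint for a browser.
// Only the two routes documented above are accepted.
function stealthEndpoint(browser, token) {
  const routes = ['chromium', 'chrome'];
  if (!routes.includes(browser)) {
    throw new Error(`No stealth route known for "${browser}"`);
  }
  return `wss://production-sfo.browserless.io/${browser}/stealth?token=${encodeURIComponent(token)}`;
}

// Usage (not run here, it needs a valid token):
// const browser = await puppeteer.connect({
//   browserWSEndpoint: stealthEndpoint('chrome', 'YOUR-API-TOKEN'),
// });
```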

Launch args to bypass bot detection

Use the headless arg

Most bot detectors check your user-agent, which by default explicitly states that you're running headless Chrome. This is a dead giveaway. You can change it by setting a specific user-agent, but we highly recommend using the &headless=false flag instead, which changes your user-agent to a more credible one.

import puppeteer from "puppeteer-core";

const launchArgs = JSON.stringify({ headless: false });
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://production-sfo.browserless.io/?token=GOES-HERE&launch=${launchArgs}`,
});

//...
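The launch value is JSON placed inside a query string, so characters such as quotes and braces are safest when percent-encoded. The sketch below shows one way to do that with the standard URL API; buildEndpoint is a hypothetical helper, and it assumes the service decodes the launch parameter like any other query value.

```javascript
// Hypothetical helper: builds a WebSocket endpoint whose launch args are
// percent-encoded by URLSearchParams instead of being pasted in raw.
function buildEndpoint(token, launchArgs = {}) {
  const url = new URL('wss://production-sfo.browserless.io/');
  url.searchParams.set('token', token);
  if (Object.keys(launchArgs).length > 0) {
    url.searchParams.set('launch', JSON.stringify(launchArgs));
  }
  return url.toString();
}

// Usage (not run here, it needs a valid token):
// const browser = await puppeteer.connect({
//   browserWSEndpoint: buildEndpoint('GOES-HERE', { headless: false }),
// });
```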

Use the stealth arg

The stealth flag enables the puppeteer-extra-plugin-stealth plugin, which applies various techniques to make headless Puppeteer harder to detect. This flag can backfire, since some sites detect it easily, so it is also worth testing without it.

import puppeteer from "puppeteer-core";

const launchArgs = JSON.stringify({ stealth: true });
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://production-sfo.browserless.io/?token=GOES-HERE&launch=${launchArgs}`,
});

//...

Use a proxy

Finally, the hardest sites to crack check your IP address. Two types of blocks can occur: those based on the type of IP address, and those based on the frequency of requests (rate limits).

  • Sites that check the type of IP address will detect the data-center IP addresses you get when using Browserless. To overcome this, a proxy with residential IP addresses is the best option.
  • Sites that work the first few times and then stop working are probably rate-limiting, so the IP type is not what blocks you. In these cases you don't necessarily need a residential proxy; rotating data-center IP addresses should be enough.

Browserless offers a residential proxy API that you can easily incorporate into your scripts.

import puppeteer from "puppeteer-core";

// puppeteer.connect() needs a WebSocket endpoint, not the HTTP /content route
const browserWSEndpoint =
  "wss://production-sfo.browserless.io/?token=YOUR-API-KEY&proxy=residential&proxyCountry=us&proxySticky";
const browser = await puppeteer.connect({ browserWSEndpoint });

//...
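The proxy options are plain query parameters, so they can be assembled conditionally. The sketch below uses only the parameters from the example above (proxy, proxyCountry, proxySticky); buildProxyEndpoint and its option names are hypothetical, and proxySticky is kept as a bare flag because that is how the example writes it.

```javascript
// Hypothetical helper: builds a WebSocket endpoint that routes traffic
// through the residential proxy, optionally pinned to a country and IP.
function buildProxyEndpoint(token, { country, sticky = false } = {}) {
  const parts = [`token=${encodeURIComponent(token)}`, 'proxy=residential'];
  if (country) {
    parts.push(`proxyCountry=${encodeURIComponent(country)}`);
  }
  if (sticky) {
    parts.push('proxySticky'); // bare flag, same form as the example above
  }
  return `wss://production-sfo.browserless.io/?${parts.join('&')}`;
}

// Usage (not run here, it needs a valid token):
// const browser = await puppeteer.connect({
//   browserWSEndpoint: buildProxyEndpoint('YOUR-API-KEY', { country: 'us', sticky: true }),
// });
```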

Utilizing proxies remains a crucial strategy in bypassing bot detection. Depending on the site's mechanism, you might need a proxy with residential IP addresses or a data-center IP that rotates. For more information about these topics, please check our documentation on our built-in proxy and third-party proxy.