Skip to main content
Version: v2

How to bypass bot detection

Websites can have a different level of anti-bot mechanisms depending on the sensitivity of their data and budget. If your automation is being blocked, take action with the steps below.

The BrowserQL API

You can use our query-language, BrowserQL, which is designed to bypass sophisticated bot detection mechanisms effectively. This API allows you to specify a target URL and return data you care about: the HTML content, a .png screenshot or an unblocked browser session to use with Playwright or Puppeteer.

A simple cURL request to the API specifying your target website will return all the data needed to scrape it after it is done bypassing the bot detection:

curl --request POST \
--url 'https://production-sfo.browserless.io/chrome/bql?token=YOUR-TOKEN-HERE' \
--header 'Content-Type: application/json' \
--data '{
"query": "mutation Reconnect($url: String!) { goto(url: $url, waitUntil: networkIdle) { status } reconnect(timeout: 30000) { browserWSEndpoint } }",
"variables": { "url": "https://example.com/" }
}'

You can use the content, or screenshot directly, or use the endpoint to run further actions with a library:

import puppeteer from "puppeteer-core";

const TOKEN = 'YOUR_API_TOKEN_HERE';
const url = "https://www.browserless.io/"

const unblock = async (url) => {
const opts = {
method: "POST",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({
"query": "mutation Reconnect($url: String!) { goto(url: $url, waitUntil: networkIdle) { status } reconnect(timeout: 30000) { browserWSEndpoint } }",
"variables": { url }
}),
};

const response = await fetch(
`https://production-sfo.browserless.io/chromium/bql?token=${TOKEN}`,
opts,
);

return await response.json();
};

// Reconnect
const { data } = await unblock(url);
const browser = await puppeteer.connect({
browserWSEndpoint: data.reconnect.browserWSEndpoint + `?token=${TOKEN}`,
});
const pages = await browser.pages();

const page = pages.find((p) => p.url() === url);
await page.screenshot({ path: `screenshot-${Date.now()}.png` });
await browser.close();

Additional strategies

If none of these do the trick, get in touch with us at support@browserless.io>.

We have more trick up our sleeves we can show you, such as captcha solving and changing viewport sizes, especially using our enterprise features.

Try out the Stealth routes

info

The stealth routes below are for only for paid cloud-unit or Enterprise plans.

We have native support for things like puppeteer-stealth, but also offer our own stealth routes that encompass more stealthy behaviors. We use a route path semantic for this, and today this only supports libraries that work over the Chrome Devtools Protocol.

// Chromium:
await puppeteer.connect({
browserWSEndpoint:
"wss://production-sfo.browserless.io/chromium/stealth?token=YOUR_API_TOKEN_HERE",
});

// Chrome:
await puppeteer.connect({
browserWSEndpoint:
"wss://production-sfo.browserless.io/chrome/stealth?token=YOUR_API_TOKEN_HERE",
});

These routes incorporate many of the anti-detection mechanisms below, which you're free to try as well.

Launch args to bypass bot detection

Use the headless arg

Most bot detectors will check your user-agent, which by default explicitly claims you're running headless chrome. This is a dead giveaway. It can be changed by setting a specific user-agent but we highly recommend you use the &headless=false flag instead, which changes your user-agent to a more credible one.

import puppeteer from "puppeteer-core";

const launchArgs = JSON.stringify({ headless: false });
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://production-sfo.browserless.io/?token=YOUR_API_TOKEN_HERE&launch=${launchArgs}`,
});

//...

Use the stealth arg

The stealth flag implements Puppeteer's puppeteer-extra-plugin-stealth plugin which applies various techniques to make detection of headless puppeteer harder. This flag may backfire and be easily detected by some sites, so consider avoiding it as well.

import puppeteer from "puppeteer-core";

const launchArgs = JSON.stringify({ stealth: true });
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://production-sfo.browserless.io/?token=YOUR_API_TOKEN_HERE&launch=${launchArgs}`,
});

//...

Use a proxy

Finally, the hardest sites to crack down check your IP address; there are two type of bocks that can occur, those based on type of IP, and those based on frequency of requests (rate-limits).

  • Sites checking the type of IP address will detect your data-center IP addresses when using Browserless. To overcome this, using a proxy with residential IP addresses will be the best option.
  • Sites that work the first few times and then stop working, are probably rate-limiting and it's not the residential part of it that blocks us. For these cases, you don't necessarily need a residential proxy and data-center IP addresses that rotate should be enough.

Browserless offers a residential proxy API that you can easily incorporate into your scripts.

import puppeteer from "puppeteer-core";

const browserWSEndpoint = "http://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE&proxy=residential&proxyCountry=us&proxySticky";
const browser = await puppeteer.connect({ browserWSEndpoint });

//...

Utilizing proxies remains a crucial strategy in bypassing bot detection. Depending on the site's mechanism, you might need a proxy with residential IP addresses or a data-center IP that rotates. For more information about these topics, please check our documentation on our built-in proxy and third-party proxy.