Version: v2

Quick Start

Welcome to the quick start! Below are quick recipes for the most common use cases.

Quick Recipes

Advanced Features

Bypass active and passive bot-protection

We refer to bot detection as either active or passive:

  • Passive detectors will check a visitor’s fingerprints and let you through if they think you’re human, such as on most ecommerce sites.
  • Active detectors will challenge every visitor with a captcha, such as during a sign up or form submission.

We offer a different solution for each detector type: /unblock for passive detectors and automated captcha solving for active ones.

Avoid passive detectors with /unblock

Our /unblock API can avoid most major bot detectors, especially when used with our residential proxy. It can be used in two main ways:

  • Grab the HTML or screenshot of a page
  • Create an unlocked endpoint to automate with Puppeteer or Playwright

Grab rendered HTML with a cURL request

If you simply want the HTML for scraping, set the content field to true in the JSON payload. Unblock will get past the detectors, render any dynamic content in our browsers, then return the HTML (or a base64 encoded image if you set screenshot to true instead).

Here's an example with proxies enabled and set to USA. Enter your API token into the query parameters, and replace example.com with your desired site.

curl --request POST \
  --url 'https://production-sfo.browserless.io/unblock?token=GOES-HERE&proxy=residential&proxyCountry=us' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com/",
    "browserWSEndpoint": false,
    "cookies": false,
    "content": true,
    "screenshot": false
  }'

This results in a response containing the unblocked page's HTML:

{
  "browserWSEndpoint": "wss://production-sfo.browserless.io/e/53616c7465645f5fa57aca44763bd816bb1aa1f1210ed871a908fd60235848ce6e4bcc0a8fcfe08c6d96eff8d68d556e/devtools/browser/646b292c-bd2a-4964-af18-9c1a6081c32e",
  "content": "<!DOCTYPE html><html>...</html>",
  "cookies": [],
  "screenshot": null,
  "ttl": 60000
}

You can then process this HTML with libraries such as Scrapy or Beautiful Soup.
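If you would rather stay in JavaScript, the response fields can be unpacked directly. The sketch below assumes the response shape shown above and that screenshot, when requested, is a base64 string. unpackUnblockResult is a hypothetical helper name for illustration, not part of the Browserless API.

```javascript
// Hypothetical helper: unpack the JSON returned by /unblock.
// Assumes `content` is an HTML string and `screenshot` is base64,
// and that either one is null when it was not requested.
const unpackUnblockResult = (result) => ({
  html: result.content ?? null,
  image: result.screenshot ? Buffer.from(result.screenshot, "base64") : null,
});

// Sample response trimmed to the two fields the helper reads
const sampleResult = {
  content: "<!DOCTYPE html><html>...</html>",
  screenshot: null,
};

const { html, image } = unpackUnblockResult(sampleResult);
console.log(html.length, image);
```

The decoded image is a Node.js Buffer, so it can be written to disk or passed on to an image library as needed.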

Unblock and connect with Puppeteer or Playwright

The API can access a site and get approval from the bot detectors, then return a WebSocket endpoint (browserWSEndpoint) for you to reconnect to for further automation.

First, send a JSON object to our /unblock API containing the URL of the site you wish to access. If you're reconnecting to the browser, you always want to set ttl, browserWSEndpoint and cookies.

curl --request POST \
  --url 'https://production-sfo.browserless.io/unblock?token=GOES-HERE&proxy=residential&proxyCountry=us' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com/",
    "browserWSEndpoint": true,
    "cookies": true,
    "content": false,
    "screenshot": false,
    "ttl": 30000
  }'

This returns a JSON response like this:

{
  "browserWSEndpoint": "wss://production-sfo.browserless.io/p/53616c7465645f5f0d2e4012516859fdda7cc1ae0b16c6c5ec739d5d9f19a3d3c9b49c8a814b0fd1beae934b2e8050a0/devtools/browser/102ea3e9-74d7-42c9-a856-1bf254649b9a",
  "content": null,
  "cookies": [
    {
      "name": "session_id",
      "value": "XYZ123",
      "domain": "example.com",
      "path": "/",
      "secure": true,
      "httpOnly": true
    }
  ],
  "screenshot": null,
  "ttl": 30000
}
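If you want to reuse the approved session outside the browser, for example in a plain HTTP client, the cookies can be flattened into a Cookie header value. This is a minimal sketch that assumes the cookie objects have the name and value fields shown above. toCookieHeader is a hypothetical helper, and whether the target site accepts these cookies from a different client is not guaranteed.

```javascript
// Hypothetical helper: turn the `cookies` array from /unblock into a Cookie header value.
// Only `name` and `value` are sent; attributes such as `path` or `secure`
// describe the cookie but are not part of the request header.
const toCookieHeader = (cookies) =>
  cookies.map(({ name, value }) => `${name}=${value}`).join("; ");

const exampleCookies = [
  {
    name: "session_id",
    value: "XYZ123",
    domain: "example.com",
    path: "/",
    secure: true,
    httpOnly: true,
  },
];

console.log(toCookieHeader(exampleCookies));
```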

After receiving the response with the browserWSEndpoint and cookies, you can use Puppeteer, Playwright or another CDP library to connect to the browser and continue your scraping process:

import puppeteer from "puppeteer-core";

const TOKEN = "GOES-HERE";

// Ask /unblock to clear the bot detectors and return a reconnectable browser
const unblock = async (url) => {
  const opts = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: url,
      browserWSEndpoint: true,
      cookies: true,
      content: false,
      screenshot: false,
      ttl: 10000,
    }),
  };

  const response = await fetch(
    `https://production-sfo.browserless.io/chromium/unblock?token=${TOKEN}`,
    opts,
  );

  return await response.json();
};

// Reconnect
const { browserWSEndpoint, cookies } = await unblock("https://browserless.io/");

const browser = await puppeteer.connect({
  browserWSEndpoint: browserWSEndpoint + `?token=${TOKEN}`,
});
const page = (await browser.pages())[0];

// Or inject the returned cookies into the page
// await page.setCookie(...cookies);
// await page.goto("https://browserless.io/");
// await page.screenshot({ path: "screenshot.png" });

await page.screenshot({ path: `screenshot-${Date.now()}.png` });
await browser.close();
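The script appends the token by string concatenation, which works as long as the returned endpoint has no query string of its own. As a more defensive sketch, the standard URL API can add the token while keeping any parameters already present. withToken is a hypothetical helper, and the endpoint path below is a shortened placeholder.

```javascript
// Hypothetical helper: append the API token to a returned browserWSEndpoint.
// The URL API keeps any query parameters the endpoint may already carry.
const withToken = (browserWSEndpoint, token) => {
  const url = new URL(browserWSEndpoint);
  url.searchParams.set("token", token);
  return url.toString();
};

// Placeholder endpoint, shortened for readability
const exampleEndpoint =
  "wss://production-sfo.browserless.io/p/abc123/devtools/browser/xyz";

console.log(withToken(exampleEndpoint, "GOES-HERE"));
```

The result can be passed as browserWSEndpoint to puppeteer.connect() in place of the concatenated string.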

Solve captchas from active detectors

Browserless directly interacts with the Chrome DevTools Protocol for features such as solving captchas. These CDP-based features require a Chrome session, so let's start one and look for a captcha.

const cdp = await page.createCDPSession();

// Wait until Browserless reports a captcha on the page
await new Promise((resolve) =>
  cdp.on("Browserless.captchaFound", () => {
    console.log("Found a captcha!");
    return resolve();
  }),
);

// Ask Browserless to solve the captcha it found
const { solved, error } = await cdp.send("Browserless.solveCaptcha");
console.log({ solved, error });

Here's what the full script looks like in Puppeteer:

import puppeteer from "puppeteer-core";

const browserWSEndpoint =
  "wss://production-sfo.browserless.io/chromium?token=GOES-HERE&timeout=300000";

try {
  const browser = await puppeteer.connect({ browserWSEndpoint });

  const page = await browser.newPage();
  const cdp = await page.createCDPSession();

  await page.goto("https://www.google.com/recaptcha/api2/demo", {
    waitUntil: "networkidle0",
  });

  // Wait until Browserless reports a captcha on the page
  await new Promise((resolve) => cdp.on("Browserless.captchaFound", resolve));
  console.log("Captcha found!");

  const { solved, error } = await cdp.send("Browserless.solveCaptcha");
  console.log({ solved, error });

  // Continue...
  await page.click("#recaptcha-demo-submit");
  await browser.close();
} catch (e) {
  console.error("There was a big error :(", e);
  process.exit(1);
}
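The script above waits indefinitely for Browserless.captchaFound, so it would hang on a page that never shows a captcha. One way to guard against that is to race the event against a timer. The sketch below is an illustration, not part of the Browserless API: waitForEvent is a hypothetical helper that works with any emitter exposing on() and off(), which Puppeteer's CDP session does.

```javascript
// Hypothetical helper: wait for a named event, but give up after `timeoutMs`.
// Resolves to true if the event fired in time, false otherwise.
const waitForEvent = (emitter, eventName, timeoutMs) =>
  new Promise((resolve) => {
    const onEvent = () => {
      clearTimeout(timer);
      emitter.off(eventName, onEvent);
      resolve(true); // event fired before the deadline
    };
    const timer = setTimeout(() => {
      emitter.off(eventName, onEvent);
      resolve(false); // no event within timeoutMs
    }, timeoutMs);
    emitter.on(eventName, onEvent);
  });

// Usage inside the script above (assumes `cdp` is the CDP session):
// const found = await waitForEvent(cdp, "Browserless.captchaFound", 10000);
// if (found) {
//   const { solved, error } = await cdp.send("Browserless.solveCaptcha");
//   console.log({ solved, error });
// }
```

The 10 second timeout in the usage comment is an arbitrary choice; pick a value that suits how quickly the target page loads its captcha.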

For more information, you can read more about it here or refer to our API reference.

Connect using Puppeteer or Playwright

Whether you're just getting started or already have an established codebase, Browserless aims to make the transition as easy as possible. Puppeteer and Playwright are especially convenient here, since you can start using Browserless by changing just one line.

Via Puppeteer.connect()

To go from running Chrome locally to using Browserless, simply use our endpoint. You can also use our proxy with a couple of query parameters.

// Before: launching Chrome locally
const browser = await puppeteer.launch();

// After: connecting to Browserless and using a proxy
// (browserWSEndpoint expects a WebSocket URL, so use wss://)
const browser = await puppeteer.connect({
  browserWSEndpoint: 'wss://production-sfo.browserless.io/?token=GOES_HERE&proxy=residential',
});

After that, your code should remain exactly the same. You can use launch flags, proxies or other details to customize the behavior.

Via Playwright.BrowserType.connect()

We support all Playwright protocols, and, just like with Puppeteer, you can easily switch to Browserless. The standard connect method uses Playwright's built-in browser-server to handle the connection.

To connect to Browserless using Chrome, WebKit or Firefox, just make sure that the connection string matches the browser:

// Before: launching Firefox locally
const browser = await playwright.firefox.launch();

// After: connecting to Firefox via Browserless and using a proxy
// (connect() expects a WebSocket endpoint, so use wss://)
const browser = await playwright.firefox.connect(`wss://production-sfo.browserless.io/firefox/playwright?token=GOES_HERE&proxy=residential`);

As with Puppeteer, you can use launch flags, proxies or other details to customize the behavior. Check out the Playwright docs for instructions for each supported language.
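Since the connection string carries the browser path, token and proxy settings, it can help to assemble it from its parts rather than by hand. The following is a minimal sketch using the standard URL API. buildEndpoint is a hypothetical helper, and the host, paths and parameter names are simply the ones used in the examples on this page.

```javascript
// Hypothetical helper: build a Browserless WebSocket endpoint from its parts.
// `path` selects the browser (for example "/" or "/firefox/playwright"),
// `params` becomes the query string.
const buildEndpoint = (path, params) => {
  const url = new URL(path, "wss://production-sfo.browserless.io");
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
};

const puppeteerEndpoint = buildEndpoint("/", {
  token: "GOES_HERE",
  proxy: "residential",
});

const playwrightEndpoint = buildEndpoint("/firefox/playwright", {
  token: "GOES_HERE",
  proxy: "residential",
  proxyCountry: "us",
});

console.log(puppeteerEndpoint);
console.log(playwrightEndpoint);
```

The first string can be passed as browserWSEndpoint to puppeteer.connect(), the second directly to playwright.firefox.connect().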

You can read more about our integrations with Puppeteer, Playwright, Scrapy and other libraries.

Generate screenshots and PDFs

All of our HTTP APIs accept a JSON body, which you can send with cURL or any other HTTP client, and each comes with various options and properties. For instance, this is how you generate a PDF.

// JSON body
// `options` are the options available via puppeteer's Page.pdf() method
// (see our Open API documentation)
{
  "url": "https://example.com/",
  "options": {
    "displayHeaderFooter": true,
    "printBackground": false,
    "format": "A0"
    // Note the lack of a `path` parameter
  }
}

cURL request

curl -X POST \
  'https://production-sfo.browserless.io/pdf?token=MY_API_TOKEN' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "options": {
      "displayHeaderFooter": true,
      "printBackground": false,
      "format": "A0"
    }
  }'
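The same request can be made from Node.js with fetch. The sketch below assumes the /pdf endpoint responds with the raw PDF bytes, which you should confirm against the API reference. fetchPdf is a hypothetical helper, and its fetchImpl parameter exists only so the function can be exercised without network access.

```javascript
// Hypothetical helper: POST to /pdf and return the PDF bytes as a Buffer.
// Assumes the endpoint responds with the raw PDF document.
const fetchPdf = async (token, body, fetchImpl = fetch) => {
  const response = await fetchImpl(
    `https://production-sfo.browserless.io/pdf?token=${encodeURIComponent(token)}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  );
  if (!response.ok) {
    throw new Error(`PDF request failed with status ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
};

// Usage (requires a valid token and network access):
// const pdf = await fetchPdf("MY_API_TOKEN", {
//   url: "https://example.com/",
//   options: { displayHeaderFooter: true, printBackground: false, format: "A0" },
// });
```

Because the helper returns a Buffer, the result can be written to disk or streamed to a client without further conversion.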

Browserless offers ways to load additional stylesheets and script tags to the page as well. This gives you full control and allows you to override page elements to suit your needs.

For more information, check out our /pdf API, our /screenshot API or our Lighthouse Tests API. Or, if you need to avoid bot detection, check out our /unblock API.