Quick Start
Welcome to the quick start! Below are quick recipes for the most common use cases.
Quick Recipes
- Using our BrowserQL
- Bypass bot detectors
- Connect with Puppeteer or Playwright
- Generate screenshots and PDFs
Advanced Features
Using BrowserQL
BrowserQL is our easy-to-use query language, and it comes bundled with its own editor. To get started with the editing experience, sign up for an account and download the editor from the Account portal.
To get started you'll need two key pieces of information: the URL for the area you're nearest to and your API key. We have the following locations:
- https://production-sfo.browserless.io/ (based in San Francisco, USA)
- https://production-lon.browserless.io/ (based in London, England)
- https://production-ams.browserless.io/ (based in Amsterdam, Netherlands)
Picking a nearby location gives you the fastest experience. Once you have both the URL and your API key, you can get started with a simple query like this:
mutation ScrapeHN {
  goto(url: "https://news.ycombinator.com", waitUntil: firstMeaningfulPaint) {
    status
    time
  }
  text {
    text
  }
}
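As a rough sketch of how such a query could be sent from code rather than from the editor, the snippet below builds a POST request for the BQL endpoint. The `buildBqlRequest` helper and the `REGIONS` map are illustrative names, not part of any Browserless SDK; the `/chromium/bql` path and the `token` query parameter mirror the cURL examples later in this guide.

```javascript
// Hypothetical lookup table: region keys map to the hosts listed above.
const REGIONS = {
  sfo: "https://production-sfo.browserless.io",
  lon: "https://production-lon.browserless.io",
  ams: "https://production-ams.browserless.io",
};

const SCRAPE_HN = `
mutation ScrapeHN {
  goto(url: "https://news.ycombinator.com", waitUntil: firstMeaningfulPaint) {
    status
    time
  }
  text {
    text
  }
}`;

// Build the URL and fetch() options for a BrowserQL request.
function buildBqlRequest(region, token, query, variables = {}) {
  const base = REGIONS[region];
  if (!base) throw new Error(`Unknown region: ${region}`);
  const url = new URL("/chromium/bql", base);
  url.searchParams.set("token", token);
  return {
    url: url.toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query, variables }),
    },
  };
}

// Usage (requires a real API key and network access):
// const { url, options } = buildBqlRequest("sfo", "YOUR-API-KEY", SCRAPE_HN);
// const { data } = await (await fetch(url, options)).json();
// console.log(data.text.text);
```

Keeping the request construction separate from the network call makes it easy to switch regions or inspect the payload before sending it.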
If you want to see more BrowserQL options and recipes, check out our section on BrowserQL.
Bypass active and passive bot-protection
We refer to bot detection as either active or passive:
- Passive detectors will check a visitor’s fingerprints and let you through if they think you’re human, such as on most ecommerce sites.
- Active detectors will challenge every visitor with a captcha, such as during a sign up or form submission.
We have different solutions for these two detector types: BrowserQL and automated captcha solving.
Avoid passive detectors with BrowserQL
Our BrowserQL can avoid many bot detectors, especially when used with our residential proxy. It can be used in many ways:
- Grab the HTML, text or screenshot of a page or specific elements
- Create an unlocked endpoint to automate further with Puppeteer or Playwright
Grab rendered HTML with a cURL request
If you only want the HTML for scraping, you can request it inside the query itself. BrowserQL will get past the detectors, render any dynamic content in our browsers, then return the HTML.
Here's an example with proxies enabled and set to USA. Enter your API token into the query parameters, and replace example.com with your desired site.
curl --request POST \
--url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR-API-KEY&proxy=residential&proxySticky=true&proxyCountry=us' \
--header 'Content-Type: application/json' \
--data '{
"query": "mutation GetContent { goto(url: \"https://example.com\", waitUntil: firstMeaningfulPaint) { status time } html { html } }"
}'
This returns a response containing the unblocked page's HTML:
{
  "data": {
    "goto": {
      "status": 200,
      "time": 957
    },
    "html": {
      "html": "<!DOCTYPE html>...</html>"
    }
  }
}
You can then process this HTML with libraries such as Scrapy or Beautiful Soup.
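Before handing the HTML to a parser, it can help to check the response for failures. The sketch below assumes the response follows the usual GraphQL envelope, with an optional top-level `errors` array alongside `data`; `extractHtml` is a hypothetical helper name.

```javascript
// Pull the rendered HTML out of a BrowserQL response body (already parsed
// from JSON). Assumes the standard GraphQL envelope: { data, errors }.
function extractHtml(body) {
  if (Array.isArray(body.errors) && body.errors.length > 0) {
    const messages = body.errors.map((e) => e.message).join("; ");
    throw new Error(`BrowserQL query failed: ${messages}`);
  }
  const status = body.data?.goto?.status;
  if (typeof status === "number" && status >= 400) {
    throw new Error(`Target site responded with HTTP ${status}`);
  }
  const html = body.data?.html?.html;
  if (typeof html !== "string") {
    throw new Error("Response did not include html { html }");
  }
  return html;
}

// Example using the response shown above.
const sample = {
  data: {
    goto: { status: 200, time: 957 },
    html: { html: "<!DOCTYPE html>...</html>" },
  },
};
console.log(extractHtml(sample));
```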
Unblock and connect with Puppeteer or Playwright
The API can access a site and get approval from the bot detectors, then return a WebSocket endpoint (browserWSEndpoint) for you to reconnect to for further automation.
First, have your query visit the site (or even fill in form elements if you wish). Then request the reconnection endpoint so your library can connect. Here, we use a timeout of 30,000 milliseconds, but you're free to increase or decrease this depending on your use case. The full query by itself looks like this:
mutation Reconnect($url: String!) {
  goto(url: $url, waitUntil: networkIdle) {
    status
  }
  reconnect(timeout: 30000) {
    browserWSEndpoint
  }
}
Here's the full example as a cURL call. Be sure to insert your token and the website you wish to visit as well:
curl --request POST \
--url 'https://production-sfo.browserless.io/chrome/bql?token=YOUR-TOKEN-HERE' \
--header 'Content-Type: application/json' \
--data '{
"query": "mutation Reconnect($url: String!) { goto(url: $url, waitUntil: networkIdle) { status } reconnect(timeout: 30000) { browserWSEndpoint } }",
"variables": { "url": "https://example.com/" }
}'
This returns a JSON response like this:
{
  "data": {
    "goto": {
      "status": 200
    },
    "reconnect": {
      "browserWSEndpoint": "wss://production-sfo.browserless.io/e/53616c7465645f5f91..."
    }
  }
}
After receiving the response with the browserWSEndpoint, you can use Puppeteer, Playwright or another CDP library to connect to the browser and continue your scraping process:
import puppeteer from "puppeteer-core";

const TOKEN = "GOES-HERE";
const url = "https://browserless.io/";

// Ask BrowserQL to visit the URL and hand back a reconnectable endpoint
const unblock = async (url) => {
  const opts = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      query: "mutation Reconnect($url: String!) { goto(url: $url, waitUntil: networkIdle) { status } reconnect(timeout: 30000) { browserWSEndpoint } }",
      variables: { url },
    }),
  };
  const response = await fetch(
    `https://production-sfo.browserless.io/chromium/bql?token=${TOKEN}`,
    opts,
  );
  return await response.json();
};

// Reconnect
const { data } = await unblock(url);
const browser = await puppeteer.connect({
  browserWSEndpoint: data.reconnect.browserWSEndpoint + `?token=${TOKEN}`,
});
const pages = await browser.pages();
// page.url() is a method, so call it before matching against the target URL
const page = pages.find((p) => p.url().includes(url));
await page.screenshot({ path: `screenshot-${Date.now()}.png` });
await browser.close();
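String concatenation works when the returned endpoint has no query string of its own. As a slightly more defensive sketch, the hypothetical `withToken` helper below uses the standard `URL` API to add the token whether or not other parameters are already present, and `pickPage` (also hypothetical) falls back to the first tab if no tab URL matches.

```javascript
// Append (or overwrite) the token query parameter on a WebSocket endpoint.
function withToken(browserWSEndpoint, token) {
  const endpoint = new URL(browserWSEndpoint);
  endpoint.searchParams.set("token", token);
  return endpoint.toString();
}

// Pick the tab whose URL contains the target, falling back to the first tab.
// `pages` is expected to be the array returned by browser.pages().
function pickPage(pages, targetUrl) {
  return pages.find((p) => p.url().includes(targetUrl)) ?? pages[0];
}

// Usage with the script above:
// const browser = await puppeteer.connect({
//   browserWSEndpoint: withToken(data.reconnect.browserWSEndpoint, TOKEN),
// });
// const page = pickPage(await browser.pages(), url);
```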
Solve captchas from active detectors
Browserless interacts directly with the Chrome DevTools Protocol (CDP) for features such as solving captchas. These CDP-based features require a CDP session, so let's create one and listen for a captcha.
const cdp = await page.createCDPSession();

// Wait until Browserless reports a captcha on the page
await new Promise((resolve) =>
  cdp.on("Browserless.captchaFound", () => {
    console.log("Found a captcha!");
    return resolve();
  }),
);

// Ask Browserless to solve it
const { solved, error } = await cdp.send("Browserless.solveCaptcha");
console.log({ solved, error });
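A page may never present a captcha, in which case a bare listener would wait forever. The sketch below adds a timeout. `waitForCaptcha` and its `timeoutMs` parameter are illustrative, and the only assumption about `cdp` is that it exposes the `on(event, handler)` method used above.

```javascript
// Resolve when Browserless reports a captcha, or reject after timeoutMs.
function waitForCaptcha(cdp, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`No captcha found within ${timeoutMs} ms`));
    }, timeoutMs);
    cdp.on("Browserless.captchaFound", () => {
      clearTimeout(timer); // stop the timeout once the event arrives
      resolve();
    });
  });
}

// Usage inside an async context:
// try {
//   await waitForCaptcha(cdp, 10000);
//   const { solved, error } = await cdp.send("Browserless.solveCaptcha");
//   console.log({ solved, error });
// } catch (e) {
//   console.log("No captcha appeared, continuing:", e.message);
// }
```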
Here's what the full script looks like in Puppeteer:
import puppeteer from "puppeteer-core";

const browserWSEndpoint =
  "wss://production-sfo.browserless.io/chromium?token=GOES-HERE&timeout=300000";

try {
  const browser = await puppeteer.connect({ browserWSEndpoint });
  const page = await browser.newPage();
  const cdp = await page.createCDPSession();
  await page.goto("https://www.google.com/recaptcha/api2/demo", {
    waitUntil: "networkidle0",
  });
  // Wait for Browserless to report a captcha on the page
  await new Promise((resolve) => cdp.on("Browserless.captchaFound", resolve));
  console.log("Captcha found!");
  const { solved, error } = await cdp.send("Browserless.solveCaptcha");
  console.log({ solved, error });
  // Continue...
  await page.click("#recaptcha-demo-submit");
  await browser.close();
} catch (e) {
  console.error("There was a big error :(", e);
  process.exit(1);
}
For more information, see our captcha-solving documentation or refer to our API reference.
Connect using Puppeteer or Playwright
Whether you're just getting started or already have an established codebase, Browserless aims to make the transition as easy as possible. Puppeteer and Playwright are especially convenient here, since you can start using Browserless by changing just one line.
Via Puppeteer.connect()
To go from running Chrome locally to using Browserless, simply use our endpoint. You can also use our proxy with a couple of query parameters.
// Launching Chrome locally
const browser = await puppeteer.launch();

// Connecting to Browserless and using a proxy
const browser = await puppeteer.connect({
  browserWSEndpoint: 'wss://production-sfo.browserless.io/?token=GOES_HERE&proxy=residential',
});
After that your code should remain exactly the same. You can use launch flags, proxies or other details to customize the behavior.
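One way to keep that switch in a single place is to choose between `launch()` and `connect()` at startup. In this sketch, `openBrowser` is a hypothetical helper, and reading the token from a `BROWSERLESS_TOKEN` environment variable is an assumption about your setup; the host and the `proxy=residential` parameter are the ones shown above.

```javascript
// Connect to Browserless when a token is available, otherwise launch locally.
// `puppeteer` is the imported puppeteer module, passed in by the caller.
function openBrowser(puppeteer, token) {
  if (!token) {
    return puppeteer.launch();
  }
  const endpoint = new URL("wss://production-sfo.browserless.io/");
  endpoint.searchParams.set("token", token);
  endpoint.searchParams.set("proxy", "residential");
  return puppeteer.connect({ browserWSEndpoint: endpoint.toString() });
}

// Usage:
// import puppeteer from "puppeteer";
// const browser = await openBrowser(puppeteer, process.env.BROWSERLESS_TOKEN);
```

Passing the module in as an argument keeps the helper free of imports and lets the rest of your script stay identical in both modes.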
Via Playwright.BrowserType.connect()
We support all Playwright protocols, and, just like with Puppeteer, you can easily switch to Browserless. The standard connect method uses Playwright's built-in browser-server to handle the connection.
To connect to Browserless using Chrome, WebKit or Firefox, just make sure that the connection string matches the browser:
// Launching Firefox locally
const browser = await playwright.firefox.launch();

// Connecting to Firefox via Browserless and using a proxy
const browser = await playwright.firefox.connect(`wss://production-sfo.browserless.io/firefox/playwright?token=GOES_HERE&proxy=residential`);
As with Puppeteer, you can use launch flags, proxies or other details to customize the behavior. Check out the Playwright docs for instructions for each supported language.
You can read more about our integrations with Puppeteer, Playwright, Scrapy and other libraries.
Generate screenshots and PDFs
All of our HTTP APIs accept a JSON body, which you can send with cURL or any HTTP client, and each comes with various options and properties. For instance, this is how you generate a PDF.
// JSON body
// `options` are the options available via puppeteer's Page.pdf() method
// (see our Open API documentation)
{
  "url": "https://example.com/",
  "options": {
    "displayHeaderFooter": true,
    "printBackground": false,
    "format": "A0"
    // Note the lack of a `path` parameter
  }
}
cURL request
curl -X POST \
  'https://production-sfo.browserless.io/pdf?token=MY_API_TOKEN' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
  "url": "https://example.com/",
  "options": {
    "displayHeaderFooter": true,
    "printBackground": false,
    "format": "A0"
  }
}'
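The same request can be made from Node.js. This sketch assumes Node 18 or later (for the global `fetch`) and that the `/pdf` endpoint responds with the binary PDF; `buildPdfRequest`, `savePdf`, and the output filename are illustrative names.

```javascript
// Build the URL and fetch() options for the /pdf API.
function buildPdfRequest(token, targetUrl, pdfOptions = {}) {
  const endpoint = new URL("https://production-sfo.browserless.io/pdf");
  endpoint.searchParams.set("token", token);
  return {
    url: endpoint.toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: targetUrl, options: pdfOptions }),
    },
  };
}

// Send the request and write the response bytes to disk.
async function savePdf(token, targetUrl, outPath, pdfOptions) {
  // Dynamic import keeps this snippet usable from both ESM and CommonJS files.
  const { writeFile } = await import("node:fs/promises");
  const { url, options } = buildPdfRequest(token, targetUrl, pdfOptions);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`PDF request failed with HTTP ${response.status}`);
  }
  await writeFile(outPath, Buffer.from(await response.arrayBuffer()));
}

// Usage (requires a real API token):
// await savePdf("MY_API_TOKEN", "https://example.com/", "example.pdf", {
//   displayHeaderFooter: true,
//   printBackground: false,
//   format: "A0",
// });
```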
Browserless also offers ways to load additional stylesheets and script tags into the page. This gives you full control and allows you to override page elements to suit your needs.
For more information, check out our /pdf API, our /screenshot API or our Lighthouse Tests API. If you need to avoid bot detection, check out BrowserQL.