How it Works
Browserless V2 is available in production via two domains: production-sfo.browserless.io and production-lon.browserless.io.
Browserless works almost identically to how most libraries and web drivers work when run locally. There's no additional software to install on your production machines and no complicated setup. In fact, the only thing you need to do when using the Browserless service is change where your code references the browser.
Browserless runs browsers in a cloud environment, and exposes most of the Chrome DevTools protocol and the Playwright Protocols to you. On top of exposing these commands, it also:
- Isolates your session from all others.
- Can run concurrent requests without interfering with others.
- Cleans up sessions after 30 seconds.
- Starts a clean copy of a browser for each session.
- Restarts automatically if anything crashes.
- Queues requests if thresholds are met.
- Helps bypass bot detectors.
How sessions work
Via Puppeteer.connect()
Libraries like puppeteer and chrome-remote-interface can hook into an existing Chrome instance by websocket. The hosted browserless service only supports this type of interface since you can pass in tokens and other query-params. Typically you only need to replace how you start Chrome with a connect-like statement:
// Before: launching Chrome locally
const browser = await puppeteer.launch();
// After: connecting to Browserless (note the wss:// scheme)
const browser = await puppeteer.connect({
  browserWSEndpoint: 'wss://production-sfo.browserless.io/?token=GOES_HERE',
});
After that your code should remain exactly the same.
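As a rough sketch of the swap above, the endpoint can be assembled from a token and a region instead of being hard-coded. The helper name `browserlessEndpoint` and its `region` parameter are hypothetical and not part of Puppeteer or Browserless. The host names come from the two production domains listed at the top of this page, and the `wss://` scheme is assumed because `browserWSEndpoint` expects a WebSocket URL.

```javascript
// Hypothetical helper: builds a Browserless WebSocket endpoint.
// Only the two production regions named in this document are accepted.
const BROWSERLESS_REGIONS = ['sfo', 'lon'];

function browserlessEndpoint(token, region = 'sfo') {
  if (!token) {
    throw new Error('A Browserless API token is required');
  }
  if (!BROWSERLESS_REGIONS.includes(region)) {
    throw new Error(`Unknown region: ${region}`);
  }
  const url = new URL(`wss://production-${region}.browserless.io/`);
  // searchParams takes care of URL-encoding the token.
  url.searchParams.set('token', token);
  return url.toString();
}
```

The returned string can then be passed as `browserWSEndpoint` to `puppeteer.connect()`, which keeps the token out of the source file if it is read from configuration.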
Via Playwright.BrowserType.connect()
We support all Playwright protocols, and, just like with Puppeteer, you can easily switch to Browserless. The standard connect method uses Playwright's built-in browser server to handle the connection. This is generally a faster and more fully featured method, since it supports most of the Playwright parameters (such as using a proxy).
To connect to Browserless using Chrome, WebKit or Firefox, just make sure that the connection string matches the browser:
// Before: launching Firefox locally
const browser = await playwright.firefox.launch();
// After: connecting to Firefox via Browserless (note the wss:// scheme)
const browser = await playwright.firefox.connect(`wss://production-sfo.browserless.io/firefox/playwright?token=GOES_HERE`);
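To illustrate what "the connection string matches the browser" could look like in code, here is a minimal sketch that picks the URL path from the Playwright browser type. Only the `/firefox/playwright` path appears on this page. The `chromium` and `webkit` entries are assumptions that follow the same pattern, and the function name `playwrightEndpoint` is hypothetical.

```javascript
// Hypothetical lookup from Playwright browser type to Browserless path.
// Only the Firefox path is taken from this document; the other two are
// assumed to follow the same "/<browser>/playwright" pattern.
const PLAYWRIGHT_PATHS = new Map([
  ['firefox', '/firefox/playwright'],
  ['chromium', '/chromium/playwright'], // assumption
  ['webkit', '/webkit/playwright'], // assumption
]);

function playwrightEndpoint(browserName, token, host = 'production-sfo.browserless.io') {
  const path = PLAYWRIGHT_PATHS.get(browserName);
  if (!path) {
    throw new Error(`Unsupported browser type: ${browserName}`);
  }
  const url = new URL(`wss://${host}${path}`);
  url.searchParams.set('token', token);
  return url.toString();
}
```

With this in place, a call such as `playwright[browserName].connect(playwrightEndpoint(browserName, token))` keeps the browser type and the URL path from drifting apart.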
Via host and port (Chrome DevTools Protocol)
Many libraries for the Chrome DevTools Protocol will issue an HTTP request to one of the /json endpoints exposed by the protocol. When this request happens, Browserless responds with the resulting payload so that remote programs can interact with the browser.
If you're looking to use the Browserless service with a non-Node language, it's better to use the REST APIs and the /function endpoint, as Browserless can run Puppeteer code for you. Take a look at our blog post about this interface.
Introspection Request
# curl https://production-sfo.browserless.io/json/list?token=YOUR-API-KEY
[
  {
    "description": "",
    "devtoolsFrontendUrl": "/devtools/inspector.html?ws=138.197.93.72:3000/devtools/page/da78a5e7-1db5-4d47-a2a5-07885088ad07",
    "id": "da78a5e7-1db5-4d47-a2a5-07885088ad07",
    "title": "about:blank",
    "type": "page",
    "url": "about:blank",
    "webSocketDebuggerUrl": "ws://138.197.93.72:3000/devtools/page/da78a5e7-1db5-4d47-a2a5-07885088ad07"
  }
]
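A client library typically reads this payload to find the WebSocket URL it should connect to. The sketch below shows one way to do that for a payload shaped like the response above. The function name `pageDebuggerUrls` is hypothetical, and the sample data is copied from the example response with some fields left out.

```javascript
// Returns the WebSocket debugger URLs of all "page" targets in a
// /json/list payload shaped like the example response.
function pageDebuggerUrls(targets) {
  return targets
    .filter((t) => t.type === 'page' && typeof t.webSocketDebuggerUrl === 'string')
    .map((t) => t.webSocketDebuggerUrl);
}

// Sample payload taken from the example response (fields trimmed).
const sampleTargets = [
  {
    id: 'da78a5e7-1db5-4d47-a2a5-07885088ad07',
    type: 'page',
    url: 'about:blank',
    webSocketDebuggerUrl:
      'ws://138.197.93.72:3000/devtools/page/da78a5e7-1db5-4d47-a2a5-07885088ad07',
  },
];

console.log(pageDebuggerUrls(sampleTargets));
```

In a real client the array would come from parsing the HTTP response body of the /json/list request rather than from a literal.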
Commands from the protocol are sent to the websocket endpoints, and Chrome emits responses back over the same connection. Browserless does not modify any of these messages. Once your session and its underlying websocket are closed, Browserless automatically clears that Target and its session data.
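For readers writing their own client, the command and response flow can be sketched as follows. Chrome DevTools Protocol messages are JSON objects: a command carries an `id`, a `method` and `params`, and the matching response carries the same `id` with either a `result` or an `error`. The `createCdpSession` helper and its `send` callback are hypothetical names used only for illustration. This is a minimal sketch, not how any particular library implements it.

```javascript
// Minimal sketch of matching CDP responses to commands by `id`.
// `send` is any function that transmits a string, such as the send
// method of an open WebSocket.
function createCdpSession(send) {
  let nextId = 1;
  const pending = new Map(); // id -> { resolve, reject }

  return {
    // Sends a command and resolves with the `result` of its response.
    command(method, params = {}) {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        send(JSON.stringify({ id, method, params }));
      });
    },
    // Call this with every incoming WebSocket message (a JSON string).
    // Returns true if the message answered a pending command.
    handleMessage(raw) {
      const message = JSON.parse(raw);
      const waiter = pending.get(message.id);
      if (!waiter) return false; // a protocol event, or an unknown id
      pending.delete(message.id);
      if (message.error) waiter.reject(new Error(message.error.message));
      else waiter.resolve(message.result);
      return true;
    },
  };
}
```

Protocol events such as `Page.loadEventFired` carry no `id`, so `handleMessage` returns `false` for them and the caller can route them to its own event handlers.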