Skip to main content

How to scrape websites

For web scraping, Browseless recommends using BrowserQL, our most advanced option. BQL is our own custom library that avoids the fingerprints left by other automation tools.

In this section you'll go through the following:

How BrowserQL works

BrowserQL is our own browser automation language, built on GraphQL. We’ve kept it simple, as a set of queries and responses.

BQL is optimized for web automation and scraping, designed to minimize complexity by making intelligent assumptions. Here’s what it does for you:

  • Loads a browser with human-like fingerprints.
  • Manages page-load events, like firstContentfulPaint.
  • Handles proxies and request filtering.
  • Waits for any mentioned selectors.

Instead of worrying about these technical details, you can focus on queries and actions. You will:

  1. Navigate to pages.
  2. Perform actions (e.g., click, type).
  3. Extract data (e.g., text, HTML).

Each query follows the format of function (arguments) {responses} with an optional name: beforehand For example, going to a site and grabbing a product name would be:

mutation scrape_example {
goto(url: "https://example.com", waitUntil: networkIdle) {
status
}
productName: text(
selector: "span#productTitle"
visible: true
) {
text
}
}

We then run this query with the browsers hosted in our cloud, which you can see run in the editor.