/content API

The content API navigates to a site and captures the page's content (including the <head> section). Browserless responds with a Content-Type of text/html and a string of the site's HTML after it has been rendered and evaluated inside the browser. This is useful for capturing pages that rely heavily on JavaScript or other interactivity.

You can check the full OpenAPI schema here.

BrowserQL

We recommend using BrowserQL, Browserless' first-class browser automation API, to capture content from any website to be used in complex browser automation tasks.

Example

curl -X POST \
'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/"
}'
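
The same request can also be issued from a script. The following is a minimal Python sketch using only the standard library. The helper names build_content_request and fetch_content are illustrative and not part of any Browserless SDK, and the token is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://production-sfo.browserless.io"


def build_content_request(target_url, token, **options):
    """Build (but do not send) a POST request for the /content API.

    `options` are extra top-level payload keys, for example gotoOptions.
    """
    query = urllib.parse.urlencode({"token": token})
    payload = {"url": target_url, **options}
    return urllib.request.Request(
        f"{BASE_URL}/content?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Cache-Control": "no-cache", "Content-Type": "application/json"},
        method="POST",
    )


def fetch_content(target_url, token, timeout=60):
    """Send the request and return the rendered HTML as a string."""
    request = build_content_request(target_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (performs a real network request, so it is left commented out):
# html = fetch_content("https://example.com/", "YOUR_API_TOKEN_HERE")
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.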

Browser parameters

You can also use the parameters below to precisely control the content you fetch from the target site.

Rejecting Undesired Requests

You can use rejectResourceTypes and rejectRequestPattern to block undesired content, resources and requests.

curl -X POST \
'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://browserless.io/",
"rejectResourceTypes": ["image"],
"rejectRequestPattern": ["/^.*\\.(css)"]
}'
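
When the payload is built in code rather than typed by hand, the JSON serializer handles the backslash escaping in request patterns. The sketch below assumes a hypothetical helper, build_reject_payload, and reuses the values from the example above.

```python
import json


def build_reject_payload(target_url, resource_types=(), request_patterns=()):
    """Return a /content payload that blocks the given resources and requests."""
    payload = {"url": target_url}
    if resource_types:
        payload["rejectResourceTypes"] = list(resource_types)
    if request_patterns:
        payload["rejectRequestPattern"] = list(request_patterns)
    return payload


payload = build_reject_payload(
    "https://browserless.io/",
    resource_types=["image"],
    # Raw string: the pattern contains a single backslash in Python.
    request_patterns=[r"/^.*\.(css)"],
)

# json.dumps doubles the backslash, matching the escaped form in the curl body.
body = json.dumps(payload)
print(body)
```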

To customize navigation behavior when loading a page, such as specifying when the page counts as fully loaded (e.g., waiting for network activity to settle), use the gotoOptions parameter. The object mirrors Puppeteer's GoToOptions interface.

curl -X POST \
'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"gotoOptions": { "waitUntil": "networkidle2" }
}'
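
A small amount of validation can catch a mistyped waitUntil value before the request is sent. This is a sketch under the assumption that only Puppeteer's four documented lifecycle values are wanted; build_goto_payload is a hypothetical helper name.

```python
# Lifecycle events accepted by Puppeteer's `waitUntil` navigation option.
WAIT_UNTIL_VALUES = {"load", "domcontentloaded", "networkidle0", "networkidle2"}


def build_goto_payload(target_url, wait_until="networkidle2", timeout_ms=None):
    """Return a /content payload with a validated gotoOptions object."""
    if wait_until not in WAIT_UNTIL_VALUES:
        raise ValueError(f"unsupported waitUntil value: {wait_until!r}")
    goto_options = {"waitUntil": wait_until}
    if timeout_ms is not None:
        # Puppeteer's navigation timeout is expressed in milliseconds.
        goto_options["timeout"] = timeout_ms
    return {"url": target_url, "gotoOptions": goto_options}
```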

Continue on error

You can use bestAttempt to make Browserless attempt to proceed when async events fail or time out. This includes things like the goto or waitForSelector properties in the JSON payload.

curl -X POST \
'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"bestAttempt": true,
"waitForSelector": { "selector": "table", "timeout": 500 }
}'

Waiting for Things

Browserless offers four ways to wait for preconditions to be met on the page: events, functions, selectors and timeouts.

waitForEvent

Waits for an event to happen on the page before continuing.

Example:

curl -X POST \
'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"waitForEvent": {
"event": "fullscreenchange",
"timeout": 5000
}
}'

waitForFunction

Waits for the provided function to return before continuing. The function can be any valid JavaScript (ECMAScript) function, and async functions are supported.

Example:

JS function

async () => {
  const res = await fetch('https://jsonplaceholder.typicode.com/todos/1');
  const json = await res.json();

  document.querySelector("h1").innerText = json.title;
}

cURL request

curl -X POST \
'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"waitForFunction": {
"fn": "async()=>{let t=await fetch(\"https://jsonplaceholder.typicode.com/todos/1\"),e=await t.json();document.querySelector(\"h1\").innerText=e.title}",
"timeout": 5000
}
}'
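
Embedding a function by hand inside a shell-quoted JSON string is error-prone because of nested quotes and newlines. One way around this, sketched here in Python, is to keep the readable function as an ordinary string and let the JSON serializer do the escaping. The helper name build_wait_for_function_payload is an assumption made for illustration.

```python
import json

# The readable JavaScript function, kept as a plain multi-line string.
JS_FUNCTION = """async () => {
  const res = await fetch('https://jsonplaceholder.typicode.com/todos/1');
  const json = await res.json();
  document.querySelector("h1").innerText = json.title;
}"""


def build_wait_for_function_payload(target_url, fn_source, timeout_ms=5000):
    """Return a /content payload that waits for `fn_source` to return."""
    return {
        "url": target_url,
        "waitForFunction": {"fn": fn_source, "timeout": timeout_ms},
    }


# json.dumps escapes the double quotes and newlines inside the function source,
# so no manual minification or quote juggling is needed.
body = json.dumps(
    build_wait_for_function_payload("https://example.com/", JS_FUNCTION)
)
```

The resulting body can be sent as the POST data of a /content request without further editing.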

waitForSelector

Waits for a selector to appear on the page. If the selector already exists when the method is called, the method returns immediately. If the selector doesn't appear within the timeout (in milliseconds), the function will throw.

The object can have any of these values:

  • selector: String, required — A valid CSS selector.
  • hidden: Boolean, optional — Wait for the selected element to not be found in the DOM or to be hidden, i.e. have display: none or visibility: hidden CSS properties.
  • timeout: Number, optional — Maximum number of milliseconds to wait for the selector before failing.
  • visible: Boolean, optional — Wait for the selected element to be present in DOM and to be visible, i.e. to not have display: none or visibility: hidden CSS properties.

Example:

curl -X POST \
'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"waitForSelector": {
"selector": "h1",
"timeout": 5000
}
}'
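
The option list above maps naturally onto a small builder that only emits the keys that were actually set. This is a sketch, not an official client; build_wait_for_selector is a hypothetical helper name.

```python
def build_wait_for_selector(selector, timeout_ms=None, visible=None, hidden=None):
    """Return a waitForSelector object containing only the options provided."""
    if not selector:
        # `selector` is the only required field.
        raise ValueError("selector is required")
    options = {"selector": selector}
    if timeout_ms is not None:
        options["timeout"] = timeout_ms
    if visible is not None:
        options["visible"] = bool(visible)
    if hidden is not None:
        options["hidden"] = bool(hidden)
    return options


payload = {
    "url": "https://example.com/",
    "waitForSelector": build_wait_for_selector("h1", timeout_ms=5000),
}
```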

waitForTimeout

Waits for a specified timeout before continuing.

Example:

curl -X POST \
'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"waitForTimeout": 10000
}'

Bot Detection Troubleshooting

If you're experiencing issues with the /content API returning no data or partial data, this is typically due to bot detection mechanisms employed by the target website. Websites may use various techniques to detect and block automated browsers, which can result in empty responses or incomplete content.

Recognising failures from Anti Bot Mechanisms

  • Empty HTML response or minimal content
  • Partial page content missing key elements
  • Different content compared to what you see in a regular browser
  • Blocked requests or access denied messages

Alternative: Unblock API

When encountering bot detection issues, we recommend using the /unblock API as an alternative to the /content API. The /unblock endpoint is specifically designed to bypass bot detection mechanisms and can return HTML content directly in the response.

curl --request POST \
--url 'https://production-sfo.browserless.io/unblock?token=YOUR_API_TOKEN_HERE&proxy=residential' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://www.example.com/",
"browserWSEndpoint": false,
"cookies": false,
"content": true,
"screenshot": false
}'

The /unblock API is suited to more advanced bot detection bypass use cases and offers:

  • Specialized unblocking: Designed specifically to bypass bot detection mechanisms like Datadome and passive CAPTCHAs
  • Direct content return: Returns HTML content directly in the response when content: true is set
  • Enhanced success rate: Works best when combined with residential proxies (&proxy=residential)
  • Simple integration: Provides the same content extraction functionality as the /content API
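
The /unblock request shown earlier can be built from a script in the same way. The sketch below uses only the Python standard library. The helper names are hypothetical, and reading the HTML from a "content" key of a JSON response is an assumption that should be checked against the /unblock API documentation.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://production-sfo.browserless.io"


def build_unblock_request(target_url, token, proxy="residential"):
    """Build (but do not send) a POST request for the /unblock API."""
    params = {"token": token}
    if proxy:
        params["proxy"] = proxy
    payload = {
        "url": target_url,
        "browserWSEndpoint": False,
        "cookies": False,
        "content": True,  # ask for the HTML in the response
        "screenshot": False,
    }
    return urllib.request.Request(
        f"{BASE_URL}/unblock?{urllib.parse.urlencode(params)}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_html(response_body):
    """Assumption: the response is JSON with the HTML under a "content" key."""
    return json.loads(response_body).get("content")
```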

For more information about the /unblock API and its capabilities, see the /unblock API documentation.