/export API

The export API allows you to retrieve the content of any URL in its native format (HTML, PDF, images, etc.). The response format is determined by the content type of the page being accessed, with appropriate headers set to facilitate downloading or viewing the content.

The full OpenAPI schema for this API is also available.

Basic Usage

The export API accepts a JSON payload with the target URL and configuration options.

JSON Payload Format

{
  "url": "https://example.com/",
  "headers": {
    "User-Agent": "Custom User Agent"
  },
  "gotoOptions": {
    "waitUntil": "networkidle0",
    "timeout": 30000
  },
  "waitForSelector": {
    "selector": "#main-content",
    "timeout": 5000
  },
  "waitForTimeout": 1000,
  "bestAttempt": false
}
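The payload above can be sent from Node.js 18 or later (which provides a global `fetch`) roughly as follows. This is a minimal sketch that assumes the `production-sfo` endpoint used in the examples on this page and passes the token as a `token` query parameter, as those examples do; `buildExportRequest` and `exportPage` are hypothetical helper names, not part of a Browserless SDK.

```javascript
// Minimal sketch: send an /export request from Node.js 18+ (global fetch).
// buildExportRequest and exportPage are hypothetical helpers, not a Browserless SDK.
const DEFAULT_BASE = 'https://production-sfo.browserless.io';

function buildExportRequest(token, payload, base = DEFAULT_BASE) {
  const url = new URL('/export', base);
  url.searchParams.set('token', token); // the token travels as a query parameter
  return {
    url: url.toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload), // the JSON payload described above
    },
  };
}

async function exportPage(token, payload) {
  const { url, init } = buildExportRequest(token, payload);
  return fetch(url, init); // resolves to a Response whose body is streamed
}
```

For example, inside an async function: `const response = await exportPage('YOUR_API_TOKEN_HERE', { url: 'https://example.com/' });`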

Parameters

Required Parameters

  • url (string) - The URL of the resource to export

Optional Parameters

  • headers (object) - Custom HTTP headers to send with the request
  • gotoOptions (object) - Navigation options
    • waitUntil (string) - When to consider navigation succeeded. Options: 'load', 'domcontentloaded', 'networkidle0', 'networkidle2'. Default: 'networkidle0'
    • timeout (number) - Maximum navigation time in milliseconds
    • referer (string) - Referer header value
  • waitForEvent (object) - Wait for a specific event before proceeding
  • waitForFunction (object) - Wait until a specific function returns a truthy value
  • waitForSelector (object) - Wait for a specific selector to be present
    • selector (string) - CSS selector to wait for
    • timeout (number) - Maximum time to wait in milliseconds
  • waitForTimeout (number) - Time in milliseconds to wait after page load
  • bestAttempt (boolean) - Whether to continue when an awaited event (such as a waitFor* option) fails or times out, rather than returning an error. Default: false
  • includeResources (boolean) - Whether to include all linked resources (images, CSS, JavaScript) in a zip file. Default: false
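Before sending a request, a client can catch simple mistakes locally. The sketch below checks only the parameters listed above (a required url, millisecond timeouts, boolean flags); `validateExportPayload` is a hypothetical helper, and the server's own validation, which answers with 400 Bad Request, remains authoritative.

```javascript
// Client-side sanity check for an /export payload, based on the parameter list.
// validateExportPayload is a hypothetical helper; the server is the source of truth.
function validateExportPayload(payload) {
  const errors = [];
  if (typeof payload?.url !== 'string' || payload.url.length === 0) {
    errors.push('url is required and must be a non-empty string');
  } else {
    try {
      new URL(payload.url); // throws a TypeError for relative or malformed URLs
    } catch {
      errors.push('url must be an absolute URL');
    }
  }
  // All timeouts are in milliseconds and must be non-negative finite numbers.
  const timeouts = {
    'gotoOptions.timeout': payload?.gotoOptions?.timeout,
    'waitForSelector.timeout': payload?.waitForSelector?.timeout,
    waitForTimeout: payload?.waitForTimeout,
  };
  for (const [name, value] of Object.entries(timeouts)) {
    if (value !== undefined && !(Number.isFinite(value) && value >= 0)) {
      errors.push(`${name} must be a non-negative number of milliseconds`);
    }
  }
  if (payload?.waitForSelector && typeof payload.waitForSelector.selector !== 'string') {
    errors.push('waitForSelector.selector must be a string');
  }
  for (const flag of ['bestAttempt', 'includeResources']) {
    if (payload?.[flag] !== undefined && typeof payload[flag] !== 'boolean') {
      errors.push(`${flag} must be a boolean`);
    }
  }
  return errors; // an empty array means no problems were found
}
```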

Response

The API returns a streaming response with the content of the requested URL. The behavior depends on the content type detected and the includeResources parameter:

  • When includeResources is false (default):

    • HTML Content: Returns the HTML with Content-Type: text/html. No attachment header is set, allowing the content to be rendered in the browser.
    • PDF Content: Returns a PDF buffer with Content-Type: application/pdf and sets a Content-Disposition: attachment header with an appropriate filename.
    • Images and Other Binary Content: Returns the binary content with the appropriate MIME type (e.g., image/jpeg, image/png) and sets a Content-Disposition: attachment header with an appropriate filename.
  • When includeResources is true:

    • Returns a zip file containing the HTML and all linked resources (images, CSS, JavaScript) with Content-Type: application/zip and Content-Disposition: attachment header with an appropriate filename.
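Clients that want to save attachments under the server-suggested name can read it from the Content-Disposition header. The sketch below is a simplified parser, not a complete RFC 6266 implementation: it assumes the header uses either a plain `filename="name.pdf"` parameter or the extended `filename*=UTF-8''<percent-encoded name>` form. `parseDisposition` is a hypothetical helper name.

```javascript
// Simplified Content-Disposition parser (not a full RFC 6266 implementation).
// Returns { attachment, filename }, where filename is null when none is given.
function parseDisposition(header) {
  // No header at all: the content is meant to be rendered inline (e.g. HTML).
  if (!header) return { attachment: false, filename: null };
  const attachment = /^\s*attachment\b/i.test(header);
  // Prefer the extended form filename*=UTF-8''<percent-encoded>, if present.
  const extended = header.match(/filename\*\s*=\s*UTF-8''([^;]+)/i);
  if (extended) {
    return { attachment, filename: decodeURIComponent(extended[1].trim()) };
  }
  // Fall back to filename="quoted value" or filename=token.
  const plain = header.match(/filename\s*=\s*(?:"([^"]*)"|([^;]+))/i);
  const filename = plain ? (plain[1] ?? plain[2].trim()) : null;
  return { attachment, filename };
}
```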

Because the response is streamed and may be binary, read it with stream- or buffer-based methods (for example, response.arrayBuffer() or the response body stream) rather than assuming it can be processed as text.
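A minimal sketch of stream-based handling in Node.js 18+, assuming the response comes from `fetch` as in the snippets on this page. `streamToFile` is a hypothetical helper: it converts the web `ReadableStream` body with `Readable.fromWeb` and writes it with `pipeline` from `node:stream/promises`, so a large PDF or zip archive is never held entirely in memory.

```javascript
// Stream a fetch() Response body to disk without buffering it all in memory.
// Requires Node.js 18+ (global fetch/Response and stream.Readable.fromWeb).
const fs = require('node:fs');
const { Readable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

async function streamToFile(response, filePath) {
  if (!response.ok) {
    throw new Error(`Export failed with HTTP ${response.status}`);
  }
  if (!response.body) {
    throw new Error('Response has no body to stream');
  }
  // Convert the WHATWG ReadableStream into a Node.js Readable and pipe it to a file.
  await pipeline(Readable.fromWeb(response.body), fs.createWriteStream(filePath));
  return filePath;
}
```

For example, `await streamToFile(response, 'webpage.zip')` when includeResources is true.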

Handling Different Content Types

The export API can return various content types depending on the URL being accessed. Here's how to properly handle the different response types:

HTML Content

When accessing a standard web page, the API returns HTML content with Content-Type: text/html:

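// `url` is the /export endpoint (with your token); `options` is a POST request with a JSON body
const response = await fetch(url, options);
if (response.headers.get('content-type')?.includes('text/html')) {
  const htmlContent = await response.text();
  // Process the HTML content
}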

PDF Content

When the target URL serves a PDF, the API returns the PDF as binary data with Content-Type: application/pdf:

const response = await fetch(url, options);
if (response.headers.get('content-type')?.includes('application/pdf')) {
  // Read the body as raw bytes; Buffer is a Node.js global
  const arrayBuffer = await response.arrayBuffer();
  const pdfBuffer = Buffer.from(arrayBuffer);
  // Save or process the PDF buffer
}

Binary Content (Images, etc.)

For other binary content, such as images, the API returns the matching content type and sets a Content-Disposition: attachment header:

const response = await fetch(url, options);
const contentType = response.headers.get('content-type');
// Images, and any other type that is not text/*, are read as raw bytes
if (contentType?.includes('image/') || !contentType?.includes('text/')) {
  const arrayBuffer = await response.arrayBuffer();
  const binaryBuffer = Buffer.from(arrayBuffer);
  // Save or process the binary buffer
}
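The three cases can also be handled by one function that picks a file extension from the Content-Type and saves the body. This is a sketch assuming Node.js 18+; `saveExport`, `extensionFor`, and the MIME-to-extension table are illustrative choices rather than part of the API, and unknown types fall back to a hypothetical `.bin` extension.

```javascript
// Save an /export response with a file extension derived from its Content-Type.
// The mapping below is illustrative, not an exhaustive list of MIME types.
const fsp = require('node:fs/promises');

const EXTENSIONS = {
  'text/html': '.html',
  'application/pdf': '.pdf',
  'application/zip': '.zip',
  'image/png': '.png',
  'image/jpeg': '.jpg',
  'image/gif': '.gif',
  'image/webp': '.webp',
  'image/svg+xml': '.svg',
};

function extensionFor(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalise the case.
  const mime = (contentType ?? '').split(';')[0].trim().toLowerCase();
  return EXTENSIONS[mime] ?? '.bin';
}

async function saveExport(response, basePath) {
  const filePath = basePath + extensionFor(response.headers.get('content-type'));
  // Writing raw bytes works for HTML too and keeps the original encoding.
  const bytes = Buffer.from(await response.arrayBuffer());
  await fsp.writeFile(filePath, bytes);
  return filePath;
}
```

Writing bytes for every type avoids re-encoding the page; use response.text() only when the markup itself needs processing.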

Error Handling

The API may return the following error responses:

  • 400 Bad Request - Invalid parameters, missing URL, or no content received
  • 404 Not Found - Page not found
  • 408 Request Timeout - Page load timeout
  • 500 Internal Server Error - Server-side error
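A sketch of client-side handling for the status codes listed above, assuming Node.js 18+. `ExportError` and `fetchExportWithRetry` are hypothetical names, and retrying only 408 responses with a short linear backoff is an illustrative policy, not documented Browserless behavior. The function takes a callback that issues the request, so each retry sends a fresh request.

```javascript
// Turn non-2xx /export responses into descriptive errors, retrying only on 408.
// The retry policy (attempt count, linear backoff) is an illustrative assumption.
const STATUS_HINTS = {
  400: 'Bad Request: invalid parameters, missing URL, or no content received',
  404: 'Not Found: the page was not found',
  408: 'Request Timeout: the page did not load in time',
  500: 'Internal Server Error: server-side error',
};

class ExportError extends Error {
  constructor(status, detail) {
    const hint = STATUS_HINTS[status] ?? `HTTP ${status}`;
    super(detail ? `${hint} (${detail})` : hint);
    this.name = 'ExportError';
    this.status = status;
  }
}

async function fetchExportWithRetry(doFetch, { retries = 2, delayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await doFetch();
    if (response.ok) return response;
    // Keep a short excerpt of the error body for the message.
    const detail = (await response.text()).slice(0, 200);
    const retryable = response.status === 408 && attempt < retries;
    if (!retryable) throw new ExportError(response.status, detail);
    await new Promise((resolve) => setTimeout(resolve, delayMs * (attempt + 1)));
  }
}
```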

Examples

Basic Export Request

This example exports a web page using the most basic configuration. curl prints the response body to standard output; to keep binary content such as a PDF or image intact, save it to a file with --output, choosing an extension that matches the returned Content-Type.

curl -X POST \
  "https://production-sfo.browserless.io/export?token=YOUR_API_TOKEN_HERE" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/"
  }'

Export with Custom Navigation Options

This example demonstrates how to export a web page with custom navigation options, such as waiting for specific network events or DOM elements to load. These options help ensure the page is fully rendered before capturing the content.

curl -X POST \
  "https://production-sfo.browserless.io/export?token=YOUR_API_TOKEN_HERE" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "gotoOptions": {
      "waitUntil": "networkidle0",
      "timeout": 60000
    },
    "waitForSelector": {
      "selector": "#main-content",
      "timeout": 5000
    }
  }'
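Because navigation and the wait steps each have their own timeout, a client that enforces its own deadline should allow at least their combined duration. The sketch below assumes the steps run one after another, so it sums gotoOptions.timeout, waitForSelector.timeout, and waitForTimeout, then adds a safety margin. The 30-second fallback for an unset navigation timeout and the 10-second margin are assumptions, and `clientTimeoutMs` and `exportInit` are hypothetical helpers; `AbortSignal.timeout` requires Node.js 17.3 or later.

```javascript
// Estimate a client-side deadline for an /export request from its payload.
// Assumes navigation and waits run sequentially; both defaults are assumptions.
function clientTimeoutMs(payload, { defaultNavMs = 30000, marginMs = 10000 } = {}) {
  const nav = payload.gotoOptions?.timeout ?? defaultNavMs; // assumed fallback when unset
  const selector = payload.waitForSelector?.timeout ?? 0;
  const fixedWait = payload.waitForTimeout ?? 0;
  return nav + selector + fixedWait + marginMs;
}

// Build fetch() options that abort the HTTP request once the budget is exceeded.
function exportInit(payload) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
    signal: AbortSignal.timeout(clientTimeoutMs(payload)),
  };
}
```

For the request above, the budget works out to 60000 + 5000 + 10000 = 75000 ms.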

Export with Custom Headers

This example demonstrates how to export a web page with custom HTTP headers. The headers are sent with the request for the page, which lets you, for example, change the User-Agent or set language preferences.

curl -X POST \
  "https://production-sfo.browserless.io/export?token=YOUR_API_TOKEN_HERE" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "headers": {
      "User-Agent": "Custom User Agent",
      "Accept-Language": "en-US"
    }
  }'

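Export with Resource Download

This example sets includeResources to true, so the API returns a zip archive containing the HTML and its linked resources (images, CSS, JavaScript). The --output flag tells curl to save the response as webpage.zip.
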
curl -X POST \
  "https://production-sfo.browserless.io/export?token=YOUR_API_TOKEN_HERE" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "includeResources": true
  }' \
  --output "webpage.zip"
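Without --fail, curl writes whatever body it receives to webpage.zip, including an error response. Checking the file's leading bytes (its "magic number") confirms what was actually saved. The sketch below uses the well-known signatures for ZIP, PDF, PNG, and JPEG; `sniffFormat` and `sniffFile` are hypothetical helper names.

```javascript
// Check a downloaded export's leading "magic bytes" to confirm its real format,
// e.g. that webpage.zip is a ZIP archive and not a text error body.
const fs = require('node:fs');

const SIGNATURES = [
  { format: 'zip', bytes: [0x50, 0x4b, 0x03, 0x04] }, // "PK\x03\x04" local file header
  { format: 'pdf', bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  { format: 'png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { format: 'jpeg', bytes: [0xff, 0xd8, 0xff] },
];

function sniffFormat(buffer) {
  for (const { format, bytes } of SIGNATURES) {
    if (buffer.length >= bytes.length && bytes.every((b, i) => buffer[i] === b)) {
      return format;
    }
  }
  return 'unknown';
}

function sniffFile(filePath) {
  // Read only the first 16 bytes rather than the whole file.
  const fd = fs.openSync(filePath, 'r');
  try {
    const head = Buffer.alloc(16);
    const bytesRead = fs.readSync(fd, head, 0, head.length, 0);
    return sniffFormat(head.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

For example, `sniffFile('webpage.zip')` is expected to return 'zip' for a successful export; 'unknown' suggests the file holds an error message instead.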