/export API

Note: Browserless V2 is currently available in production on two domains: production-sfo.browserless.io and production-lon.browserless.io.

The export API allows you to capture and save a webpage as a complete archive, including all resources (HTML, CSS, JavaScript, images, etc.) in a single downloadable file. This is particularly useful for creating offline copies of web pages or preserving web content for archival purposes.

The full OpenAPI schema for this endpoint is available in the Browserless documentation.

Basic Usage

The export API accepts a JSON payload with the target URL and configuration options.

JSON Payload Format

{
  "url": "https://example.com/",
  "headers": {
    "User-Agent": "Custom User Agent"
  },
  "gotoOptions": {
    "waitUntil": "networkidle0",
    "timeout": 30000
  },
  "waitForSelector": {
    "selector": "#main-content",
    "timeout": 5000
  },
  "waitForTimeout": 1000,
  "bestAttempt": false
}

Parameters

Required Parameters

  • url (string) - The URL of the webpage to export

Optional Parameters

  • headers (object) - Custom HTTP headers to send with the request
  • gotoOptions (object) - Navigation options
    • waitUntil (string) - When to consider navigation complete. Options: 'load', 'domcontentloaded', 'networkidle0', 'networkidle2'. Default: 'networkidle0'
    • timeout (number) - Maximum navigation time in milliseconds
    • referer (string) - Referer header value
  • waitForEvent (object) - Wait for a specific event before proceeding
  • waitForFunction (object) - Wait for a specific function to return true
  • waitForSelector (object) - Wait for a specific selector to be present
    • selector (string) - CSS selector to wait for
    • timeout (number) - Maximum time to wait in milliseconds
  • waitForTimeout (number) - Time in milliseconds to wait after page load
  • bestAttempt (boolean) - Whether to continue on errors. Default: false
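
The parameter list above can be turned into a small payload builder. The following is a minimal sketch, not an official client: the function name build_export_payload and its keyword arguments are hypothetical, and the allowed waitUntil values are taken from the Best Practices section of this page. Options left unset are omitted so the service can apply its own defaults.

```python
import json

# Values listed for waitUntil in the Best Practices section of this page.
ALLOWED_WAIT_UNTIL = {"load", "domcontentloaded", "networkidle0", "networkidle2"}


def build_export_payload(url, wait_until=None, goto_timeout_ms=None,
                         selector=None, selector_timeout_ms=None,
                         wait_for_timeout_ms=None, headers=None,
                         best_attempt=None):
    """Return a dict shaped like the JSON payload above; unset options are omitted."""
    if not url:
        raise ValueError("url is required")
    payload = {"url": url}
    if headers:
        payload["headers"] = dict(headers)

    goto_options = {}
    if wait_until is not None:
        if wait_until not in ALLOWED_WAIT_UNTIL:
            raise ValueError(f"unsupported waitUntil: {wait_until!r}")
        goto_options["waitUntil"] = wait_until
    if goto_timeout_ms is not None:
        goto_options["timeout"] = int(goto_timeout_ms)
    if goto_options:
        payload["gotoOptions"] = goto_options

    if selector is not None:
        wait_for_selector = {"selector": selector}
        if selector_timeout_ms is not None:
            wait_for_selector["timeout"] = int(selector_timeout_ms)
        payload["waitForSelector"] = wait_for_selector
    if wait_for_timeout_ms is not None:
        payload["waitForTimeout"] = int(wait_for_timeout_ms)
    if best_attempt is not None:
        payload["bestAttempt"] = bool(best_attempt)
    return payload


def to_json(payload):
    """Serialize the payload for use as the POST body."""
    return json.dumps(payload)
```

Validating waitUntil on the client is a design choice of this sketch; it surfaces typos before a request is spent on them.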

Response

The API returns the content of the page with appropriate content type headers. The response format depends on the content type of the page:

  • For HTML content: Returns the HTML with Content-Type: text/html
  • For PDF content: Returns the PDF with Content-Type: application/pdf
  • For other content types: Returns the content with appropriate content type and sets Content-Disposition: attachment
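
Because the response type varies, a client usually has to decide how to name the saved file. The sketch below shows one way to do that from the Content-Type and Content-Disposition headers. The function pick_filename, the default stem "export", and the ".bin" fallback are assumptions of this example, not part of the API.

```python
import mimetypes
import os
from email.message import Message

# Explicit extensions for the two types named above; other types fall back
# to the standard library's guess.
PREFERRED_EXTENSIONS = {"text/html": ".html", "application/pdf": ".pdf"}


def pick_filename(content_type, content_disposition=None, stem="export"):
    """Choose a local file name for an export response."""
    msg = Message()
    msg["Content-Type"] = content_type or "application/octet-stream"
    if content_disposition:
        msg["Content-Disposition"] = content_disposition
        suggested = msg.get_filename()
        if suggested:
            # Keep only the final path component so a hostile name
            # cannot write outside the target directory.
            return os.path.basename(suggested)
    mime = msg.get_content_type()
    extension = (PREFERRED_EXTENSIONS.get(mime)
                 or mimetypes.guess_extension(mime)
                 or ".bin")
    return stem + extension
```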

Error Handling

The API may return the following error responses:

  • 400 Bad Request - Invalid parameters, missing URL, or no content received
  • 404 Not Found - Page not found
  • 408 Request Timeout - Page load timeout
  • 500 Internal Server Error - Server-side error
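
A client can map these status codes to a single exception type. This is a minimal sketch: ExportError and raise_for_export_status are hypothetical names, the reason strings come from the list above, and the retryable flags (retry on 408 and 5xx, not on 400 or 404) are an assumption of this example rather than documented behavior.

```python
class ExportError(Exception):
    """Raised for a non-2xx response from the export endpoint."""

    def __init__(self, status, reason, retryable):
        super().__init__(f"export failed with HTTP {status}: {reason}")
        self.status = status
        self.reason = reason
        self.retryable = retryable


# Reasons follow the list above; the retryable flags are this sketch's assumption.
KNOWN_ERRORS = {
    400: ("invalid parameters, missing URL, or no content received", False),
    404: ("page not found", False),
    408: ("page load timeout", True),
    500: ("server-side error", True),
}


def raise_for_export_status(status):
    """Return None for 2xx, otherwise raise ExportError."""
    if 200 <= status < 300:
        return None
    reason, retryable = KNOWN_ERRORS.get(
        status, ("unexpected status", status >= 500))
    raise ExportError(status, reason, retryable)
```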

Examples

Basic Export Request

curl -X POST \
  'https://production-sfo.browserless.io/export?token=YOUR_API_TOKEN_HERE' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/"
  }'

Export with Custom Navigation Options

curl -X POST \
  'https://production-sfo.browserless.io/export?token=YOUR_API_TOKEN_HERE' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "gotoOptions": {
      "waitUntil": "networkidle0",
      "timeout": 60000
    },
    "waitForSelector": {
      "selector": "#main-content",
      "timeout": 5000
    }
  }'

Export with Custom Headers

curl -X POST \
  'https://production-sfo.browserless.io/export?token=YOUR_API_TOKEN_HERE' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "headers": {
      "User-Agent": "Custom User Agent",
      "Accept-Language": "en-US"
    }
  }'
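
The curl calls above translate to Python roughly as follows. This is a sketch using only the standard library; build_export_request and export_page are hypothetical helper names, and the 90-second client timeout is an arbitrary choice for illustration.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://production-sfo.browserless.io"


def build_export_request(payload, token, base_url=BASE_URL):
    """Build the same POST request the curl examples send, without sending it."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{base_url}/export?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def export_page(payload, token, timeout_s=90):
    """Send the request and return (content_type, body_bytes).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_export_request(payload, token)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.headers.get("Content-Type", ""), response.read()
```

A call such as export_page({"url": "https://example.com/"}, token) would then return the content type and the raw body, with the token read from wherever it is stored, for example an environment variable.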

Best Practices

  1. Page Load Strategies

    • Use appropriate waitUntil options based on your needs:
      • load - Wait for the load event (good for static pages)
      • domcontentloaded - Wait for the DOMContentLoaded event (faster but may miss dynamic content)
      • networkidle0 - Wait until there are no network connections for at least 500ms (good for single-page applications)
      • networkidle2 - Wait until there are no more than 2 network connections for at least 500ms (good for pages with background activity)
  2. Timeout Management

    • Set reasonable timeout values based on your target page's complexity
    • Consider increasing timeouts for:
      • Pages with heavy JavaScript execution
      • Pages with large media files
      • Pages with complex animations
      • Pages with slow network conditions
  3. Content Waiting

    • Use waitForSelector when you need to ensure specific content is loaded
    • Combine with waitForTimeout for additional stability
    • Consider using multiple selectors for critical content
    • Use bestAttempt: true for more resilient exports, but be aware that it may return incomplete content
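
The load strategies above can be captured as presets. The sketch below is illustrative only: the preset names, the timeout values, and the payload_for helper are assumptions chosen for this example, not recommended defaults.

```python
import copy

# One preset per page type described above. Timeouts are illustrative.
WAIT_PRESETS = {
    "static": {"gotoOptions": {"waitUntil": "load", "timeout": 30000}},
    "spa": {"gotoOptions": {"waitUntil": "networkidle0", "timeout": 60000}},
    "background_activity": {
        "gotoOptions": {"waitUntil": "networkidle2", "timeout": 60000}
    },
}


def payload_for(url, page_kind, selector=None, settle_ms=None, best_attempt=False):
    """Combine a load-strategy preset with optional content waits."""
    if page_kind not in WAIT_PRESETS:
        raise ValueError(f"unknown page kind: {page_kind!r}")
    # Deep copy so callers can tweak the result without changing the preset.
    payload = {"url": url, **copy.deepcopy(WAIT_PRESETS[page_kind])}
    if selector:
        payload["waitForSelector"] = {"selector": selector, "timeout": 5000}
    if settle_ms:
        payload["waitForTimeout"] = settle_ms
    if best_attempt:
        payload["bestAttempt"] = True
    return payload
```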

Resource Management

  1. Asset Handling

    • Use includeAssets wisely to control export size
    • Consider excluding unnecessary resource types:
      • Images for text-only exports
      • Stylesheets for raw content
      • Scripts for static content
    • Use rejectResourceTypes to filter specific asset types
    • Implement size limits for large resources
  2. Network Optimization

    • Use rejectRequestPattern to exclude unnecessary requests
    • Consider implementing request throttling
    • Cache frequently accessed resources
    • Monitor and optimize network usage
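
The size-limit advice above can also be enforced on the client while the response is being read. This is a minimal sketch under the assumption that the response is available as a file-like object with a read() method; ExportTooLarge and read_capped are hypothetical names.

```python
import io


class ExportTooLarge(Exception):
    """Raised when a response body exceeds the configured limit."""


def read_capped(stream, max_bytes, chunk_size=64 * 1024):
    """Read a file-like object in chunks and stop once max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ExportTooLarge(f"body exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Local demonstration with an in-memory stream; no network access.
    print(len(read_capped(io.BytesIO(b"x" * 1000), max_bytes=4096)))
```

Reading in chunks means an oversized export is rejected before it is fully buffered in memory.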

Error Handling and Reliability

  1. Robust Error Handling

    • Implement proper error handling for:
      • Network timeouts
      • Resource loading failures
      • Invalid URLs
      • Rate limiting
    • Branch on the HTTP status codes the API returns (see Error Handling above)
    • Implement retry mechanisms for transient failures
  2. Content Validation

    • Verify content completeness
    • Check for expected elements
    • Validate content structure
    • Implement checksums for critical content
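
Retries and content validation can be sketched as two small helpers. Everything here is illustrative: retry_with_backoff and validate_export are hypothetical names, the doubling delay is one common backoff policy among several, and which exceptions count as transient is left to the caller through is_retryable.

```python
import hashlib
import time


def retry_with_backoff(operation, is_retryable, attempts=3,
                       base_delay_s=1.0, sleep=time.sleep):
    """Call operation(); after a retryable failure wait base_delay_s * 2**n and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            last_try = attempt == attempts - 1
            if last_try or not is_retryable(exc):
                raise
            sleep(base_delay_s * (2 ** attempt))


def validate_export(body, required_markers=()):
    """Check that the body is non-empty and contains each marker; return its SHA-256."""
    if not body:
        raise ValueError("empty export")
    missing = [m for m in required_markers if m.encode("utf-8") not in body]
    if missing:
        raise ValueError(f"export is missing expected content: {missing}")
    return hashlib.sha256(body).hexdigest()
```

Storing the returned digest alongside the export makes it possible to detect later corruption or to skip re-saving unchanged content.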

Security Considerations

  1. URL and Content Safety

    • Always use HTTPS URLs when possible
    • Validate URLs before making requests
    • Sanitize user-provided URLs
    • Implement content size limits
    • Be cautious when setting custom headers
  2. Authentication and Authorization

    • Use secure methods for API token storage
    • Implement proper access controls
    • Monitor and log access attempts
    • Rotate API tokens regularly
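
URL validation for user-provided input might look like the sketch below. The specific rules (HTTPS by default, no embedded credentials, no localhost or private, loopback, or link-local IP literals) are assumptions of this example; validate_target_url is a hypothetical helper and does not resolve hostnames, so it is not a complete defense on its own.

```python
import ipaddress
from urllib.parse import urlsplit


def validate_target_url(url, allow_http=False):
    """Return the cleaned URL or raise ValueError if it fails basic safety checks."""
    parts = urlsplit(url.strip())
    allowed_schemes = {"https", "http"} if allow_http else {"https"}
    if parts.scheme not in allowed_schemes:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no host")
    if parts.username or parts.password:
        raise ValueError("credentials in URL are not allowed")
    if host == "localhost":
        raise ValueError("localhost is not allowed")
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # Not an IP literal; treat as a regular hostname.
    if address is not None and (
        address.is_private or address.is_loopback or address.is_link_local
    ):
        raise ValueError("private or local addresses are not allowed")
    return parts.geturl()
```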

Performance Optimization

  1. Export Size Management

    • Implement compression where appropriate
    • Use appropriate export formats
    • Consider splitting large exports
    • Implement cleanup mechanisms for temporary files
  2. Concurrent Operations

    • Implement proper rate limiting
    • Use appropriate concurrency levels
    • Monitor system resources
    • Implement queue management for high-volume operations
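
Bounded concurrency can be sketched with a thread pool. In this example export_many is a hypothetical helper, export_one stands for whatever function performs a single export, and the default of four workers is arbitrary; the appropriate level depends on your plan's concurrency limits.

```python
from concurrent.futures import ThreadPoolExecutor


def export_many(urls, export_one, max_workers=4):
    """Run export_one(url) for each URL with at most max_workers in flight.

    Returns a dict mapping each URL to ("ok", result) or ("error", exception),
    so one failure does not abort the rest of the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(export_one, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = ("ok", future.result())
            except Exception as exc:
                results[url] = ("error", exc)
    return results
```

The pool's internal queue holds pending URLs, which gives simple queue management for moderate volumes without extra infrastructure.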

Monitoring and Maintenance

  1. Logging and Monitoring

    • Implement comprehensive logging
    • Monitor success/failure rates
    • Track export sizes and durations
    • Set up alerts for failures
    • Monitor rate limit usage
  2. Maintenance

    • Regularly review and update selectors
    • Monitor for changes in target sites
    • Update error handling as needed
    • Review and optimize timeout values
    • Maintain documentation of changes
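
The monitoring points above can be sketched as a small in-process tracker. ExportStats, the logger name, and the alert threshold are all hypothetical choices for illustration; a production setup would more likely forward these numbers to an existing metrics system.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("export_monitor")  # hypothetical logger name


@dataclass
class ExportStats:
    """Track success/failure counts, sizes, and durations for export calls."""
    successes: int = 0
    failures: int = 0
    total_bytes: int = 0
    durations_s: list = field(default_factory=list)

    def record(self, url, ok, duration_s, size_bytes=0):
        """Record the outcome of one export and log it."""
        if ok:
            self.successes += 1
            self.total_bytes += size_bytes
        else:
            self.failures += 1
        self.durations_s.append(duration_s)
        logger.info("export url=%s ok=%s duration=%.2fs size=%d",
                    url, ok, duration_s, size_bytes)

    @property
    def failure_rate(self):
        total = self.successes + self.failures
        return self.failures / total if total else 0.0

    def should_alert(self, threshold=0.2, min_samples=5):
        """True once enough samples exist and the failure rate exceeds threshold."""
        total = self.successes + self.failures
        return total >= min_samples and self.failure_rate > threshold
```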

For additional support, please refer to the Browserless documentation or contact support.