Version: v2

Implementing Response Caching

When browser automation repeats the same requests or runs expensive operations, caching responses improves performance and reduces API costs. This guide shows how to implement a file-based response cache for Puppeteer and Playwright scripts.

Benefits and use cases

Response caching provides:

Performance improvement: Avoid repeated expensive operations
Reduced costs: Minimize unnecessary requests to Browserless and external services
Better reliability: Serve cached content when requests fail
Faster development: Speed up testing cycles

Use caching for data scraping workflows, API testing, content analysis, and multi-step automation where intermediate results can be reused.

ResponseCache implementation

Here's a file-based caching system with a configurable TTL and a clearExpired() method for removing stale entries:

import fs from 'fs';
import crypto from 'crypto';

class ResponseCache {
  constructor(cacheDir = './cache', ttl = 300000) {
    this.cacheDir = cacheDir; // Directory to store cache files
    this.ttl = ttl; // Time-to-live in milliseconds
    this.ensureCacheDir();
  }

  // Create cache directory if it doesn't exist
  ensureCacheDir() {
    if (!fs.existsSync(this.cacheDir)) {
      fs.mkdirSync(this.cacheDir, { recursive: true });
    }
  }

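  // Generate a cache key from the URL and options.
  // Options are nested under their own property so that an option named `url`
  // cannot overwrite the real URL. MD5 serves only as a fast, non-security hash.
  getCacheKey(url, options = {}) {
    const data = JSON.stringify({ url, options });
    return crypto.createHash('md5').update(data).digest('hex');
  }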

  // Retrieve cached response if valid and not expired
  get(url, options = {}) {
    const key = this.getCacheKey(url, options);
    const filePath = `${this.cacheDir}/${key}.json`;

    if (!fs.existsSync(filePath)) return null;

    try {
      const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
      // Check if cache entry is still valid
      if (Date.now() - data.timestamp < this.ttl) {
        console.log(`Cache hit for ${url}`);
        return data.response;
      }
    } catch (error) {
      // Treat unreadable or corrupted entries as a cache miss
    }
    return null;
  }

  // Store response in cache with timestamp
  set(url, response, options = {}) {
    const key = this.getCacheKey(url, options);
    const filePath = `${this.cacheDir}/${key}.json`;
    
    const data = {
      timestamp: Date.now(),
      response: response,
      url: url,
      options: options
    };
    
    fs.writeFileSync(filePath, JSON.stringify(data, null, 2));
    console.log(`Cached response for ${url}`);
  }

  // Remove expired cache entries
  clearExpired() {
    if (!fs.existsSync(this.cacheDir)) return;
    
    const files = fs.readdirSync(this.cacheDir);
    let clearedCount = 0;
    
    files.forEach(file => {
      if (file.endsWith('.json')) {
        const filePath = `${this.cacheDir}/${file}`;
        try {
          const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
          if (Date.now() - data.timestamp >= this.ttl) {
            fs.unlinkSync(filePath);
            clearedCount++;
          }
        } catch (error) {
          // Remove corrupted files
          fs.unlinkSync(filePath);
          clearedCount++;
        }
      }
    });
    
    if (clearedCount > 0) {
      console.log(`Cleared ${clearedCount} expired cache entries`);
    }
  }
}

Usage examples

Puppeteer

import puppeteer from "puppeteer-core";

// ResponseCache is the class defined above; 300000 ms is a 5-minute TTL
const cache = new ResponseCache('./cache', 300000);

const browser = await puppeteer.connect({
  browserWSEndpoint: `wss://production-sfo.browserless.io/?token=YOUR_API_TOKEN_HERE`,
});

const page = await browser.newPage();
const url = 'https://example.com';

let content = cache.get(url);
if (!content) {
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  content = await page.content();
  cache.set(url, content);
}

console.log('Page content length:', content.length);
cache.clearExpired();
await browser.close();

Playwright

import playwright from "playwright";

// ResponseCache is the class defined above; 300000 ms is a 5-minute TTL
const cache = new ResponseCache('./cache', 300000);

const browser = await playwright.chromium.connectOverCDP(
  `wss://production-sfo.browserless.io/?token=YOUR_API_TOKEN_HERE`
);

const page = await browser.newPage();
const url = 'https://example.com';

let content = cache.get(url);
if (!content) {
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  content = await page.content();
  cache.set(url, content);
}

console.log('Page content length:', content.length);
cache.clearExpired();
await browser.close();
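
Both examples repeat the same get, check, set pattern. As a rough sketch, that pattern could be factored into a helper. The name getOrCompute is hypothetical and not part of the original guide, and the Map-based demoCache below is only a stand-in so the snippet runs without a browser; it assumes any cache object exposing get(url, options) and set(url, value, options), as ResponseCache does.

```javascript
// Hypothetical helper: return the cached value, or compute and store it.
// `cache` is any object with get(url, options) and set(url, value, options).
async function getOrCompute(cache, url, compute, options = {}) {
  const cached = cache.get(url, options);
  // Compare against null/undefined so an empty string still counts as a hit
  if (cached !== null && cached !== undefined) return cached;

  const fresh = await compute();
  cache.set(url, fresh, options);
  return fresh;
}

async function demo() {
  // Minimal in-memory stand-in for ResponseCache (assumption for this demo)
  const store = new Map();
  const demoCache = {
    get: (url) => (store.has(url) ? store.get(url) : null),
    set: (url, value) => { store.set(url, value); },
  };

  let computeCalls = 0;
  const loadPage = async () => {
    computeCalls++;
    return '<html>demo</html>'; // stands in for `await page.content()`
  };

  const first = await getOrCompute(demoCache, 'https://example.com', loadPage);
  const second = await getOrCompute(demoCache, 'https://example.com', loadPage);
  return { first, second, computeCalls };
}

demo().then((result) => console.log(result.computeCalls)); // 1
```

With a real page, the compute callback would wrap page.goto() and page.content(), so the browser work runs only on a cache miss.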

How it works

The cache system operates in four key steps:

Generate cache keys: URLs and options are hashed to create unique identifiers
Check for cached data: Before making requests, check if valid cached data exists
Validate freshness: Compare timestamps against TTL to ensure data hasn't expired
Store responses: Cache fresh responses with timestamps for future use
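
The four steps above can be sketched in memory, which makes the freshness check easy to observe. This is an illustration under stated assumptions, not the guide's implementation: createMemoryCache is a hypothetical name, the key is the plain JSON string rather than an MD5 hash, and the clock is injectable so that expiry can be shown without waiting.

```javascript
// In-memory sketch of the four steps; `now` is injectable so expiry is testable
function createMemoryCache(ttl, now = () => Date.now()) {
  const entries = new Map();
  // Step 1: derive a key (plain JSON string here instead of a hash)
  const keyFor = (url, options = {}) => JSON.stringify({ url, options });

  return {
    get(url, options) {
      // Step 2: check whether an entry exists
      const entry = entries.get(keyFor(url, options));
      if (!entry) return null;
      // Step 3: validate freshness against the TTL
      if (now() - entry.timestamp >= ttl) return null;
      return entry.response;
    },
    set(url, response, options) {
      // Step 4: store the response together with a timestamp
      entries.set(keyFor(url, options), { timestamp: now(), response });
    },
  };
}

let fakeTime = 0;
const memoryCache = createMemoryCache(1000, () => fakeTime);
memoryCache.set('https://example.com', '<html></html>');

fakeTime = 999;
console.log(memoryCache.get('https://example.com')); // <html></html> (still fresh)

fakeTime = 1000;
console.log(memoryCache.get('https://example.com')); // null (expired)
```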

Advanced usage

Pass request-specific options to get() and set() so that variants of the same URL, such as a mobile rendering, are cached under separate keys:

const cacheOptions = { userAgent: 'mobile', viewport: { width: 375, height: 667 } };

let content = cache.get(url, cacheOptions);
if (!content) {
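  // Puppeteer calls shown here. In Playwright, use page.setViewportSize()
  // and set the user agent through browser.newContext({ userAgent }).
  await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)');
  await page.setViewport({ width: 375, height: 667 });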
  await page.goto(url);
  content = await page.content();
  cache.set(url, content, cacheOptions);
}
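
One caveat with options in cache keys: JSON.stringify preserves property order, so two option objects with the same values in a different order produce different keys. A possible workaround, sketched below, is to serialize with sorted keys before hashing. The stableStringify helper is hypothetical and not part of the original guide; it handles plain objects, arrays, and primitives only.

```javascript
// Serialize a value with object keys sorted, so property order does not matter.
// Handles plain objects, arrays, and JSON primitives only.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`)
      .join(',');
    return `{${body}}`;
  }
  // Primitives; undefined is rendered as the string "undefined"
  return String(JSON.stringify(value));
}

const optionsA = { userAgent: 'mobile', viewport: { width: 375, height: 667 } };
const optionsB = { viewport: { height: 667, width: 375 }, userAgent: 'mobile' };

console.log(JSON.stringify(optionsA) === JSON.stringify(optionsB)); // false
console.log(stableStringify(optionsA) === stableStringify(optionsB)); // true
```

The resulting string could be passed to the hash in getCacheKey in place of the JSON.stringify output.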

Configuration notes

TTL examples: Static content (15-60 min), dynamic content (1-5 min), development (30 sec - 2 min)
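
Because the constructor takes the TTL in milliseconds, named presets can make these ranges easier to read. The values below are illustrative picks from within the ranges above, and TTL_PRESETS and ttlFor are hypothetical names, not part of the original guide.

```javascript
const SECOND = 1000;
const MINUTE = 60 * SECOND;

// Illustrative TTLs in milliseconds, chosen from the ranges listed above
const TTL_PRESETS = {
  static: 30 * MINUTE,      // static content: 15-60 min
  dynamic: 2 * MINUTE,      // dynamic content: 1-5 min
  development: 30 * SECOND, // development: 30 sec - 2 min
};

// Look up a preset, falling back to the dynamic-content TTL for unknown kinds
function ttlFor(kind) {
  return Object.prototype.hasOwnProperty.call(TTL_PRESETS, kind)
    ? TTL_PRESETS[kind]
    : TTL_PRESETS.dynamic;
}

console.log(ttlFor('static'));      // 1800000
console.log(ttlFor('development')); // 30000
```

A cache for static pages could then be created with new ResponseCache('./cache', ttlFor('static')).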

Storage considerations: Monitor cache size, run clearExpired() regularly, avoid caching sensitive data
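
One way to monitor cache size is to sum the sizes of the .json files in the cache directory. The getCacheStats function below is a hypothetical sketch, not part of the original guide; it assumes the flat directory layout that ResponseCache writes and does not descend into subdirectories.

```javascript
import fs from 'fs';
import path from 'path';

// Count the .json cache files in a directory and sum their sizes (non-recursive)
function getCacheStats(cacheDir) {
  if (!fs.existsSync(cacheDir)) return { files: 0, bytes: 0 };

  let files = 0;
  let bytes = 0;
  for (const name of fs.readdirSync(cacheDir)) {
    if (!name.endsWith('.json')) continue;
    bytes += fs.statSync(path.join(cacheDir, name)).size;
    files++;
  }
  return { files, bytes };
}

const stats = getCacheStats('./cache');
console.log(`${stats.files} files, ${stats.bytes} bytes`);
```

Logging these numbers after each run, alongside a call to clearExpired(), gives an early signal when the cache grows faster than entries expire.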

Optimization: Cache at appropriate granularity, log hit/miss ratios, consider in-memory caching for small datasets
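
Hit/miss ratios can be logged without modifying the cache itself by wrapping it. The withStats wrapper below is a hypothetical sketch, not part of the original guide; it assumes a cache whose get() returns null on a miss, as ResponseCache does, and the Map-based plainCache is only a stand-in for the demo.

```javascript
// Wrap any cache exposing get/set and count hits and misses
function withStats(cache) {
  let hits = 0;
  let misses = 0;

  return {
    get(url, options) {
      const value = cache.get(url, options);
      if (value === null || value === undefined) misses++;
      else hits++;
      return value;
    },
    set(url, response, options) {
      cache.set(url, response, options);
    },
    stats() {
      const total = hits + misses;
      return { hits, misses, hitRatio: total === 0 ? 0 : hits / total };
    },
  };
}

// Map-based stand-in cache (assumption for this demo)
const backing = new Map();
const plainCache = {
  get: (url) => (backing.has(url) ? backing.get(url) : null),
  set: (url, value) => { backing.set(url, value); },
};

const trackedCache = withStats(plainCache);
trackedCache.get('https://example.com');                  // miss
trackedCache.set('https://example.com', '<html></html>');
trackedCache.get('https://example.com');                  // hit
console.log(trackedCache.stats()); // { hits: 1, misses: 1, hitRatio: 0.5 }
```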
