Version: v2

Implementing Response Caching

When a browser automation script repeats the same requests or expensive operations, response caching improves performance and reduces API costs. This guide shows how to implement a simple file-based caching system for Puppeteer and Playwright scripts.

Benefits and use cases

Response caching provides:

  • Performance improvement: Avoid repeated expensive operations
  • Reduced costs: Minimize unnecessary requests to Browserless and external services
  • Better reliability: Serve cached content when requests fail
  • Faster development: Speed up testing cycles

Use caching for data scraping workflows, API testing, content analysis, and multi-step automation where intermediate results can be reused.

ResponseCache implementation

Here's a file-based caching system with a configurable TTL and a cleanup method for expired entries:

import fs from 'fs';
import crypto from 'crypto';

class ResponseCache {
  constructor(cacheDir = './cache', ttl = 300000) {
    this.cacheDir = cacheDir; // Directory to store cache files
    this.ttl = ttl; // Time-to-live in milliseconds (default: 5 minutes)
    this.ensureCacheDir();
  }

  // Create cache directory if it doesn't exist
  ensureCacheDir() {
    if (!fs.existsSync(this.cacheDir)) {
      fs.mkdirSync(this.cacheDir, { recursive: true });
    }
  }

  // Generate a cache key from the URL and options.
  // MD5 only derives a file name here; it is not used for security.
  getCacheKey(url, options = {}) {
    // Nest options so an option named "url" cannot overwrite the real URL
    const data = JSON.stringify({ url, options });
    return crypto.createHash('md5').update(data).digest('hex');
  }

  // Retrieve cached response if valid and not expired
  get(url, options = {}) {
    const key = this.getCacheKey(url, options);
    const filePath = `${this.cacheDir}/${key}.json`;

    if (fs.existsSync(filePath)) {
      try {
        const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
        // Check if cache entry is still valid
        if (Date.now() - data.timestamp < this.ttl) {
          console.log(`Cache hit for ${url}`);
          return data.response;
        }
      } catch (error) {
        // Treat an unreadable or corrupted entry as a cache miss
        return null;
      }
    }
    return null;
  }

  // Store response in cache with timestamp
  set(url, response, options = {}) {
    const key = this.getCacheKey(url, options);
    const filePath = `${this.cacheDir}/${key}.json`;

    const data = {
      timestamp: Date.now(),
      response: response,
      url: url,
      options: options
    };

    fs.writeFileSync(filePath, JSON.stringify(data, null, 2));
    console.log(`Cached response for ${url}`);
  }

  // Remove expired cache entries
  clearExpired() {
    if (!fs.existsSync(this.cacheDir)) return;

    const files = fs.readdirSync(this.cacheDir);
    let clearedCount = 0;

    files.forEach(file => {
      if (file.endsWith('.json')) {
        const filePath = `${this.cacheDir}/${file}`;
        try {
          const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
          if (Date.now() - data.timestamp >= this.ttl) {
            fs.unlinkSync(filePath);
            clearedCount++;
          }
        } catch (error) {
          // Remove corrupted files
          fs.unlinkSync(filePath);
          clearedCount++;
        }
      }
    });

    if (clearedCount > 0) {
      console.log(`Cleared ${clearedCount} expired or corrupted cache entries`);
    }
  }
}

Usage examples

import puppeteer from "puppeteer-core";

// Cache entries live in ./cache and expire after 5 minutes
const cache = new ResponseCache('./cache', 300000);

const browser = await puppeteer.connect({
  browserWSEndpoint: `wss://production-sfo.browserless.io/?token=YOUR_API_TOKEN_HERE`,
});

const page = await browser.newPage();
const url = 'https://example.com';

// Only load the page when there is no valid cached copy
let content = cache.get(url);
if (!content) {
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  content = await page.content();
  cache.set(url, content);
}

console.log('Page content length:', content.length);
cache.clearExpired();
await browser.close();

How it works

The cache system operates in four key steps:

  1. Generate cache keys: URLs and options are hashed to create unique identifiers
  2. Check for cached data: Before making requests, check if valid cached data exists
  3. Validate freshness: Compare timestamps against TTL to ensure data hasn't expired
  4. Store responses: Cache fresh responses with timestamps for future use
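
The four steps above can be wrapped in one helper so call sites do not repeat the check-then-store logic. This is a minimal sketch rather than part of the ResponseCache class: `getOrFetch` and its `fetcher` callback are hypothetical names, and the helper assumes only a cache object that exposes `get(url, options)` and `set(url, response, options)` as shown earlier.

```javascript
// Hypothetical helper: return a cached response, or produce and store a fresh one.
// `cache` is any object with get(url, options) and set(url, response, options).
// `fetcher` is an async callback that produces the response on a cache miss.
async function getOrFetch(cache, url, fetcher, options = {}) {
  // Steps 1-3: key generation, lookup, and the TTL check happen inside cache.get()
  const cached = cache.get(url, options);
  if (cached !== null && cached !== undefined) {
    return cached;
  }

  // Step 4: cache miss, so do the expensive work and store the result
  const fresh = await fetcher(url);
  cache.set(url, fresh, options);
  return fresh;
}
```

With the Puppeteer example above, a call might look like `await getOrFetch(cache, url, async (u) => { await page.goto(u); return page.content(); })`.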

Advanced usage

Pass request-specific options to get() and set() so that variants of the same URL, such as mobile and desktop renderings, are cached under separate keys:

// Options that distinguish this variant; they become part of the cache key
const cacheOptions = { userAgent: 'mobile', viewport: { width: 375, height: 667 } };

let content = cache.get(url, cacheOptions);
if (!content) {
  await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)');
  await page.setViewport({ width: 375, height: 667 }); // Puppeteer's viewport setter
  await page.goto(url);
  content = await page.content();
  cache.set(url, content, cacheOptions);
}
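
Because cache keys are derived with JSON.stringify, two option objects with the same values but a different property order serialize differently and end up as separate cache entries. One way to avoid that, sketched here under the assumption that options contain only plain JSON values, is to sort object keys before hashing. `stableStringify` is a hypothetical helper, not part of the class above.

```javascript
// Hypothetical helper: JSON.stringify with object keys sorted recursively,
// so { a: 1, b: 2 } and { b: 2, a: 1 } serialize to the same string.
// Assumes plain JSON values only (no undefined, functions, or cycles).
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  // Primitives (strings, numbers, booleans, null) serialize as usual
  return JSON.stringify(value);
}

const a = stableStringify({ userAgent: 'mobile', viewport: { width: 375, height: 667 } });
const b = stableStringify({ viewport: { height: 667, width: 375 }, userAgent: 'mobile' });
console.log(a === b); // true
```

Inside getCacheKey, the string passed to the hash could then come from stableStringify rather than JSON.stringify.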

Configuration notes

TTL examples: Static content (15-60 min), dynamic content (1-5 min), development (30 sec - 2 min)
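
The ranges above translate into millisecond values for the ResponseCache constructor. The preset names and the specific values picked from each range are illustrative assumptions, and `ttlFor` is a hypothetical helper.

```javascript
const SECOND = 1000;
const MINUTE = 60 * SECOND;

// Illustrative presets, each chosen from within the ranges listed above
const TTL_PRESETS = {
  static: 30 * MINUTE,     // static content: 15-60 min
  dynamic: 2 * MINUTE,     // dynamic content: 1-5 min
  development: 1 * MINUTE, // development: 30 sec - 2 min
};

// Hypothetical helper: fall back to the guide's 5-minute default for unknown kinds
function ttlFor(kind) {
  const known = Object.prototype.hasOwnProperty.call(TTL_PRESETS, kind);
  return known ? TTL_PRESETS[kind] : 5 * MINUTE;
}

console.log(ttlFor('static')); // 1800000
```

A cache for fast-changing pages could then be created with `new ResponseCache('./cache', ttlFor('dynamic'))`.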

Storage considerations: Monitor cache size, run clearExpired() regularly, avoid caching sensitive data
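
For long-running scripts, "regularly" can mean a timer. The sketch below assumes a Node.js runtime and a cache object with a `clearExpired()` method; `scheduleCleanup` and the 10-minute default interval are hypothetical choices.

```javascript
// Hypothetical helper: call cache.clearExpired() on a fixed interval.
// Returns a function that stops the schedule.
function scheduleCleanup(cache, intervalMs = 10 * 60 * 1000) {
  const timer = setInterval(() => {
    try {
      cache.clearExpired();
    } catch (error) {
      // A failed cleanup should not crash the automation script
      console.error('Cache cleanup failed:', error.message);
    }
  }, intervalMs);

  // Do not keep the Node.js process alive just for cleanup
  timer.unref();

  return () => clearInterval(timer);
}
```

Calling the returned function, for example right before `browser.close()`, stops further cleanup runs.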

Optimization: Cache at appropriate granularity, log hit/miss ratios, consider in-memory caching for small datasets
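
For small datasets, an in-memory cache with hit and miss counters might look like the sketch below. `MemoryCache` is a hypothetical class that mirrors the `get`/`set` shape of the file-based cache; entries live in a Map and are lost when the process exits.

```javascript
// Hypothetical in-memory variant with hit/miss counters
class MemoryCache {
  constructor(ttl = 300000, now = Date.now) {
    this.ttl = ttl; // Time-to-live in milliseconds
    this.now = now; // Injectable clock, which makes expiry easy to test
    this.entries = new Map();
    this.hits = 0;
    this.misses = 0;
  }

  getCacheKey(url, options = {}) {
    return JSON.stringify({ url, options });
  }

  get(url, options = {}) {
    const key = this.getCacheKey(url, options);
    const entry = this.entries.get(key);
    if (entry && this.now() - entry.timestamp < this.ttl) {
      this.hits += 1;
      return entry.response;
    }
    this.entries.delete(key); // Drop an expired entry (no-op if absent)
    this.misses += 1;
    return null;
  }

  set(url, response, options = {}) {
    const key = this.getCacheKey(url, options);
    this.entries.set(key, { timestamp: this.now(), response });
  }

  // Fraction of lookups served from cache, or 0 before any lookup
  hitRatio() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```

Logging `hitRatio()` at the end of a run gives a quick signal for whether the TTL and key granularity suit the workload.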