Implementing Response Caching

When automating browser tasks with repeated requests or expensive operations, response caching improves performance and reduces API costs. This guide shows how to implement a robust caching system for Puppeteer and Playwright scripts.

Benefits and use cases

Response caching provides:

  • Performance improvement: Avoid repeated expensive operations
  • Reduced costs: Minimize unnecessary requests to Browserless and external services
  • Better reliability: Cached content stays available within its TTL, even if the target site is temporarily unreachable
  • Faster development: Speed up testing cycles

Use caching for data scraping workflows, API testing, content analysis, and multi-step automation where intermediate results can be reused.

ResponseCache implementation

Here's a file-based caching system with a configurable TTL and a clearExpired() method for removing stale entries:

import fs from 'fs';
import path from 'path';
import crypto from 'crypto';

class ResponseCache {
  constructor(cacheDir = './cache', ttl = 300000) {
    this.cacheDir = cacheDir; // Directory to store cache files
    this.ttl = ttl; // Time-to-live in milliseconds (default: 5 minutes)
    this.ensureCacheDir();
  }

  // Create cache directory if it doesn't exist
  ensureCacheDir() {
    if (!fs.existsSync(this.cacheDir)) {
      fs.mkdirSync(this.cacheDir, { recursive: true });
    }
  }

  // Generate unique cache key from URL and options
  getCacheKey(url, options = {}) {
    // Keep options nested so an option named "url" cannot overwrite the URL
    const data = JSON.stringify({ url, options });
    return crypto.createHash('md5').update(data).digest('hex');
  }

  // Build the file path for a cache entry
  getFilePath(url, options = {}) {
    return path.join(this.cacheDir, `${this.getCacheKey(url, options)}.json`);
  }

  // Retrieve cached response if valid and not expired
  get(url, options = {}) {
    const filePath = this.getFilePath(url, options);
    if (!fs.existsSync(filePath)) return null;

    try {
      const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
      // Check if cache entry is still valid
      if (Date.now() - data.timestamp < this.ttl) {
        console.log(`Cache hit for ${url}`);
        return data.response;
      }
    } catch (error) {
      // Treat unreadable or corrupted entries as a cache miss
    }
    return null;
  }

  // Store response in cache with timestamp
  set(url, response, options = {}) {
    const data = {
      timestamp: Date.now(),
      response: response,
      url: url,
      options: options
    };

    fs.writeFileSync(this.getFilePath(url, options), JSON.stringify(data, null, 2));
    console.log(`Cached response for ${url}`);
  }

  // Remove expired cache entries
  clearExpired() {
    if (!fs.existsSync(this.cacheDir)) return;

    const files = fs.readdirSync(this.cacheDir);
    let clearedCount = 0;

    files.forEach(file => {
      if (file.endsWith('.json')) {
        const filePath = path.join(this.cacheDir, file);
        try {
          const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
          if (Date.now() - data.timestamp >= this.ttl) {
            fs.unlinkSync(filePath);
            clearedCount++;
          }
        } catch (error) {
          // Remove corrupted files; force avoids throwing if the file is already gone
          fs.rmSync(filePath, { force: true });
          clearedCount++;
        }
      }
    });

    if (clearedCount > 0) {
      console.log(`Cleared ${clearedCount} expired or corrupted cache entries`);
    }
  }
}

Usage examples

import puppeteer from "puppeteer-core";
// Assumes the ResponseCache class above is defined in this file or imported.

const cache = new ResponseCache('./cache', 300000); // 5-minute TTL

const browser = await puppeteer.connect({
  browserWSEndpoint: `wss://production-sfo.browserless.io/?token=YOUR_API_TOKEN_HERE`,
});

try {
  const page = await browser.newPage();
  const url = 'https://example.com';

  // Only navigate on a cache miss
  let content = cache.get(url);
  if (!content) {
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    content = await page.content();
    cache.set(url, content);
  }

  console.log('Page content length:', content.length);
  cache.clearExpired();
} finally {
  // Always release the remote browser session, even if navigation fails
  await browser.close();
}
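
The check-then-fetch pattern above tends to repeat at every call site. One way to factor it out is a small helper. The name getOrFetch and its signature are hypothetical, not part of ResponseCache or any library; the sketch only assumes a cache object exposing the get(url, options) and set(url, response, options) methods shown earlier.

```javascript
// Hypothetical helper: return the cached value, or call fetchFn and cache its result.
// `cache` is any object with get(url, options) and set(url, response, options).
async function getOrFetch(cache, url, fetchFn, options = {}) {
  const cached = cache.get(url, options);
  if (cached !== null && cached !== undefined) {
    return cached;
  }
  const fresh = await fetchFn(url);
  cache.set(url, fresh, options);
  return fresh;
}

// Possible use with a Puppeteer page:
// const content = await getOrFetch(cache, url, async (u) => {
//   await page.goto(u, { waitUntil: 'domcontentloaded' });
//   return page.content();
// });
```

Because fetchFn runs only on a miss, the navigation cost is paid once per TTL window for each URL and options combination.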

How it works

The cache system operates in four key steps:

  1. Generate cache keys: URLs and options are hashed to create unique identifiers
  2. Check for cached data: Before making requests, check if valid cached data exists
  3. Validate freshness: Compare timestamps against TTL to ensure data hasn't expired
  4. Store responses: Cache fresh responses with timestamps for future use
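
The four steps can be sketched with an in-memory Map standing in for the cache directory. This is an illustration only: makeKey, readFresh, and store are hypothetical names, and the now parameter exists purely so the expiry check can be exercised without waiting.

```javascript
import crypto from 'crypto';

const entries = new Map(); // stands in for the cache directory

// Step 1: hash the URL and options into a key
function makeKey(url, options = {}) {
  return crypto.createHash('md5').update(JSON.stringify({ url, options })).digest('hex');
}

// Steps 2 and 3: look up the entry, then compare its age against the TTL
function readFresh(url, options, ttl, now = Date.now()) {
  const entry = entries.get(makeKey(url, options));
  if (!entry) return null;                       // step 2: nothing cached
  if (now - entry.timestamp >= ttl) return null; // step 3: expired
  return entry.response;
}

// Step 4: store the response with the time it was cached
function store(url, options, response, now = Date.now()) {
  entries.set(makeKey(url, options), { timestamp: now, response });
}

store('https://example.com', {}, '<html></html>', 1000);
console.log(readFresh('https://example.com', {}, 300000, 2000));   // prints <html></html>
console.log(readFresh('https://example.com', {}, 300000, 400000)); // prints null (expired)
```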

Advanced usage

You can include request-specific options in cache keys for different scenarios:

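// The option values only label this variant in the cache key;
// the actual user agent and viewport are applied to the page below.
const cacheOptions = { userAgent: 'mobile', viewport: { width: 375, height: 667 } };

let content = cache.get(url, cacheOptions);
if (!content) {
  await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)');
  // Puppeteer API; the Playwright equivalent is page.setViewportSize()
  await page.setViewport({ width: 375, height: 667 });
  await page.goto(url);
  content = await page.content();
  cache.set(url, content, cacheOptions);
}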
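
One thing to watch for when options feed into the key: JSON.stringify serializes properties in insertion order, so two option objects with the same values but a different property order produce different keys and miss each other's cache entries. A possible workaround is to sort keys before serializing. stableStringify below is a hypothetical helper, not part of ResponseCache; its output would be hashed in place of the plain JSON.stringify result.

```javascript
// Hypothetical helper: serialize with object keys sorted at every level,
// so property order does not change the resulting string.
// Assumes JSON-safe input (no undefined values, functions, or cycles).
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const body = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

const a = { userAgent: 'mobile', viewport: { width: 375, height: 667 } };
const b = { viewport: { height: 667, width: 375 }, userAgent: 'mobile' };

console.log(JSON.stringify(a) === JSON.stringify(b));   // false
console.log(stableStringify(a) === stableStringify(b)); // true
```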

Configuration notes

TTL examples: Static content (15-60 min), dynamic content (1-5 min), development (30 sec - 2 min)
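
Because the constructor takes its TTL in milliseconds, the ranges above are easier to read as named constants. The values below are one pick from within each suggested range, and the TTL object and its key names are illustrative assumptions.

```javascript
const MINUTE = 60 * 1000;

// Illustrative values chosen from within the ranges above
const TTL = {
  staticContent: 30 * MINUTE,  // static content: 15-60 min
  dynamicContent: 2 * MINUTE,  // dynamic content: 1-5 min
  development: 30 * 1000,      // development: 30 sec - 2 min
};

console.log(TTL.staticContent); // 1800000
```

A cache for static pages could then be created with new ResponseCache('./cache', TTL.staticContent).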

Storage considerations: Monitor cache size, run clearExpired() regularly, avoid caching sensitive data
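
A rough way to monitor cache size is to sum the sizes of the JSON files in the cache directory. The sketch below assumes the flat directory layout used by ResponseCache; cacheSizeBytes is a hypothetical helper name.

```javascript
import fs from 'fs';
import path from 'path';

// Hypothetical helper: total size in bytes of the .json files in a cache directory
function cacheSizeBytes(cacheDir) {
  if (!fs.existsSync(cacheDir)) return 0;
  return fs.readdirSync(cacheDir)
    .filter((file) => file.endsWith('.json'))
    .reduce((total, file) => total + fs.statSync(path.join(cacheDir, file)).size, 0);
}

console.log(`Cache size: ${cacheSizeBytes('./cache')} bytes`);
```

For long-running scripts, clearExpired() could be scheduled with setInterval(() => cache.clearExpired(), 60000). Calling unref() on the returned timer keeps it from holding the Node.js process open.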

Optimization: Cache at appropriate granularity, log hit/miss ratios, consider in-memory caching for small datasets
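
Hit/miss ratios can be logged by wrapping the cache rather than modifying it. The sketch below assumes a cache object with the get and set methods shown earlier; withStats and hitRatio are hypothetical names, and a Map-backed stand-in replaces the file cache so the example runs on its own.

```javascript
// Hypothetical wrapper: counts hits and misses around any cache with get/set
function withStats(cache) {
  const stats = { hits: 0, misses: 0 };
  return {
    get(url, options = {}) {
      const value = cache.get(url, options);
      if (value === null) stats.misses++;
      else stats.hits++;
      return value;
    },
    set(url, response, options = {}) {
      cache.set(url, response, options);
    },
    // Fraction of lookups served from cache; 0 when nothing has been looked up
    hitRatio() {
      const total = stats.hits + stats.misses;
      return total === 0 ? 0 : stats.hits / total;
    },
    stats,
  };
}

// Minimal in-memory stand-in (no TTL), for demonstration only
const memory = new Map();
const memoryCache = {
  get: (url) => (memory.has(url) ? memory.get(url) : null),
  set: (url, response) => { memory.set(url, response); },
};

const tracked = withStats(memoryCache);
tracked.get('https://example.com');            // miss
tracked.set('https://example.com', '<html>');
tracked.get('https://example.com');            // hit
console.log(tracked.hitRatio());               // 0.5
```

The same Map-backed object also shows what in-memory caching for a small dataset might look like, at the cost of losing entries when the process exits.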