Implementing Response Caching
When automating browser tasks with repeated requests or expensive operations, response caching improves performance and reduces API costs. This guide shows how to implement a robust caching system for Puppeteer and Playwright scripts.
Benefits and use cases
Response caching provides:
- Performance improvement: Avoid repeated expensive operations
- Reduced costs: Minimize unnecessary requests to Browserless and external services
- Better reliability: Serve cached content when requests fail
- Faster development: Speed up testing cycles
Use caching for data scraping workflows, API testing, content analysis, and multi-step automation where intermediate results can be reused.
ResponseCache implementation
Here's a file-based caching system with configurable TTL and automatic cleanup:
import fs from 'fs';
import crypto from 'crypto';
class ResponseCache {
constructor(cacheDir = './cache', ttl = 300000) {
this.cacheDir = cacheDir; // Directory to store cache files
this.ttl = ttl; // Time-to-live in milliseconds
this.ensureCacheDir();
}
// Create cache directory if it doesn't exist
ensureCacheDir() {
if (!fs.existsSync(this.cacheDir)) {
fs.mkdirSync(this.cacheDir, { recursive: true });
}
}
// Generate unique cache key from URL and options
getCacheKey(url, options = {}) {
const data = JSON.stringify({ url, ...options });
return crypto.createHash('md5').update(data).digest('hex');
}
// Retrieve cached response if valid and not expired
get(url, options = {}) {
const key = this.getCacheKey(url, options);
const filePath = `${this.cacheDir}/${key}.json`;
if (fs.existsSync(filePath)) {
const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
// Check if cache entry is still valid
if (Date.now() - data.timestamp < this.ttl) {
console.log(`Cache hit for ${url}`);
return data.response;
}
}
return null;
}
// Store response in cache with timestamp
set(url, response, options = {}) {
const key = this.getCacheKey(url, options);
const filePath = `${this.cacheDir}/${key}.json`;
const data = {
timestamp: Date.now(),
response: response,
url: url,
options: options
};
fs.writeFileSync(filePath, JSON.stringify(data, null, 2));
console.log(`Cached response for ${url}`);
}
// Remove expired cache entries
clearExpired() {
if (!fs.existsSync(this.cacheDir)) return;
const files = fs.readdirSync(this.cacheDir);
let clearedCount = 0;
files.forEach(file => {
if (file.endsWith('.json')) {
const filePath = `${this.cacheDir}/${file}`;
try {
const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
if (Date.now() - data.timestamp >= this.ttl) {
fs.unlinkSync(filePath);
clearedCount++;
}
} catch (error) {
// Remove corrupted files
fs.unlinkSync(filePath);
clearedCount++;
}
}
});
if (clearedCount > 0) {
console.log(`Cleared ${clearedCount} expired cache entries`);
}
}
}
Usage examples
- Puppeteer
- Playwright
import puppeteer from "puppeteer-core";
const cache = new ResponseCache('./cache', 300000);
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://production-sfo.browserless.io/?token=YOUR_API_TOKEN_HERE`,
});
const page = await browser.newPage();
const url = 'https://example.com';
let content = cache.get(url);
if (!content) {
await page.goto(url, { waitUntil: 'domcontentloaded' });
content = await page.content();
cache.set(url, content);
}
console.log('Page content length:', content.length);
cache.clearExpired();
await browser.close();
import playwright from "playwright";
const cache = new ResponseCache('./cache', 300000);
const browser = await playwright.chromium.connectOverCDP(
`wss://production-sfo.browserless.io/?token=YOUR_API_TOKEN_HERE`
);
const page = await browser.newPage();
const url = 'https://example.com';
let content = cache.get(url);
if (!content) {
await page.goto(url, { waitUntil: 'domcontentloaded' });
content = await page.content();
cache.set(url, content);
}
console.log('Page content length:', content.length);
cache.clearExpired();
await browser.close();
How it works
The cache system operates in four key steps:
- Generate cache keys: URLs and options are hashed to create unique identifiers
- Check for cached data: Before making requests, check if valid cached data exists
- Validate freshness: Compare timestamps against TTL to ensure data hasn't expired
- Store responses: Cache fresh responses with timestamps for future use
Advanced usage
You can include request-specific options in cache keys for different scenarios:
const cacheOptions = { userAgent: 'mobile', viewport: { width: 375, height: 667 } };
let content = cache.get(url, cacheOptions);
if (!content) {
await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)');
await page.setViewportSize({ width: 375, height: 667 });
await page.goto(url);
content = await page.content();
cache.set(url, content, cacheOptions);
}
Configuration notes
TTL examples: Static content (15-60 min), dynamic content (1-5 min), development (30 sec - 2 min)
Storage considerations: Monitor cache size, run clearExpired()
regularly, avoid caching sensitive data
Optimization: Cache at appropriate granularity, log hit/miss ratios, consider in-memory caching for small datasets