
OpenAI CUA Integration

OpenAI's Computer Use Agent (CUA) analyzes screenshots and returns structured actions — click, type, scroll — that Playwright executes in a Browserless cloud browser. This enables tasks like form filling, web research, and data extraction without managing browser infrastructure yourself.

How it works

  1. Capture screenshot — take a screenshot of the current browser state
  2. Send to CUA model — call the Responses API with a computer tool
  3. Execute actions — parse the computer_call response and run actions via Playwright
  4. Loop — repeat until the task is complete
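The four steps above can be sketched as a small driver with the browser and model calls injected as functions. This is only an illustration: `ModelTurn`, `LoopDeps`, `runAgentLoop`, and the `maxSteps` cap are hypothetical names and not part of the OpenAI or Playwright APIs. The cap is an added safeguard so a confused model cannot loop forever.

```typescript
// Hypothetical shapes for illustration only; these are not OpenAI SDK types.
type ModelTurn =
  | { kind: "action"; callId: string; action: { type: string } }
  | { kind: "done"; text: string };

interface LoopDeps {
  // Step 1: return the current browser state as a base64 PNG
  captureScreenshot: () => Promise<string>;
  // Step 2: send the screenshot (and the call being answered, if any) to the model
  askModel: (screenshot: string, callId?: string) => Promise<ModelTurn>;
  // Step 3: run one action in the browser
  executeAction: (action: { type: string }) => Promise<void>;
}

async function runAgentLoop(deps: LoopDeps, maxSteps = 25): Promise<string> {
  let turn = await deps.askModel(await deps.captureScreenshot());
  // Step 4: repeat until the model stops asking for actions
  for (let step = 0; ; step++) {
    if (turn.kind === "done") return turn.text;
    if (step >= maxSteps) {
      throw new Error(`Task not finished after ${maxSteps} actions`);
    }
    await deps.executeAction(turn.action);
    const screenshot = await deps.captureScreenshot();
    turn = await deps.askModel(screenshot, turn.callId);
  }
}
```

In a real script the three functions would wrap Playwright and the Responses API; keeping them injected makes the loop easy to exercise with fakes.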

Prerequisites

You need a Browserless API token, an OpenAI API key, and Node.js with npm installed.

Step-by-Step Setup

In this guide you'll build an example that navigates to Bing, searches for "Browserless.io", and returns a summary of what the company does. We use stealth mode to avoid bot detection.

1. Set your API keys

Grab your Browserless token from your account dashboard and your OpenAI key from the OpenAI API Keys page.

BROWSERLESS_API_KEY=your-browserless-token
OPENAI_API_KEY=your-openai-key
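However the two variables reach the process environment, a missing or empty value otherwise only surfaces later as a connection or authentication error. A minimal sketch of an early check follows; `requireEnv` is a hypothetical helper, not part of either SDK.

```typescript
// Hypothetical helper: fail fast when a required variable is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Intended usage in the script (process.env comes from Node.js):
//   const browserlessToken = requireEnv("BROWSERLESS_API_KEY", process.env);
//   const openaiKey = requireEnv("OPENAI_API_KEY", process.env);
```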

2. Install dependencies

npm install openai playwright-core typescript ts-node @types/node

3. Connect to Browserless

Use Playwright's CDP connection with stealth mode (recommended for avoiding bot detection):

import { chromium, Page } from "playwright-core";
import OpenAI from "openai";

const client = new OpenAI();
const browser = await chromium.connectOverCDP(
  `wss://production-sfo.browserless.io/chromium/stealth?token=${process.env.BROWSERLESS_API_KEY}`,
  { timeout: 60000 }
);
const context = await browser.newContext({ viewport: { width: 1024, height: 768 } });
const page = await context.newPage();

4. Navigate and capture the initial screenshot

Navigate to Bing and capture the initial screenshot to send to the model. Over remote WebSocket connections, standard Playwright screenshots can time out, so we include a CDP fallback:

await page.goto("https://www.bing.com", { waitUntil: "networkidle" });

async function getScreenshot(page: Page): Promise<string> {
  try {
    const buffer = await page.screenshot({ timeout: 10000 });
    return buffer.toString("base64");
  } catch {
    // Fallback: use CDP directly (Page.captureScreenshot returns base64 data)
    const cdp = await page.context().newCDPSession(page);
    const result = await cdp.send("Page.captureScreenshot", { format: "png" });
    await cdp.detach();
    return result.data;
  }
}

const screenshotBase64 = await getScreenshot(page);

5. Send the initial request to the model

Define the task and send it along with the screenshot:

const task = "Search for 'Browserless.io' and tell me what the company does";

let response = await client.responses.create({
  model: "computer-use-preview",
  tools: [
    {
      type: "computer_use_preview",
      display_width: 1024,
      display_height: 768,
      environment: "browser",
    },
  ],
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: task },
        {
          type: "input_image",
          image_url: `data:image/png;base64,${screenshotBase64}`,
        },
      ],
    },
  ],
  truncation: "auto",
});

6. Process actions and loop

The model returns a computer_call item with an action to execute. Run the action, capture a new screenshot, and send it back. Repeat until no more computer_call items appear (task complete).

Note: The model may return key names like CTRL or CMD that Playwright doesn't recognize. The example below maps these to Playwright's expected format (e.g., Control, Meta).

// Map model key names (lowercased) to Playwright key names
const keyMap: Record<string, string> = {
  enter: "Enter", return: "Enter",
  ctrl: "Control", cmd: "Meta",
  esc: "Escape", backspace: "Backspace",
  tab: "Tab", space: "Space",
  up: "ArrowUp", down: "ArrowDown",
  left: "ArrowLeft", right: "ArrowRight",
};

while (true) {
  // Type guard so that .action and .call_id are typed on the filtered items
  const computerCalls = response.output.filter(
    (item): item is OpenAI.Responses.ResponseComputerToolCall =>
      item.type === "computer_call"
  );

  if (computerCalls.length === 0) {
    // Task complete: print the model's final answer
    console.log(response.output_text);
    break;
  }

  const computerCall = computerCalls[0];
  const action = computerCall.action;

  switch (action.type) {
    case "click":
      await page.mouse.click(action.x, action.y);
      break;
    case "double_click":
      await page.mouse.dblclick(action.x, action.y);
      break;
    case "type":
      await page.keyboard.type(action.text);
      break;
    case "keypress": {
      const mappedKeys = action.keys.map(
        (key: string) => keyMap[key.toLowerCase()] || key
      );
      await page.keyboard.press(mappedKeys.join("+"));
      break;
    }
    case "scroll":
      await page.mouse.move(action.x, action.y);
      await page.evaluate(
        `window.scrollBy(${action.scroll_x}, ${action.scroll_y})`
      );
      break;
    case "screenshot":
      // Model wants a fresh screenshot, which is captured below
      break;
    default:
      // Actions not handled in this example (e.g. drag, wait)
      console.warn(`Unhandled action type: ${action.type}`);
  }

  // Capture new screenshot and send back
  const newScreenshot = await getScreenshot(page);

  response = await client.responses.create({
    model: "computer-use-preview",
    previous_response_id: response.id,
    tools: [
      {
        type: "computer_use_preview",
        display_width: 1024,
        display_height: 768,
        environment: "browser",
      },
    ],
    input: [
      {
        type: "computer_call_output",
        call_id: computerCall.call_id,
        output: {
          type: "input_image",
          image_url: `data:image/png;base64,${newScreenshot}`,
        },
      },
    ],
    truncation: "auto",
  });
}

// Release the remote browser session once the task is done
await browser.close();
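The initial request and the follow-up request repeat the same tool definition and the same data-URL formatting. One way to keep them in sync is to build these pieces in one place. The sketch below mirrors the payload shapes used in this guide; `computerTool`, `toDataUrl`, `initialInput`, and `screenshotOutput` are hypothetical helper names.

```typescript
// Keep the declared display size identical to the Playwright viewport,
// because the model returns click coordinates in this coordinate space.
const DISPLAY = { width: 1024, height: 768 };

const computerTool = {
  type: "computer_use_preview" as const,
  display_width: DISPLAY.width,
  display_height: DISPLAY.height,
  environment: "browser" as const,
};

function toDataUrl(base64Png: string): string {
  return `data:image/png;base64,${base64Png}`;
}

// Input for the first request: the task text plus the initial screenshot.
function initialInput(task: string, screenshotBase64: string) {
  return [
    {
      role: "user" as const,
      content: [
        { type: "input_text" as const, text: task },
        { type: "input_image" as const, image_url: toDataUrl(screenshotBase64) },
      ],
    },
  ];
}

// Input for each follow-up request: the screenshot taken after the action ran.
function screenshotOutput(callId: string, screenshotBase64: string) {
  return [
    {
      type: "computer_call_output" as const,
      call_id: callId,
      output: {
        type: "input_image" as const,
        image_url: toDataUrl(screenshotBase64),
      },
    },
  ];
}
```

Each `client.responses.create` call could then pass `tools: [computerTool]` together with `initialInput(...)` or `screenshotOutput(...)`, and `DISPLAY` could be reused for the `viewport` option of `browser.newContext`.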

Supported actions

| Action | Properties | Description |
| --- | --- | --- |
| click | x, y, button | Click at coordinates |
| double_click | x, y | Double-click at coordinates |
| type | text | Type text |
| keypress | keys[] | Press keyboard keys |
| scroll | x, y, scroll_x, scroll_y | Scroll at position |
| drag | path[] (list of x, y points) | Drag along the path from its first point to its last |
| wait | - | Pause before the next screenshot |
| screenshot | - | Request new screenshot |
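The loop in Step 6 ignores the click `button` and does not handle `drag` or `wait`. The sketch below shows one way to cover them. It assumes the action shapes from the OpenAI SDK types (a `drag` action carries a `path` array of points, a `wait` action has no duration, and a `move` action also exists). `PageLike`, `ExtraAction`, `executeExtraAction`, the one-second default pause, and the `wheel` to `middle` mapping are assumptions made for illustration. `PageLike` lists only the Playwright `Page` methods used here, so a real `page` should be assignable to it.

```typescript
// Minimal slice of Playwright's Page, declared locally so the sketch is self-contained.
interface PageLike {
  mouse: {
    click(
      x: number,
      y: number,
      options?: { button?: "left" | "right" | "middle" }
    ): Promise<void>;
    move(x: number, y: number): Promise<void>;
    down(): Promise<void>;
    up(): Promise<void>;
  };
  waitForTimeout(ms: number): Promise<void>;
}

// Assumed action shapes; check them against the SDK version you use.
type ExtraAction =
  | { type: "click"; x: number; y: number; button?: string }
  | { type: "move"; x: number; y: number }
  | { type: "drag"; path: { x: number; y: number }[] }
  | { type: "wait" };

function toPlaywrightButton(button?: string): "left" | "right" | "middle" {
  switch (button ?? "left") {
    case "left":
      return "left";
    case "right":
      return "right";
    case "wheel":
      return "middle"; // assumption: the model's "wheel" means the middle button
    default:
      throw new Error(`Unsupported mouse button: ${button}`);
  }
}

async function executeExtraAction(
  page: PageLike,
  action: ExtraAction,
  waitMs = 1000 // assumed pause length for "wait"
): Promise<void> {
  switch (action.type) {
    case "click":
      await page.mouse.click(action.x, action.y, {
        button: toPlaywrightButton(action.button),
      });
      break;
    case "move":
      await page.mouse.move(action.x, action.y);
      break;
    case "drag": {
      const [start, ...rest] = action.path;
      if (!start) break; // nothing to do for an empty path
      await page.mouse.move(start.x, start.y);
      await page.mouse.down();
      for (const point of rest) {
        await page.mouse.move(point.x, point.y);
      }
      await page.mouse.up();
      break;
    }
    case "wait":
      await page.waitForTimeout(waitMs);
      break;
  }
}
```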

Advanced configuration

Without stealth mode

If you don't need anti-detection and just want a managed cloud browser:

const browser = await chromium.connectOverCDP(
  `wss://production-sfo.browserless.io?token=${process.env.BROWSERLESS_API_KEY}`
);

Residential proxies

Route traffic through real residential IPs for additional anti-detection:

const browser = await chromium.connectOverCDP(
  `wss://production-sfo.browserless.io?token=${process.env.BROWSERLESS_API_KEY}&proxy=residential&proxyCountry=us`
);

Regional endpoints

Connect to the closest region for lower latency. See Connection URLs for all available endpoints.
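The connection URLs in this guide differ only in host, path, and query parameters, so they can be assembled by one small function. In the sketch below, `buildBrowserlessUrl` and its option names are hypothetical. The default host, the `/chromium/stealth` path, and the `proxy` and `proxyCountry` parameters are the ones shown above. Combining the stealth path with proxy parameters is an assumption that this guide does not confirm.

```typescript
interface BrowserlessUrlOptions {
  token: string;
  host?: string; // regional host; defaults to the one used in this guide
  stealth?: boolean; // use the /chromium/stealth path
  proxy?: "residential";
  proxyCountry?: string; // e.g. "us"
}

function buildBrowserlessUrl(opts: BrowserlessUrlOptions): string {
  const host = opts.host ?? "production-sfo.browserless.io";
  const path = opts.stealth ? "/chromium/stealth" : "";
  const params: [string, string][] = [["token", opts.token]];
  if (opts.proxy) params.push(["proxy", opts.proxy]);
  if (opts.proxyCountry) params.push(["proxyCountry", opts.proxyCountry]);
  // Encode values so that special characters in a token cannot break the URL
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `wss://${host}${path}?${query}`;
}
```

The result can be passed directly to `chromium.connectOverCDP`.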

Troubleshooting

Screenshot timeout

If the CDP fallback in Step 4 still times out, try increasing the timeout or check your network connection to Browserless. You can also increase the Playwright connection timeout:

const browser = await chromium.connectOverCDP(url, { timeout: 120000 });

Model returns unrecognized keys

The keyMap in Step 6 covers the most common mismatches. If you encounter new ones, add them to the map. The Playwright keyboard API docs list all valid key names.
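As a sketch of what an extended map might look like, the helper below adds a few more aliases and joins the result into the combination string that `page.keyboard.press` expects. Which spellings the model actually emits is an assumption, so the alias list is illustrative. The target names (`Control`, `Meta`, `Alt`, `Shift`, `Delete`, `PageUp`, and so on) are Playwright key names. `keyAliases` and `toPlaywrightCombo` are hypothetical names.

```typescript
// Aliases are matched case-insensitively; unknown keys pass through unchanged.
// A Map avoids accidental hits on inherited object properties.
const keyAliases = new Map<string, string>([
  ["enter", "Enter"], ["return", "Enter"],
  ["ctrl", "Control"], ["control", "Control"],
  ["cmd", "Meta"], ["command", "Meta"], ["meta", "Meta"],
  ["alt", "Alt"], ["option", "Alt"],
  ["shift", "Shift"],
  ["esc", "Escape"], ["escape", "Escape"],
  ["backspace", "Backspace"], ["delete", "Delete"], ["del", "Delete"],
  ["tab", "Tab"], ["space", "Space"],
  ["up", "ArrowUp"], ["down", "ArrowDown"],
  ["left", "ArrowLeft"], ["right", "ArrowRight"],
  ["arrowup", "ArrowUp"], ["arrowdown", "ArrowDown"],
  ["arrowleft", "ArrowLeft"], ["arrowright", "ArrowRight"],
  ["pageup", "PageUp"], ["pagedown", "PageDown"],
  ["home", "Home"], ["end", "End"],
]);

function toPlaywrightCombo(keys: string[]): string {
  return keys
    .map((key) => keyAliases.get(key.toLowerCase()) ?? key)
    .join("+");
}
```

For example, `toPlaywrightCombo(["CTRL", "a"])` yields `"Control+a"`, which could replace the inline mapping in the `keypress` case.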
