Add Browser Capabilities to AI with Browser Use

Browser Use is a Python library that allows AI agents to control a browser. By integrating Browserless with Browser Use, you can provide your AI applications with powerful web browsing capabilities without managing browser infrastructure.

Prerequisites

Python 3.11 or higher
An active Browserless API Token (available in your account dashboard)

Step-by-Step Setup

1. Get your API Key

Go to your Browserless Account Dashboard and copy your API token.

Then set the BROWSERLESS_API_TOKEN environment variable in your .env file:

.env file
Command line

BROWSERLESS_API_TOKEN=your-token-here
ANTHROPIC_API_KEY=your-anthropic-key-here

export BROWSERLESS_API_TOKEN=your-token-here
export ANTHROPIC_API_KEY=your-anthropic-key-here

2. Create a virtual environment

Set up a Python virtual environment to manage your dependencies:

venv
conda

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

conda create -n browser-use-env python=3.10
conda activate browser-use-env

3. Install required packages

Install Browser Use and other required packages:

pip
Poetry

pip install browser-use python-dotenv langchain-anthropic

poetry add browser-use python-dotenv langchain-anthropic

4. Create the browser_session.py file

Create a file named browser_session.py with the following complete code:

from typing import Optional
from browser_use.browser.context import BrowserSession, BrowserContext, BrowserContextConfig
from playwright.async_api import Page, BrowserContext as PlaywrightContext

class ExtendedBrowserSession(BrowserSession):
    """Extended version of BrowserSession that includes current_page"""
    def __init__(
        self,
        context: PlaywrightContext,
        cached_state: Optional[dict] = None,
        current_page: Optional[Page] = None
    ):
        super().__init__(context=context, cached_state=cached_state)
        self.current_page = current_page


class UseBrowserlessContext(BrowserContext):
    async def _initialize_session(self) -> ExtendedBrowserSession:
        """Initialize a browser session using existing Browserless page.

        Returns:
            ExtendedBrowserSession: The initialized browser session with current page.
        """
        playwright_browser = await self.browser.get_playwright_browser()
        context = await self._create_context(playwright_browser)
        self._add_new_page_listener(context)

        self.session = ExtendedBrowserSession(
            context=context,
            cached_state=None,
        )

        # Get existing page or create new one
        self.session.current_page = context.pages[0] if context.pages else await context.new_page()

        # Initialize session state
        self.session.cached_state = await self._update_state()

        return self.session

5. Create the main.py file

Create a new file named main.py with the following complete code:

from dotenv import load_dotenv
import os
import asyncio
from browser_use import Browser, BrowserConfig, Agent
from browser_session import UseBrowserlessContext, ExtendedBrowserSession
from browser_use.browser.context import BrowserContextConfig
from langchain_anthropic import ChatAnthropic

async def setup_browser() -> tuple[Browser, UseBrowserlessContext]:
    """Set up browser and context configurations.

    Returns:
        tuple[Browser, UseBrowserlessContext]: Configured browser and context.
    """
    # Browserless connection URL with token (using CDP)
    browserless_url = f"wss://production-sfo.browserless.io?token={os.environ['BROWSERLESS_API_TOKEN']}&proxy=residential"
    
    browser = Browser(config=BrowserConfig(cdp_url=browserless_url))
    context = UseBrowserlessContext(
        browser,
        BrowserContextConfig(
            wait_for_network_idle_page_load_time=10.0,
            highlight_elements=True,
        )
    )

    return browser, context

async def setup_agent(browser: Browser, context: UseBrowserlessContext) -> Agent:
    """Set up the browser automation agent.

    Args:
        browser: Configured browser instance
        context: Browser context for the agent

    Returns:
        Agent: Configured automation agent
    """
    llm = ChatAnthropic(
        model_name="claude-3-5-sonnet-20240620",
        temperature=0.0,
        timeout=100,
    )

    return Agent(
        task="go to https://example.com, navigate the site and report what you found",
        llm=llm,
        browser=browser,
        browser_context=context,
    )

async def main():
    load_dotenv()

    browser, context = await setup_browser()
    print("Browser and context initialized")
    
    session = await context.get_session()
    print("Session obtained")

    try:
        agent = await setup_agent(browser, context)
        print("Agent configured, running now...")
        await agent.run()
    finally:
        # Close the browser
        print("Closing browser")
        await browser.close()

if __name__ == "__main__":
    asyncio.run(main())

6. Run your application

Run your application with the following command:

python main.py

You should see output indicating that the browser is initialized and the agent is running.

How It Works

1. Connection Setup: Browser Use connects to Browserless using the WebSocket endpoint with your API token 2. Agent Configuration: The AI agent is configured with a task and a language model 3. Automation: The agent uses the browser to navigate and interact with websites 4. LLM Integration: The agent leverages an LLM (like Claude) to interpret web content and make decisions

Additional Configuration

Proxy Support

You can enable a residential proxy for improved website compatibility:

browserless_url = f"wss://production-sfo.browserless.io?token={os.environ['BROWSERLESS_API_TOKEN']}&proxy=residential"

Context Configuration

Customize the browser context with additional settings:

context = UseBrowserlessContext(
    browser,
    BrowserContextConfig(
        wait_for_network_idle_page_load_time=10.0,
        highlight_elements=True,
        # Additional configuration options
        user_agent="Custom User Agent",
        viewport_size={"width": 1920, "height": 1080},
        ignore_https_errors=True,
    )
)

Advanced Features

CDP Events and LiveURL

Browserless provides powerful Chrome DevTools Protocol (CDP) events that can enhance your browser automation. Here are some key features:

LiveURL for User Interaction

# Create a CDP session
cdp = await page.createCDPSession()

# Generate a LiveURL for user interaction
response = await cdp.send('Browserless.liveURL', {
    "timeout": 600000  # 10 minutes timeout
})
live_url = response["liveURL"]
print(f"Share this URL with users: {live_url}")

# Wait for user to complete interaction
future = asyncio.Future()
cdp.on('Browserless.liveComplete', lambda: future.set_result(True))
await future

For more details, see our LiveURL Documentation.

Captcha Detection

# Listen for captcha detection
cdp.on('Browserless.captchaFound', lambda: print('Captcha detected!'))

# Solve captcha automatically
response = await cdp.send('Browserless.solveCaptcha', {
    "appearTimeout": 20000
})
solved, error = response.get("solved"), response.get("error")

Learn more about handling captchas in our Hybrid Automation Guide.

Session Recording

# Start recording the session
await cdp.send("Browserless.startRecording")

# ... perform actions ...

# Stop recording and save
response = await cdp.send("Browserless.stopRecording")
with open("recording.webm", "wb") as f:
    f.write(response.value)

See our Recording Documentation for more details.

Complete Example with CDP Events

Here's a complete example that combines LiveURL, captcha handling, and session recording:

from browser_use import Browser, BrowserConfig, Agent
from browser_session import UseBrowserlessContext, ExtendedBrowserSession
from browser_use.browser.context import BrowserContextConfig
from langchain_anthropic import ChatAnthropic
import asyncio
import os

async def setup_browser() -> tuple[Browser, UseBrowserlessContext]:
    browserless_url = f"wss://production-sfo.browserless.io?token={os.environ['BROWSERLESS_API_TOKEN']}"
    
    browser = Browser(config=BrowserConfig(cdp_url=browserless_url))
    context = UseBrowserlessContext(
        browser,
        BrowserContextConfig(
            wait_for_network_idle_page_load_time=10.0,
            highlight_elements=True,
        )
    )

    return browser, context

async def main():
    browser, context = await setup_browser()
    session = await context.get_session()
    
    try:
        # Create CDP session
        cdp = await session.current_page.createCDPSession()
        
        # Start recording
        await cdp.send("Browserless.startRecording")
        
        # Generate LiveURL
        response = await cdp.send('Browserless.liveURL', {
            "timeout": 600000
        })
        live_url = response["liveURL"]
        print(f"Share this URL with users: {live_url}")
        
        # Handle captcha if detected
        cdp.on('Browserless.captchaFound', lambda: print('Captcha detected!'))
        
        # Wait for user interaction
        future = asyncio.Future()
        cdp.on('Browserless.liveComplete', lambda: future.set_result(True))
        await future
        
        # Stop recording and save
        response = await cdp.send("Browserless.stopRecording")
        with open("recording.webm", "wb") as f:
            f.write(response.value)
            
    finally:
        await browser.close()

if __name__ == "__main__":
    asyncio.run(main())

For more information about CDP events and features, please refer to:

Advanced Usage

For more advanced usage scenarios, please refer to:

Prerequisites​

Step-by-Step Setup​

How It Works​

Additional Configuration​

Proxy Support​

Context Configuration​

Advanced Features​

CDP Events and LiveURL​

Complete Example with CDP Events​

Advanced Usage​