Scrape Yelp business listings
Search Yelp for businesses and extract names, ratings, review counts, categories, and price ranges.
- A Browserless API token from your account dashboard
Steps
Yelp renders its search results with JavaScript and has bot-detection measures. The examples below search for pizza in New York and route through stealth mode with a residential proxy.
Yelp updates its markup periodically. If selectors stop returning results, inspect the live page with browser DevTools to find the current element and attribute names.
- AI Agent
- REST API
- Frameworks
- BQL
Use the Browserless MCP server to scrape business listings from Yelp from any MCP-compatible AI agent (Claude Desktop, Cursor, Windsurf, ChatGPT, etc.).
1. Connect the MCP server
Send this prompt to your AI agent to install the Browserless MCP server:
Go to https://github.com/browserless/browserless-mcp/blob/main/install.md
and follow the instructions to install the Browserless MCP server
for my client.
2. Scrape Yelp
Use browserless_smartscraper. It handles Yelp's dynamic content and bot protection automatically.
Use the browserless_smartscraper tool to scrape business listings
from https://www.yelp.com/search?find_desc=pizza&find_loc=New+York%2C+NY
and return the results as markdown
Send the BQL mutation over HTTP to the stealth endpoint. No browser library or BQL IDE required.
- cURL
- JavaScript
- Python
- Java
- C#
1. Send the request
curl -X POST \
"https://production-sfo.browserless.io/stealth/bql?token=YOUR_API_TOKEN_HERE&proxy=residential&proxyCountry=us" \
-H "Content-Type: application/json" \
-d '{
"query": "mutation ScrapeYelp { goto(url: \"https://www.yelp.com/search?find_desc=pizza&find_loc=New+York%2C+NY\", waitUntil: networkIdle) { status } waitForSelector(selector: \"[data-testid=serp-ia-card]\", timeout: 15000) { time } businesses: mapSelector(selector: \"[data-testid=serp-ia-card]\") { name: mapSelector(selector: \"a[class*=businessName] span\") { innerText } rating: mapSelector(selector: \"[aria-label*=star]\") { ratingLabel: attribute(name: \"aria-label\") { value } } reviewCount: mapSelector(selector: \"span[class*=reviewCount]\") { innerText } categories: mapSelector(selector: \"a[class*=categoryLink]\") { innerText } priceRange: mapSelector(selector: \"span[class*=priceRange]\") { innerText } } }",
"variables": {}
}'
2. Check the output
{
"data": {
"goto": { "status": 200 },
"waitForSelector": { "time": 3210 },
"businesses": [
{
"name": [{ "innerText": "Joe's Pizza" }],
"rating": [{ "ratingLabel": { "value": "4.5 star rating" } }],
"reviewCount": [{ "innerText": "1,247" }],
"categories": [{ "innerText": "Pizza" }],
"priceRange": [{ "innerText": "$" }]
},
{
"name": [{ "innerText": "Di Fara Pizza" }],
"rating": [{ "ratingLabel": { "value": "4.0 star rating" } }],
"reviewCount": [{ "innerText": "3,891" }],
"categories": [{ "innerText": "Pizza" }],
"priceRange": [{ "innerText": "$$" }]
}
]
}
}
1. Send the request
const query = `mutation ScrapeYelp {
goto(url: "https://www.yelp.com/search?find_desc=pizza&find_loc=New+York%2C+NY", waitUntil: networkIdle) {
status
}
waitForSelector(selector: "[data-testid=serp-ia-card]", timeout: 15000) {
time
}
businesses: mapSelector(selector: "[data-testid=serp-ia-card]") {
name: mapSelector(selector: "a[class*=businessName] span") { innerText }
rating: mapSelector(selector: "[aria-label*=star]") {
ratingLabel: attribute(name: "aria-label") { value }
}
reviewCount: mapSelector(selector: "span[class*=reviewCount]") { innerText }
categories: mapSelector(selector: "a[class*=categoryLink]") { innerText }
priceRange: mapSelector(selector: "span[class*=priceRange]") { innerText }
}
}`;
const response = await fetch(
'https://production-sfo.browserless.io/stealth/bql?token=YOUR_API_TOKEN_HERE&proxy=residential&proxyCountry=us',
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ query, variables: {} }),
}
);
const { data } = await response.json();
console.log(JSON.stringify(data.businesses, null, 2));
2. Check the output
[
{
"name": [{ "innerText": "Joe's Pizza" }],
"rating": [{ "ratingLabel": { "value": "4.5 star rating" } }],
"reviewCount": [{ "innerText": "1,247" }],
"categories": [{ "innerText": "Pizza" }],
"priceRange": [{ "innerText": "$" }]
},
{
"name": [{ "innerText": "Di Fara Pizza" }],
"rating": [{ "ratingLabel": { "value": "4.0 star rating" } }],
"reviewCount": [{ "innerText": "3,891" }],
"categories": [{ "innerText": "Pizza" }],
"priceRange": [{ "innerText": "$$" }]
}
]
1. Install dependencies
pip install requests
2. Send the request
import requests
query = """
mutation ScrapeYelp {
goto(url: "https://www.yelp.com/search?find_desc=pizza&find_loc=New+York%2C+NY", waitUntil: networkIdle) {
status
}
waitForSelector(selector: "[data-testid=serp-ia-card]", timeout: 15000) {
time
}
businesses: mapSelector(selector: "[data-testid=serp-ia-card]") {
name: mapSelector(selector: "a[class*=businessName] span") { innerText }
rating: mapSelector(selector: "[aria-label*=star]") {
ratingLabel: attribute(name: "aria-label") { value }
}
reviewCount: mapSelector(selector: "span[class*=reviewCount]") { innerText }
categories: mapSelector(selector: "a[class*=categoryLink]") { innerText }
priceRange: mapSelector(selector: "span[class*=priceRange]") { innerText }
}
}
"""
response = requests.post(
'https://production-sfo.browserless.io/stealth/bql',
params={
'token': 'YOUR_API_TOKEN_HERE',
'proxy': 'residential',
'proxyCountry': 'us',
},
json={'query': query, 'variables': {}},
)
data = response.json()['data']
for biz in data['businesses']:
name = biz['name'][0]['innerText']
rating = biz['rating'][0]['ratingLabel']['value']
reviews = biz['reviewCount'][0]['innerText']
print(f'{name} | {rating} | {reviews} reviews')
3. Check the output
Joe's Pizza | 4.5 star rating | 1,247 reviews
Di Fara Pizza | 4.0 star rating | 3,891 reviews
1. Send the request
import java.net.URI;
import java.net.http.*;
String token = "YOUR_API_TOKEN_HERE";
String endpoint = "https://production-sfo.browserless.io/stealth/bql?token=" + token
+ "&proxy=residential&proxyCountry=us";
String query = "mutation ScrapeYelp {"
+ " goto(url: \\\"https://www.yelp.com/search?find_desc=pizza&find_loc=New+York%2C+NY\\\", waitUntil: networkIdle) { status }"
+ " waitForSelector(selector: \\\"[data-testid=serp-ia-card]\\\", timeout: 15000) { time }"
+ " businesses: mapSelector(selector: \\\"[data-testid=serp-ia-card]\\\") {"
+ " name: mapSelector(selector: \\\"a[class*=businessName] span\\\") { innerText }"
+ " rating: mapSelector(selector: \\\"[aria-label*=star]\\\") { ratingLabel: attribute(name: \\\"aria-label\\\") { value } }"
+ " reviewCount: mapSelector(selector: \\\"span[class*=reviewCount]\\\") { innerText }"
+ " categories: mapSelector(selector: \\\"a[class*=categoryLink]\\\") { innerText }"
+ " priceRange: mapSelector(selector: \\\"span[class*=priceRange]\\\") { innerText }"
+ " }"
+ " }";
String payload = "{\"query\": \"" + query + "\", \"variables\": {}}";
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create(endpoint))
.header("Content-Type", "application/json")
.POST(HttpRequest.BodyPublishers.ofString(payload))
.build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());
2. Check the output
{
"data": {
"goto": { "status": 200 },
"waitForSelector": { "time": 3210 },
"businesses": [
{
"name": [{ "innerText": "Joe's Pizza" }],
"rating": [{ "ratingLabel": { "value": "4.5 star rating" } }]
}
]
}
}
1. Send the request
using System.Net.Http;
using System.Text;
using System.Text.Json;
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"https://production-sfo.browserless.io/stealth/bql?token={token}&proxy=residential&proxyCountry=us";
var payload = new
{
query = @"mutation ScrapeYelp {
goto(url: ""https://www.yelp.com/search?find_desc=pizza&find_loc=New+York%2C+NY"", waitUntil: networkIdle) { status }
waitForSelector(selector: ""[data-testid=serp-ia-card]"", timeout: 15000) { time }
businesses: mapSelector(selector: ""[data-testid=serp-ia-card]"") {
name: mapSelector(selector: ""a[class*=businessName] span"") { innerText }
rating: mapSelector(selector: ""[aria-label*=star]"") {
ratingLabel: attribute(name: ""aria-label"") { value }
}
reviewCount: mapSelector(selector: ""span[class*=reviewCount]"") { innerText }
categories: mapSelector(selector: ""a[class*=categoryLink]"") { innerText }
priceRange: mapSelector(selector: ""span[class*=priceRange]"") { innerText }
}
}",
variables = new { },
};
using (HttpClient httpClient = new HttpClient())
{
var content = new StringContent(
JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json");
var response = await httpClient.PostAsync(endpoint, content);
string body = await response.Content.ReadAsStringAsync();
Console.WriteLine(body);
}
2. Check the output
{
"data": {
"goto": { "status": 200 },
"waitForSelector": { "time": 3210 },
"businesses": [
{
"name": [{ "innerText": "Joe's Pizza" }],
"rating": [{ "ratingLabel": { "value": "4.5 star rating" } }]
}
]
}
}
Connect through stealth mode and a residential proxy so Yelp sees traffic from a real browser, then extract business data from the rendered search results.
- Puppeteer
- Playwright
1. Install dependencies
npm install puppeteer-core
2. Connect and scrape
import puppeteer from 'puppeteer-core';
const browser = await puppeteer.connect({
browserWSEndpoint:
'wss://production-sfo.browserless.io/stealth?token=YOUR_API_TOKEN_HERE&proxy=residential&proxyCountry=us',
});
try {
const page = await browser.newPage();
await page.goto('https://www.yelp.com/search?find_desc=pizza&find_loc=New+York%2C+NY', {
waitUntil: 'networkidle2',
});
await page.waitForSelector('[data-testid="serp-ia-card"]');
const businesses = await page.evaluate(() =>
Array.from(document.querySelectorAll('[data-testid="serp-ia-card"]')).map((card) => ({
name: card.querySelector('a[class*="businessName"] span')?.innerText?.trim() ?? '',
rating: card.querySelector('[aria-label*="star"]')?.getAttribute('aria-label') ?? '',
reviewCount: card.querySelector('span[class*="reviewCount"]')?.innerText?.trim() ?? '',
categories: card.querySelector('a[class*="categoryLink"]')?.innerText?.trim() ?? '',
priceRange: card.querySelector('span[class*="priceRange"]')?.innerText?.trim() ?? '',
}))
);
console.log(JSON.stringify(businesses, null, 2));
} finally {
await browser.close();
}
3. Check the output
Run with node scrape-yelp.mjs. Each object has name, rating, reviewCount, categories, and priceRange fields.
[
{
"name": "Joe's Pizza",
"rating": "4.5 star rating",
"reviewCount": "1,247",
"categories": "Pizza",
"priceRange": "$"
}
]
1. Install dependencies
npm install playwright-core
2. Connect and scrape
import { chromium } from 'playwright-core';
const browser = await chromium.connectOverCDP(
'wss://production-sfo.browserless.io?token=YOUR_API_TOKEN_HERE&stealth&proxy=residential&proxyCountry=us'
);
try {
const context = browser.contexts()[0];
const page = await context.newPage();
await page.goto('https://www.yelp.com/search?find_desc=pizza&find_loc=New+York%2C+NY', {
waitUntil: 'networkidle',
});
await page.waitForSelector('[data-testid="serp-ia-card"]');
const businesses = await page.evaluate(() =>
Array.from(document.querySelectorAll('[data-testid="serp-ia-card"]')).map((card) => ({
name: card.querySelector('a[class*="businessName"] span')?.innerText?.trim() ?? '',
rating: card.querySelector('[aria-label*="star"]')?.getAttribute('aria-label') ?? '',
reviewCount: card.querySelector('span[class*="reviewCount"]')?.innerText?.trim() ?? '',
categories: card.querySelector('a[class*="categoryLink"]')?.innerText?.trim() ?? '',
priceRange: card.querySelector('span[class*="priceRange"]')?.innerText?.trim() ?? '',
}))
);
console.log(JSON.stringify(businesses, null, 2));
} finally {
await browser.close();
}
3. Check the output
Run with node scrape-yelp.mjs. Each object has name, rating, reviewCount, categories, and priceRange fields.
[
{
"name": "Joe's Pizza",
"rating": "4.5 star rating",
"reviewCount": "1,247",
"categories": "Pizza",
"priceRange": "$"
}
]
1. Write the mutation
Navigate to Yelp's search results, wait for business cards to load, then extract structured data from each listing. We use /stealth/bql because Yelp's bot detection blocks standard headless browsers.
mutation ScrapeYelp {
goto(url: "https://www.yelp.com/search?find_desc=pizza&find_loc=New+York%2C+NY", waitUntil: networkIdle) {
status
}
waitForSelector(selector: "[data-testid=serp-ia-card]", timeout: 15000) {
time
}
businesses: mapSelector(selector: "[data-testid=serp-ia-card]") {
name: mapSelector(selector: "a[class*=businessName] span") { innerText }
rating: mapSelector(selector: "[aria-label*=star]") {
ratingLabel: attribute(name: "aria-label") { value }
}
reviewCount: mapSelector(selector: "span[class*=reviewCount]") { innerText }
categories: mapSelector(selector: "a[class*=categoryLink]") { innerText }
priceRange: mapSelector(selector: "span[class*=priceRange]") { innerText }
}
}
2. Run it
Paste into the BQL IDE and click Run.
3. Check the output
{
"data": {
"goto": { "status": 200 },
"waitForSelector": { "time": 3210 },
"businesses": [
{
"name": [{ "innerText": "Joe's Pizza" }],
"rating": [{ "ratingLabel": { "value": "4.5 star rating" } }],
"reviewCount": [{ "innerText": "1,247" }],
"categories": [{ "innerText": "Pizza" }],
"priceRange": [{ "innerText": "$" }]
},
{
"name": [{ "innerText": "Di Fara Pizza" }],
"rating": [{ "ratingLabel": { "value": "4.0 star rating" } }],
"reviewCount": [{ "innerText": "3,891" }],
"categories": [{ "innerText": "Pizza" }],
"priceRange": [{ "innerText": "$$" }]
}
]
}
}
Next steps
- Scrape OpenTable Restaurants -- scrape another restaurant/dining platform with stealth mode
- Automate Google Search -- pull search results using the same
/stealth/bqlendpoint - Solving Cloudflare Challenges -- bypass Cloudflare's interstitial pages before scraping