How to Scrape Yelp Data with ProxyTee

Yelp is a treasure trove of information for businesses looking to understand customer feedback, conduct competitive analysis, and perform market research. It provides detailed profiles of local businesses, including customer reviews, ratings, contact details, and more. In this guide, we will explore how to scrape Yelp using Python and ProxyTee to ensure anonymity and avoid blocks.
Why Scrape Yelp?
Scraping Yelp offers several key advantages:
- Comprehensive Business Data: Access detailed information about local businesses, which can be crucial for understanding market trends and consumer preferences.
- Customer Feedback Insights: Gather real-time user reviews to gain insights into customer opinions and experiences.
- Competitive Benchmarking: Analyze your competitors’ performance, identify strengths and weaknesses, and assess customer sentiment to stay competitive.
While various platforms offer similar services, Yelp’s large user base, diverse business categories, and well-established reputation make it a prime target for data scraping.
Yelp Scraping with Python
Python is an ideal language for web scraping due to its ease of use, clear syntax, and extensive selection of libraries. Let’s dive into setting up a basic Yelp scraper:
Step 1️⃣: Setting Up a Python Project
Before you start, ensure that you have Python 3+ installed on your system, along with a Python IDE of your choice. Create a project folder, initialize it with a virtual environment, and create a scraper.py file to get started:
mkdir yelp-scraper
cd yelp-scraper
python -m venv env
Activate the environment (the command depends on your operating system: on Windows run env\Scripts\activate.ps1, while on macOS and Linux run source env/bin/activate).
Now you are ready to proceed to the next step and start coding!
Step 2️⃣: Install Required Libraries
The scraping process requires an HTTP client and an HTML parser. You can install Requests and Beautiful Soup with:
pip install beautifulsoup4 requests
Next, import them at the top of scraper.py:
import requests
from bs4 import BeautifulSoup
Step 3️⃣: Identify and Download the Target Page
Navigate to the Yelp page you wish to scrape, such as a list of New York’s top-rated Italian restaurants:
url = 'https://www.yelp.com/search?find_desc=Italian&find_loc=New+York%2C+NY'
page = requests.get(url)
Here, requests.get(url) downloads the page; its HTML content is then available through page.text.
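Note that many sites reject the default python-requests User-Agent. A common refinement, sketched below with a placeholder header value, is to send a browser-like one:

```python
import requests

# The User-Agent string below is a placeholder example, not a required value;
# any recent browser-like string typically works better than the default.
url = 'https://www.yelp.com/search?find_desc=Italian&find_loc=New+York%2C+NY'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# Prepare the request without sending it, to show the header in place;
# to actually fetch: page = requests.Session().send(req)
req = requests.Request('GET', url, headers=headers).prepare()
print(req.headers['User-Agent'])
```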
Step 4️⃣: Parse the HTML
Now it’s time to parse the HTML content:
soup = BeautifulSoup(page.text, 'html.parser')
This gives you an explorable tree structure that you can query to retrieve the desired elements.
Step 5️⃣: Understand the Structure of the Webpage
Using your browser’s developer tools, inspect the page structure and DOM. Be careful when selecting CSS classes, as they are often dynamically generated and unstable, and prefer the use of HTML attributes.
Step 6️⃣: Extract Business Data
Each restaurant is rendered in a card element. Use select('[data-testid="serp-ia-card"]') to collect those elements, then loop over them to scrape each one.
You can use select_one() in combination with CSS selectors to extract specific pieces of information, navigating the DOM tree as needed:
# inside the for loop
image = html_item_card.select_one('[data-lcp-target-id="SCROLLABLE_PHOTO_BOX"] img').attrs['src']
name = html_item_card.select_one('h3 a').text
url = 'https://www.yelp.com' + html_item_card.select_one('h3 a').attrs['href']
html_stars_element = html_item_card.select_one('[class^="five-stars"]')
stars = html_stars_element.attrs['aria-label'].replace(' star rating', '')
reviews = html_stars_element.parent.parent.next_sibling.text
This method is useful for simple, fast extractions; remember to clean and convert the raw strings into usable values. For fields like tags, which appear multiple times per card, an extra loop is needed:
tags = []
html_tag_elements = html_item_card.select('[class^="priceCategory"] button')
for html_tag_element in html_tag_elements:
    tag = html_tag_element.text
    tags.append(tag)
The other extractions are based on similar approaches.
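Putting the selectors from this step together, here is a small self-contained sketch run against a mock card snippet. Real Yelp markup is more complex and changes often, so the HTML below is illustrative only:

```python
from bs4 import BeautifulSoup

# Mock snippet mimicking the card structure targeted above; the class
# suffixes are invented stand-ins for Yelp's generated class names.
html = '''
<div data-testid="serp-ia-card">
  <h3><a href="/biz/trattoria-example">Trattoria Example</a></h3>
  <div class="five-stars__abc" aria-label="4.5 star rating"></div>
  <div class="priceCategory__xyz"><button>Italian</button><button>Pizza</button></div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
items = []
for card in soup.select('[data-testid="serp-ia-card"]'):
    name = card.select_one('h3 a').text
    url = 'https://www.yelp.com' + card.select_one('h3 a').attrs['href']
    stars = card.select_one('[class^="five-stars"]').attrs['aria-label'].replace(' star rating', '')
    tags = [b.text for b in card.select('[class^="priceCategory"] button')]
    items.append({'name': name, 'url': url, 'stars': stars, 'tags': tags})
print(items)
```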
Step 7️⃣: Implement Crawling Logic
To scrape data from multiple result pages, add crawling logic:
visited_pages = []
pages_to_scrape = ['https://www.yelp.com/search?find_desc=Italian&find_loc=New+York%2C+NY']
limit = 5
i = 0
while len(pages_to_scrape) != 0 and i < limit:
    url = pages_to_scrape.pop(0)
    visited_pages.append(url)
    # logic for downloading, parsing the page, and extracting data
    # implemented in the previous steps
    pagination_link_elements = soup.select('[class^="pagination-links"] a')
    for pagination_link_element in pagination_link_elements:
        pagination_url = pagination_link_element.attrs['href']
        if pagination_url not in visited_pages and pagination_url not in pages_to_scrape:
            pages_to_scrape.append(pagination_url)
    i += 1
The script keeps following pagination links through the result pages until the queue is empty or the page limit is reached.
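The same loop can be exercised offline against a tiny in-memory "site", which makes the queue-and-visited bookkeeping easy to verify (the page contents below are mock data, and fetch-by-dictionary-lookup stands in for requests.get()):

```python
from bs4 import BeautifulSoup

# Two mock pages that link to each other, mimicking paginated results.
site = {
    '/page1': '<div class="pagination-links__a"><a href="/page2">2</a></div>',
    '/page2': '<div class="pagination-links__a"><a href="/page1">1</a></div>',
}
visited_pages = []
pages_to_scrape = ['/page1']
limit = 5
while pages_to_scrape and len(visited_pages) < limit:
    current = pages_to_scrape.pop(0)
    visited_pages.append(current)
    soup = BeautifulSoup(site[current], 'html.parser')
    # Queue any pagination link not yet visited or already queued.
    for link in soup.select('[class^="pagination-links"] a'):
        href = link.attrs['href']
        if href not in visited_pages and href not in pages_to_scrape:
            pages_to_scrape.append(href)
print(visited_pages)
```

The loop terminates because every discovered page is queued at most once, even though the two pages link back to each other.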
Step 8️⃣: Export Data to CSV
To share the extracted data, export it to a CSV file with a few lines of code:
import csv
# ...
with open('restaurants.csv', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=headers, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for item in items:
        # transform array fields from "['element1', 'element2', ...]"
        # to "element1; element2; ..."
        csv_item = {}
        for key, value in item.items():
            if isinstance(value, list):
                csv_item[key] = '; '.join(str(e) for e in value)
            else:
                csv_item[key] = value
        writer.writerow(csv_item)
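To see the list-flattening transform in action, here is a tiny runnable version using in-memory sample data (the rows and headers below are made up for illustration, and io.StringIO stands in for the output file):

```python
import csv
import io

# Hypothetical sample rows; 'tags' is a list field that gets flattened.
items = [
    {'name': 'Trattoria Example', 'stars': '4.5', 'tags': ['Italian', 'Pizza']},
]
headers = ['name', 'stars', 'tags']

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=headers, quoting=csv.QUOTE_ALL)
writer.writeheader()
for item in items:
    # Join list values with "; " so each row cell holds a flat string.
    writer.writerow({k: '; '.join(map(str, v)) if isinstance(v, list) else v
                     for k, v in item.items()})
print(buf.getvalue())
```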
Step 9️⃣: All Together
By completing these steps, you will have the complete Python script needed for crawling and scraping the desired data from Yelp. Remember that by combining these techniques with ProxyTee's Unlimited Residential Proxies, you will be able to do your research privately and reliably.
ProxyTee: The Ideal Solution for Web Scraping
Web scraping, especially at scale, can expose your IP address, leading to blocks or restrictions. This is where ProxyTee comes in. Here’s why ProxyTee is the perfect solution:
- Unlimited Bandwidth: With unlimited bandwidth, you can scrape large amounts of data without worrying about overage fees.
- Extensive Global Coverage: ProxyTee’s vast pool of 20 million+ IPs from over 100 countries ensures you can access data from specific locations with ease.
- Automatic IP Rotation: Auto-rotation changes your IP at regular intervals (3 to 60 minutes), minimizing the risk of being detected or blocked by target websites.
- Flexibility and Support: Supporting both HTTP and SOCKS5 protocols, ProxyTee can integrate seamlessly with all your existing tools.
- Affordable Pricing: ProxyTee offers very competitive pricing, as much as 50% lower than competitors for similar features.
- User Friendly: ProxyTee's user-friendly interface and simple API allow for a smooth and effective scraping experience.
- Unlimited Residential Proxies: Especially valuable, our Unlimited Residential Proxies product guarantees high anonymity and helps you avoid blocks, since you appear to target websites as a regular user.
Combining ProxyTee with Python gives you a potent mix for collecting online data anonymously and without restrictions. This can bring huge value in terms of data, market knowledge, competitive analysis and research.
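As a sketch of how a proxy plugs into the Requests workflow above: the host, port, and credentials below are placeholders, not real ProxyTee values; substitute the endpoint from your own dashboard.

```python
import requests

# Placeholder proxy URL; replace with your provider's actual endpoint
# and credentials. HTTP and SOCKS5 endpoints follow the same pattern
# (e.g. 'socks5://user:pass@host:port' with requests[socks] installed).
PROXY = 'http://username:password@proxy.example.com:8080'

session = requests.Session()
session.proxies = {'http': PROXY, 'https': PROXY}

# Every request made through this session is now routed via the proxy:
# page = session.get('https://www.yelp.com/search?find_desc=Italian')
```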