How to Scrape Yelp Data in Python for Local Business Insights

Local business data fuels countless applications, from location-based search results to business intelligence dashboards. Yelp, as a popular directory of local businesses, holds a wealth of structured and unstructured information, including names, ratings, reviews, categories, and locations. This article shows developers how to scrape Yelp data in Python to extract valuable business insights. The guide focuses on real-world web scraping use cases, tackles practical challenges like pagination and dynamic content, and explains how to structure the extracted data for analysis or backend integration.
Inspect Yelp’s Structure Before Writing Code
Every scraper starts with a good understanding of the website’s HTML structure. For Yelp search results, each business listing appears in a structured block. You’ll find details like the business name in anchor tags, rating as a CSS-styled star or accessible image, and address elements embedded in nested divs or spans. Inspect the page using Chrome DevTools or Firefox Inspector and note that Yelp often uses dynamically generated class names. When that happens, identify patterns such as consistent tag nesting or data attributes to help you select elements without relying entirely on fragile class names.
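For instance, you can select elements by stable structural cues, such as a tag with a predictable attribute, instead of exact class names. The sketch below runs BeautifulSoup against a simplified stand-in snippet; the markup and class name here are hypothetical, and real Yelp pages will differ.

```python
from bs4 import BeautifulSoup

# A simplified stand-in for a Yelp listing block; the class name is
# hypothetical, and real Yelp markup is more deeply nested.
html = """
<div class="container__09f24">
  <a href="/biz/sample-cafe"><span>Sample Cafe</span></a>
  <div role="img" aria-label="4.5 star rating"></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match by structure (an anchor whose href starts with /biz/) rather than
# by a generated class name that may change between deployments.
name = soup.select_one("a[href^='/biz/']").get_text(strip=True)

# The role attribute is an accessibility hook and tends to be more stable.
rating = soup.find("div", {"role": "img"})["aria-label"]
```

Selectors anchored to semantics (`href` prefixes, `role`, `aria-label`) tend to survive Yelp's periodic class-name churn better than class-based selectors.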
Install Dependencies and Set Up the Project
To begin scraping, set up a virtual environment and install the required libraries. This article uses requests to make HTTP calls and BeautifulSoup for parsing HTML. These tools are ideal for straightforward scraping tasks without JavaScript rendering.
```bash
pip install requests
pip install beautifulsoup4
```
Then, initialize your script with the necessary imports and headers.
```python
# For making HTTP requests
import requests

# For web scraping and HTML parsing
from bs4 import BeautifulSoup

# For handling time-related operations
import time

# For working with CSV files
import csv
```
Define a user-agent to simulate a browser request and avoid being blocked immediately.
```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/122.0.0.0 Safari/537.36"
}
```
Extract Basic Business Data from a Yelp Search Page
This function requests a Yelp search result page and parses the business entries. We aim to extract the business name, rating, review count, category, and location for each result.
```python
def fetch_yelp_page(location, term, page=0):
    offset = page * 10
    url = (
        f"https://www.yelp.com/search?"
        f"find_desc={term}&find_loc={location}&start={offset}"
    )
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        print(f"Request failed: {response.status_code}")
        return []

    soup = BeautifulSoup(response.text, "html.parser")
    business_cards = soup.find_all(
        "div", {"class": lambda x: x and "container__" in x}
    )

    results = []
    for card in business_cards:
        try:
            name_tag = card.find("a", href=True)
            name = name_tag.text.strip() if name_tag else "N/A"

            rating_tag = card.find("div", {"role": "img"})
            rating = rating_tag["aria-label"] if rating_tag else "N/A"

            review_tag = card.find("span", string=lambda s: s and "reviews" in s)
            review_count = review_tag.text.strip() if review_tag else "N/A"

            category_tag = card.find(
                "span", {"class": lambda x: x and "text-color--black" in x}
            )
            category = category_tag.text.strip() if category_tag else "N/A"

            address_tag = card.find("address")
            address = address_tag.text.strip() if address_tag else "N/A"

            results.append({
                "name": name,
                "rating": rating,
                "reviews": review_count,
                "category": category,
                "address": address,
            })
        except Exception as e:
            print(f"Error parsing card: {e}")
    return results
```
This function retrieves multiple business entries and handles slight variations in the HTML with fallback values. The lambda-based class matching adds flexibility against Yelp's dynamically generated class names.
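The scraper above returns ratings and review counts as raw strings such as "4.5 star rating" or "(312 reviews)". Before analysis you will usually want numeric values. The helpers below are a minimal sketch; the input formats they assume are based on Yelp's typical aria-label and review text, which may change.

```python
import re

def parse_rating(rating_text):
    """Extract a float from strings like '4.5 star rating'; None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)", rating_text)
    return float(match.group(1)) if match else None

def parse_review_count(review_text):
    """Extract an int from strings like '(312 reviews)'; None if absent."""
    match = re.search(r"(\d+)", review_text)
    return int(match.group(1)) if match else None
```

Returning `None` for unparsable fields (such as the "N/A" fallback) keeps downstream aggregation honest instead of silently treating missing data as zero.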
Paginate and Collect Multiple Pages with Rate Control
Since Yelp only displays 10 results per page, you will need to paginate through search results. To avoid hitting request limits or IP bans, introduce a small delay between requests.
```python
def scrape_yelp(term, location, max_pages=3):
    all_data = []
    for page in range(max_pages):
        print(f"Fetching page {page + 1}")
        page_data = fetch_yelp_page(location, term, page)
        if not page_data:
            break
        all_data.extend(page_data)
        time.sleep(2)
    return all_data
```
This function loops through several pages and appends the results to a cumulative list. You can later adjust the delay or number of pages based on your scraping strategy and ethical considerations.
Use Proxy Services for Long-Running Jobs
If you plan to run larger crawls or distribute across multiple cities, consider using a proxy rotation provider. The following code integrates a proxy into the requests call.
```python
proxies = {
    "http": "http://username:password@proxy_host:port",
    "https": "http://username:password@proxy_host:port",
}

response = requests.get(url, headers=headers, proxies=proxies)
```
Some advanced tools like rotating proxy pools or scraping APIs with browser emulation may also help when traditional requests fail due to JavaScript content or CAPTCHA walls. For smaller runs, IP rotation via your ISP or cloud service provider can suffice.
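A simple form of rotation is cycling through a pool of proxy endpoints across requests. The sketch below assumes a list of proxy URLs from your provider; the hostnames and `get_with_rotation` helper are illustrative, not part of any library.

```python
import itertools
import requests

# Hypothetical proxy endpoints; substitute your provider's credentials.
PROXY_LIST = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]

# itertools.cycle yields the proxies round-robin, forever.
proxy_cycle = itertools.cycle(PROXY_LIST)

def get_with_rotation(url, headers=None):
    """Issue a GET through the next proxy in the pool."""
    proxy = next(proxy_cycle)
    return requests.get(
        url,
        headers=headers,
        proxies={"http": proxy, "https": proxy},
        timeout=10,
    )
```

Round-robin rotation spreads requests across IPs; a production crawler would also retry on failures and temporarily evict proxies that return errors.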
Export Collected Data to CSV for Business Analysis
Once the data is collected, exporting it to a structured format like CSV allows for easy visualization, statistical analysis, or integration with BI dashboards.
```python
def save_to_csv(data, filename="yelp_scraped_data.csv"):
    if not data:
        return
    keys = data[0].keys()
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, keys)
        writer.writeheader()
        writer.writerows(data)
```
Call this after scraping:
```python
results = scrape_yelp("cafes", "New York, NY", 5)
save_to_csv(results)
```
Apply Scraped Yelp Data in Developer Workflows
The data you collect can power several technical applications. Backend developers might feed this data into a relational database or document store to support filtering, sorting, and analytics. Frontend developers can build dashboards using frameworks like React or Svelte to visualize business distributions. Data scientists may use the ratings and review counts to train recommendation engines or sentiment analyzers. Here are a few potential use cases:
- Segment businesses by rating and category for marketing intelligence
- Geocode addresses and visualize them on a map using Leaflet or Google Maps API
- Analyze review count distribution per neighborhood for economic activity proxies
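The first use case above, segmenting by rating and category, can be sketched with plain-Python grouping over records in the same shape as the scraper's output. The sample rows and the 4.0 rating threshold are illustrative assumptions.

```python
from collections import defaultdict

# Sample records shaped like the scraper's output (values are made up),
# with ratings already parsed to floats.
rows = [
    {"name": "Cafe A", "rating": 4.5, "category": "Coffee & Tea"},
    {"name": "Cafe B", "rating": 3.0, "category": "Coffee & Tea"},
    {"name": "Deli C", "rating": 4.0, "category": "Delis"},
]

# Group business names by (category, rating tier).
segments = defaultdict(list)
for row in rows:
    tier = "high" if row["rating"] >= 4.0 else "low"
    segments[(row["category"], tier)].append(row["name"])
```

For larger datasets the same grouping is a one-liner with pandas (`df.groupby(["category", "tier"])`), but the dictionary version has no dependencies.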
How to Scrape Yelp Data Responsibly and Effectively
Scraping is a powerful tool for collecting web data, but it comes with technical and ethical responsibilities. Always check robots.txt, limit request rates, and avoid scraping when an official API is available for your use case. For Yelp, the public Fusion API is available with usage restrictions and may be more suitable for production applications. When scraping is the only viable option, modularize your scraper to gracefully handle failures, log all errors for debugging, and structure outputs in standardized formats like JSON or CSV. Future enhancements may include JavaScript-rendered scraping using Playwright or Puppeteer, job queuing for distributed crawlers, and full-text review extraction with sentiment scoring.
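Checking robots.txt can be automated with the standard library's `urllib.robotparser`. The sketch below parses rules from a local string so it runs offline; the rules shown are illustrative, not Yelp's actual robots.txt, and in practice you would call `parser.set_url("https://www.yelp.com/robots.txt")` followed by `parser.read()`.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; fetch the live file for real decisions.
rules = """
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() matches the URL's path against the parsed rules.
search_allowed = parser.can_fetch(
    "MyScraper/1.0", "https://www.yelp.com/search?find_desc=cafes"
)
biz_allowed = parser.can_fetch(
    "MyScraper/1.0", "https://www.yelp.com/biz/sample-cafe"
)
```

Gating each request on `can_fetch()` keeps the crawler's behavior aligned with the site's published policy without hand-maintained allowlists.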
This guide covered everything from understanding the DOM to exporting structured data. With this foundation, you can confidently scrape Yelp data in Python and turn it into meaningful local business insights tailored to your application’s needs.