How to Scrape Yellow Pages Data with ProxyTee

In today's data-driven world, acquiring business information from platforms like Yellow Pages is essential for market analysis, lead generation, and competitor profiling. This guide will show you how to scrape Yellow Pages data using Python and ProxyTee. ProxyTee offers rotating residential proxies designed for various internet activities, such as web scraping. With features like unlimited bandwidth, a global IP pool, and auto-rotation, it's ideal for data gathering without being blocked. Let’s see how to get started.

Project Setup

Before starting, make sure you have Python installed on your computer. Download the latest version from the official Python website.

1. Install Required Libraries

Install the following Python libraries using pip:

pip install requests beautifulsoup4

This command installs the requests library for handling HTTP requests and Beautiful Soup for parsing HTML.

2. Import Libraries

Import the necessary libraries in your Python script:

from bs4 import BeautifulSoup
import requests

3. Why ProxyTee for Scraping?

To avoid detection while scraping Yellow Pages, it’s crucial to use a reliable residential proxy service like ProxyTee. ProxyTee offers a large pool of rotating IP addresses, which makes it more difficult for websites to detect and block your scraping attempts. It also helps to bypass geo-restrictions and other anti-scraping measures. Its key features are:

  • Unlimited Bandwidth: Scrape as much data as you need without worrying about data caps.
  • Global Coverage: Access a pool of over 20 million IP addresses from more than 100 countries.
  • Auto-Rotation: IP addresses rotate automatically at customizable intervals from 3 to 60 minutes to enhance your anonymity.
  • Multiple Protocol Support: ProxyTee supports both HTTP and SOCKS5 protocols for greater flexibility.
  • Easy-to-Use GUI: Get started quickly with a simple, clean interface.
  • Simple API: ProxyTee has a user-friendly API that makes integration with existing workflows easy.
  • Affordable: ProxyTee provides high-quality residential proxies at prices as low as half those of competing services.

4. Fetch a Yellow Page with ProxyTee

To begin, set the target URL you intend to scrape, configure your ProxyTee credentials, and send an HTTP request:

url = "https://www.yellowpages.ca/bus/Ontario/North-York/The-Burger-Cellar/6835043.html"

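# Placeholder credentials and proxy address; replace them with your own.
proxy_url = "http://user:password@proxy-address:1234"
proxies = {
    'http': proxy_url,
    'https': proxy_url
}
# A timeout keeps the script from hanging if the proxy does not respond.
response = requests.get(url, proxies=proxies, timeout=10)
print(response.status_code)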

Replace user, password and the proxy address with your own credentials. If all goes well, you should get a status_code of 200.
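Credentials that contain characters such as "@" or ":" can break a proxy URL unless they are percent-encoded. The following is a minimal sketch of one way to build the proxies dictionary safely; the helper name build_proxies, the host proxy.example.com, and the sample password are illustrative assumptions, not ProxyTee values.

```python
from urllib.parse import quote


def build_proxies(user, password, host, port):
    """Return a requests-style proxies dict for an HTTP proxy.

    Credentials are percent-encoded so that characters such as '@' or ':'
    in a password do not break the proxy URL.
    """
    proxy_url = (
        f"http://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}"
    )
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical values for illustration; substitute your own credentials.
proxies = build_proxies("user", "p@ss:word", "proxy.example.com", 1234)
print(proxies["https"])
```

The resulting dictionary can be passed to requests.get through the proxies argument in the same way as the hand-written one.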

5. Parse the Content of the Yellow Pages

Now you can extract the HTML content from the response object. With the help of the Beautiful Soup library we can turn the raw text data into a parseable format.

soup = BeautifulSoup(response.content, "html.parser")

The soup object contains the parsed HTML content, and we can query it with find() or with CSS selectors to grab all sorts of business data.
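As a small illustration of the two lookup styles, the sketch below runs find() and a CSS selector against a hypothetical HTML fragment. The fragment is an assumption made up for this example and is not copied from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics the kind of markup the tutorial targets.
sample_html = '<div><span itemprop="name">The Burger Cellar</span></div>'
sample_soup = BeautifulSoup(sample_html, "html.parser")

# find() with an attribute dict and select_one() with a CSS attribute
# selector both locate the same <span> element.
by_find = sample_soup.find("span", {"itemprop": "name"})
by_css = sample_soup.select_one('span[itemprop="name"]')

print(by_find.get_text(strip=True))
print(by_css.get_text(strip=True))
```

Either style works for the extraction steps; the rest of this guide uses find().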

6. Extract Business Name

Locate the business name within the HTML source code and retrieve it:

name = soup.find('span', {'itemprop': 'name'}).get_text(strip=True)
print(name)

7. Extract the Business Address

The business address may consist of multiple HTML elements, so let's loop through the different parts:

itemprops = ["streetAddress", "addressLocality", "addressRegion", "postalCode"]
address_text = []
for itemprop in itemprops:
    tag = soup.find('span', {'itemprop': itemprop})
    if tag:  # skip address parts that are missing from the page
        address_text.append(tag.get_text(strip=True))
address = ', '.join(text for text in address_text if text)
print(address)

8. Scrape Phone Information

The phone number can be easily extracted in the same way:

phone = soup.find('span', {'itemprop': 'telephone'}).get_text(strip=True)
print(phone)

9. Extract the Ratings

Lastly, extract the ratings information:

ratings = soup.find('span', {'class': 'jsReviewsChart'})['aria-label']
print(ratings)

And that's all the code needed to scrape key business data from Yellow Pages!
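The individual steps can also be gathered into one function. The sketch below is one possible arrangement, assuming the same itemprop attributes and the jsReviewsChart class used in the steps above; the helper names text_of and parse_business and the sample markup are hypothetical. It returns None for a missing field rather than raising an error.

```python
from bs4 import BeautifulSoup


def text_of(soup, itemprop):
    """Return the stripped text of the first span with this itemprop, or None."""
    tag = soup.find("span", {"itemprop": itemprop})
    return tag.get_text(strip=True) if tag else None


def parse_business(html):
    """Collect name, address, phone and ratings into one dict."""
    soup = BeautifulSoup(html, "html.parser")
    address_parts = [
        text_of(soup, prop)
        for prop in ("streetAddress", "addressLocality", "addressRegion", "postalCode")
    ]
    ratings_tag = soup.find("span", {"class": "jsReviewsChart"})
    return {
        "name": text_of(soup, "name"),
        # Join only the address parts that were actually found.
        "address": ", ".join(part for part in address_parts if part),
        "phone": text_of(soup, "telephone"),
        # .get() returns None instead of raising when the attribute is absent.
        "ratings": ratings_tag.get("aria-label") if ratings_tag else None,
    }


# Hypothetical markup for illustration; the live page is far larger.
sample_html = """
<span itemprop="name">The Burger Cellar</span>
<span itemprop="streetAddress">123 Example St</span>
<span itemprop="addressLocality">North York</span>
<span itemprop="addressRegion">ON</span>
"""
business = parse_business(sample_html)
print(business)
```

In a real run you would pass response.content instead of the sample string.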

Conclusion

In this tutorial, you learned how to scrape Yellow Pages data with Python using ProxyTee’s Unlimited Residential Proxies. ProxyTee’s ability to offer rotating residential proxies with features such as auto-rotation and unlimited bandwidth allows you to focus more on the data rather than the complexities of scraping.

For those looking to scale up, consider ProxyTee's Unlimited Residential Proxy solutions, which can make your scraping faster and simpler.

FAQ

Can you Scrape Yellow Pages?

Yes. Yellow Pages listings are publicly available information, and scraping public data is generally legal. However, it is good practice to consult a legal expert about your particular use case.

How Do I Extract Yellow Pages Data to Excel?

After scraping data from Yellow Pages using Python, you can use the pandas library to export the data into Excel.
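A minimal sketch of that export is shown below. The record fields, the helper names to_frame and save_to_excel, and the file name in the usage note are assumptions for illustration. Note that pandas needs an Excel engine such as openpyxl installed to write .xlsx files.

```python
import pandas as pd

# Hypothetical records shaped like the fields scraped in this guide.
records = [
    {"name": "The Burger Cellar", "address": "North York, ON", "phone": None},
]


def to_frame(rows):
    """Build a DataFrame with a fixed column order from scraped records."""
    return pd.DataFrame(rows, columns=["name", "address", "phone"])


def save_to_excel(rows, path):
    """Write the records to an .xlsx file (requires an engine such as openpyxl)."""
    to_frame(rows).to_excel(path, index=False)


df = to_frame(records)
print(df.shape)
```

Calling save_to_excel(records, "businesses.xlsx") would then write the spreadsheet to the current directory.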

How Do You Scrape Yellow Pages in Python?

You can write a Python scraper using libraries such as requests and Beautiful Soup, or a framework like Scrapy, and route its requests through a proxy service such as ProxyTee, as in the code above.