How to Scrape Google Trends with Python Using ProxyTee

Understanding what people are searching for online can unlock significant opportunities for businesses. Google Trends is a robust resource for analyzing search data, helping businesses uncover market insights, understand consumer behavior, and make informed decisions. This article provides a step-by-step guide on how to scrape data from Google Trends using Python and highlights how ProxyTee can enhance your scraping efforts by solving common challenges like IP bans and CAPTCHAs.


Why Scrape Google Trends?

Scraping Google Trends can be highly beneficial in various scenarios:

  • Keyword Research: Identify trending keywords to drive organic traffic to your website. By exploring popular search terms by region and time, you can optimize your content strategy to match user interest.
  • Market Research: Understand customer interests and market demands by analyzing search patterns and monitoring trends over time. This allows you to make data-driven decisions.
  • Societal Research: Gain valuable insights into public interest by analyzing how local and global events, technological innovations, economic shifts, and political developments impact search trends. This can inform your analysis and future predictions.
  • Brand Monitoring: Monitor how your brand is perceived by comparing your brand's visibility with competitors and quickly adapt to shifts in public perception.

Overcoming Scraping Challenges with ProxyTee

Google Trends doesn’t provide an official API for data scraping, and conventional methods like using Python with Selenium or BeautifulSoup face challenges, such as:

  • IP bans after repeated requests.
  • Encountering CAPTCHAs, which disrupt the scraping process.

This is where ProxyTee comes in. Its Unlimited Residential Proxies rotate IP addresses automatically, letting you scrape Google Trends without the usual risk of blocks. Because bandwidth is unlimited, you never have to worry about data caps, and with over 20 million IP addresses worldwide you can target specific regions precisely and build a comprehensive picture of the trends you're interested in.

Scraping Google Trends with Python

Although Google Trends lacks a dedicated API, the following steps use Python with Selenium and BeautifulSoup to perform the scraping, combined with ProxyTee's proxy services:

  1. Set Up Your Environment

Before beginning, make sure Python is installed and create a new project directory. Then create a virtual environment with the following command:

python -m venv myenv

Activate the virtual environment (on Windows, run myenv\Scripts\activate instead of the command below):

source myenv/bin/activate

Install required packages:

pip install beautifulsoup4 pandas matplotlib selenium

  2. Query Google Trends Data

Use Selenium to drive a real browser so that pages with dynamic JavaScript content load fully:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

def get_driver():
    CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" # Adjust for your system

    options = Options()
    options.binary_location = CHROME_PATH

    driver = webdriver.Chrome(options=options)

    return driver

def get_raw_trends_data(
    driver: webdriver.Chrome, date_range: str, geo: str, query: str
) -> str:
    url = f"https://trends.google.com/trends/explore?date={date_range}&geo={geo}&q={query}"
    print(f"Getting data from {url}")
    driver.get(url)
    driver.get(url)  # loading the page twice works around an intermittent error on the first request
    driver.maximize_window()
    time.sleep(5)
    return driver.page_source

The get_raw_trends_data function loads the explore page for a given date range, region, and query (requesting it twice to work around an intermittent load error) and returns the rendered HTML.
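The snippet above launches a plain Chrome session. To actually route traffic through ProxyTee, you can point Chrome at your proxy endpoint when building the driver. The sketch below uses a hypothetical hostname and port; substitute the values from your ProxyTee dashboard. Note that Chrome's --proxy-server flag does not accept inline credentials, so username/password authentication typically requires an approach such as IP whitelisting or a tool like selenium-wire.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Hypothetical ProxyTee endpoint -- replace with the host and port from your dashboard
PROXY_HOST = "residential.proxytee.example"
PROXY_PORT = 12345

def get_proxied_driver():
    options = Options()
    # Route all browser traffic through the rotating residential proxy
    options.add_argument(f"--proxy-server=http://{PROXY_HOST}:{PROXY_PORT}")
    return webdriver.Chrome(options=options)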

  3. Parse Data Using Beautiful Soup

Use BeautifulSoup to parse the content for structured data:

from bs4 import BeautifulSoup

def extract_interest_by_sub_region(content: str) -> dict:
    soup = BeautifulSoup(content, "html.parser")
    # Locate the "Interest by subregion" widget on the explore page
    interest_by_subregion = soup.find("div", class_="geo-widget-wrapper geo-resolution-subregion")
    related_queries = interest_by_subregion.find_all("div", class_="fe-atoms-generic-content-container")
    interest_data = {}
    for query in related_queries:
        items = query.find_all("div", class_="item")
        for item in items:
            # Each item holds a region label and its interest score
            region = item.find("div", class_="label-text").text.strip()
            interest = item.find("div", class_="progress-value").text.strip()
            interest_data[region] = interest
    return interest_data

This function locates the "Interest by subregion" widget by its class name and returns a dictionary mapping each region to its interest score.
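To tie the pieces together, a minimal run could look like the sketch below; the date range, region, and query are illustrative values, and the driver is kept open so it can be reused for pagination in the next step:

driver = get_driver()
content = get_raw_trends_data(driver, date_range="today 12-m", geo="US", query="coffee")
interest = extract_interest_by_sub_region(content)
print(interest)  # {"<region>": "<interest score>", ...}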

  4. Handle Data Pagination

Google Trends data is often paginated. Use Selenium to handle pagination dynamically:

from selenium.webdriver.common.by import By

all_data = {}

# Dismiss the cookie consent bar so it does not block clicks
driver.find_element(By.CLASS_NAME, "cookieBarConsentButton").click()

while True:
    try:
        # Capture the data currently visible before moving to the next page
        extracted_data = extract_interest_by_sub_region(driver.page_source)
        all_data.update(extracted_data)

        geo_widget = driver.find_element(
            By.CSS_SELECTOR, "div.geo-widget-wrapper.geo-resolution-subregion"
        )
        load_more_button = geo_widget.find_element(
            By.CSS_SELECTOR, "button.md-button[aria-label='Next']"
        )
        icon = load_more_button.find_element(By.CSS_SELECTOR, ".material-icons")
        if "arrow-right-disabled" in icon.get_attribute("class"):
            print("No more data to load")
            break
        load_more_button.click()
        time.sleep(2)
    except Exception as e:
        print("No more data to load", e)
        break

driver.quit()

This loop extracts the data currently shown, then clicks the 'Next' button and repeats until the button becomes disabled.

  5. Persist and Visualize Data

To save the scraped data to a CSV and visualize trends using pandas and Matplotlib:

import csv

def save_interest_by_sub_region(interest_data: dict):
    interest_data = [{"Region": region, "Interest": interest} for region, interest in interest_data.items()]
    csv_file = "interest_by_region.csv"
    with open(csv_file, mode='w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=["Region", "Interest"])
        writer.writeheader()
        writer.writerows(interest_data)
    print(f"Data saved to {csv_file}")
    return csv_file

import pandas as pd
import matplotlib.pyplot as plt

def plot_sub_region_data(csv_file_path, output_file_path):
    df = pd.read_csv(csv_file_path)
    plt.figure(figsize=(30, 12))
    plt.bar(df["Region"], df["Interest"], color="skyblue")
    plt.title('Interest by Region')
    plt.xlabel('Region')
    plt.ylabel('Interest')
    plt.xticks(rotation=45)
    plt.tight_layout()  # keep rotated region labels from being clipped
    plt.savefig(output_file_path)

csv_file_path = save_interest_by_sub_region(all_data)
output_file_path = "interest_by_region.png"
plot_sub_region_data(csv_file_path, output_file_path)

This code snippet writes to a CSV file and plots the sub-region data.
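One optional cleanup step, sketched below under the assumption that some scraped interest values may contain non-numeric text, coerces the Interest column to numbers before further analysis:

import pandas as pd

df = pd.read_csv("interest_by_region.csv")
# Coerce the Interest column to numbers; rows that cannot be parsed become NaN and are dropped
df["Interest"] = pd.to_numeric(df["Interest"], errors="coerce")
df = df.dropna(subset=["Interest"]).sort_values("Interest", ascending=False)
df.to_csv("interest_by_region_clean.csv", index=False)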


Leveraging ProxyTee to Overcome Scraping Challenges

To overcome challenges like CAPTCHAs and IP bans during large-scale scraping, ProxyTee's Unlimited Residential Proxies provide:

  • Affordable Solutions: Save up to 50% on proxy costs compared to competitors while getting top-tier services.
  • Geo-Targeting: Target countries or regions precisely, rather than settling for broad continent-level targeting.
  • Unlimited Bandwidth: Scrape as much data as needed without worrying about data limits.
  • Rotating Residential Proxies: IP addresses rotate automatically, helping you avoid IP blocks and CAPTCHAs.

Conclusion

This guide has shown how to scrape Google Trends data with Python and how ProxyTee's Unlimited Residential Proxies help you overcome common scraping hurdles such as IP bans and CAPTCHAs. By integrating ProxyTee, you improve the reliability of the Google Trends data behind your critical business decisions.

Visit the ProxyTee pricing page to find the solution that best fits your needs.