How to Scrape Booking.com with Python Using ProxyTee

A Booking.com scraper is an automated tool designed to extract data from Booking.com pages. This tool retrieves essential details from property listings, including hotel names, prices, reviews, ratings, amenities, and availability. This data is invaluable for market analysis, price comparison, and creating travel-related datasets. ProxyTee can help in this process by providing the robust infrastructure needed for efficient scraping.
In this post, you’ll learn how to build a Python scraper for Booking.com to efficiently extract hotel data, reviews, and prices, all while leveraging the power of ProxyTee.
Data You Can Scrape From Booking.com
Here’s a list of key data points that can be extracted from Booking.com; a sketch of how they might map onto a Python record follows the list:
- Property details: Hotel name, address, distance from landmarks (e.g., city center).
- Pricing information: Regular and discounted prices.
- Reviews and ratings: Review score, number of reviews, and guest feedback.
- Availability: Room types available, booking options, and dates with availability.
- Media: Property and room images.
- Amenities: Facilities offered (e.g., Wi-Fi, parking, pool) and room-specific amenities.
- Promotions: Special offers, discounts, and limited-time deals.
- Policies: Cancellation policy and check-in/check-out times.
- Additional details: Property description and nearby attractions.
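To make these fields concrete, here is one way they could map onto a Python record. The class and field names below are illustrative choices, not an official Booking.com schema (the tutorial itself uses plain dictionaries):

```python
from dataclasses import dataclass
from typing import Optional

# illustrative record for one scraped listing (names are our own, not an official schema)
@dataclass
class BookingProperty:
    title: Optional[str] = None           # hotel name
    address: Optional[str] = None
    distance: Optional[str] = None        # e.g., distance from city center
    review_score: Optional[float] = None  # e.g., 8.4
    review_count: Optional[int] = None
    price: Optional[str] = None           # current, possibly discounted, price
    original_price: Optional[str] = None  # pre-discount price, when shown
    url: Optional[str] = None
    image: Optional[str] = None
```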
Scraping Booking.com in Python: Step-by-Step Guide
This step-by-step guide will show you how to build a Booking.com scraper using Python with ProxyTee to ensure efficient and reliable data extraction.
Step 1️⃣: Project Setup
Make sure you have Python 3 installed. If not, download and install it from the official website.
Create a folder for your project:

```bash
mkdir booking-scraper
```

Navigate into the project folder and initialize a virtual environment:

```bash
cd booking-scraper
python -m venv env
```

Activate the virtual environment:

- Linux/macOS:

```bash
source env/bin/activate
```

- Windows:

```powershell
env\Scripts\activate
```

Finally, create a `scraper.py` file in the project directory, ready for your scraping logic.
Step 2️⃣: Select the Scraping Library
Booking.com is a dynamic website: much of its content is rendered in the browser with JavaScript, so a plain HTTP request won’t return everything you see on the page. The most reliable way to scrape such sites is with a browser automation tool. For this tutorial, we’ll use Selenium.
Step 3️⃣: Install and Configure Selenium
Install Selenium using pip:

```bash
pip install selenium
```

Import Selenium in `scraper.py` and initialize a WebDriver:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# create a Chrome web driver instance
driver = webdriver.Chrome(service=Service())

# ... scraping logic will go here ...

driver.quit()
```

Remember to include `driver.quit()` at the end of the script to close the browser and release its resources. With Selenium 4.6+, Selenium Manager downloads a matching ChromeDriver automatically, so no manual driver setup is needed.
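For unattended or server-side runs, you may also want Chrome to start headless. A minimal sketch, assuming a recent Chrome build that supports the new headless mode; note that some sites detect headless browsers more readily, so test both modes:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")           # run without a visible window (Chrome 109+)
options.add_argument("--window-size=1920,1080")  # fixed viewport so desktop selectors apply

driver = webdriver.Chrome(service=Service(), options=options)
# ... scraping logic ...
driver.quit()
```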
Step 4️⃣: Visit the Target Page
Manually perform a search on Booking.com and copy the resulting URL. Then, use Selenium to visit the target page:
driver.get("https://www.booking.com/searchresults.html?ss=New+York&ssne=New+York&ssne_untouched=New+York&label=gen173nr-1FCAEoggI46AdIM1gEaHGIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4Aof767kGwAIB0gIkNGE2MTI1MjgtZjJlNC00YWM4LWFlMmQtOGIxZjM3NWIyNDlm2AIF4AIB&sid=b91524e727f20006ae00489afb379d3a&aid=304142&lang=en-us&sb=1&src_elem=sb&src=index&dest_id=20088325&dest_type=city&checkin=2025-11-18&checkout=2025-12-18&group_adults=2&no_rooms=1&group_children=0")
Step 5️⃣: Deal With the Login Alert
Booking.com often shows a sign-in dialog that blocks the page content, so the scraper needs to close it. Identify the close button using your browser’s developer tools, then dismiss it inside a try-except block:
```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

try:
    # wait up to 20 seconds for the sign-in dialog to appear
    close_button = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "[role='dialog'] button[aria-label='Dismiss sign-in info.']"))
    )
    close_button.click()
except TimeoutException:
    print("Sign-in modal did not appear, continuing...")
```
Step 6️⃣: Select the Booking.com Items
Initialize an empty list to hold scraped data:
```python
items = []
```
Select all property card elements on the page using a CSS selector:

```python
property_items = driver.find_elements(By.CSS_SELECTOR, "[data-testid='property-card']")
```
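Keep in mind that `find_elements` returns whatever is in the DOM at that instant, which can be an empty list if the results are still rendering. A safer pattern is to wait for at least one card before collecting them:

```python
# wait up to 20 seconds for at least one property card to render,
# then collect all of them
WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[data-testid='property-card']"))
)
property_items = driver.find_elements(By.CSS_SELECTOR, "[data-testid='property-card']")
```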
Step 7️⃣: Scrape the Booking.com Items
Use a custom exception-handler function to deal gracefully with elements that are missing from some property cards:
```python
from selenium.common.exceptions import NoSuchElementException

def handle_no_such_element_exception(data_extraction_task):
    # run the given extraction task, returning None if the target element is missing
    try:
        return data_extraction_task()
    except NoSuchElementException:
        return None
```
Then, loop over the property cards and extract each field with a CSS selector wrapped in the error handler:
```python
for property_item in property_items:
    # extract the property URL and image
    url = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "a[data-testid='property-card-desktop-single-image']").get_attribute("href"))
    image = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "img[data-testid='image']").get_attribute("src"))

    # extract the property details
    title = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='title']").text)
    address = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='address']").text)
    distance = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='distance']").text)

    # parse the review score and count out of the review text
    review_score = None
    review_count = None
    review_text = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='review-score']").text)
    if review_text is not None:
        # split the review string by newline and process each part
        for part in review_text.split("\n"):
            part = part.strip()
            # a bare number is the review score (e.g., "8.4")
            if part.replace(".", "", 1).isdigit():
                review_score = float(part)
            # the part containing "reviews" holds the count (e.g., "1,234 reviews")
            elif "reviews" in part:
                review_count = int(part.split(" ")[0].replace(",", ""))

    description = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='recommended-units']").text)

    # initialize prices to None so the item dict is valid even when no rate is shown
    original_price = None
    price = None
    price_element = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='availability-rate-information']"))
    if price_element is not None:
        original_price = handle_no_such_element_exception(lambda: price_element.find_element(By.CSS_SELECTOR, "[aria-hidden='true']:not([data-testid])").text.replace(",", ""))
        price = handle_no_such_element_exception(lambda: price_element.find_element(By.CSS_SELECTOR, "[data-testid='price-and-discounted-price']").text.replace(",", ""))

    # populate a new item with the scraped data
    item = {
        "url": url,
        "image": image,
        "title": title,
        "address": address,
        "distance": distance,
        "review_score": review_score,
        "review_count": review_count,
        "description": description,
        "original_price": original_price,
        "price": price
    }
    items.append(item)
```
Step 8️⃣: Export to CSV
Import the `csv` library and write the scraped data to a CSV file:

```python
import csv

output_file = "properties.csv"
with open(output_file, mode="w", newline="", encoding="utf-8") as file:
    # create a CSV writer with the fields used in the item dictionaries
    writer = csv.DictWriter(file, fieldnames=["url", "image", "title", "address", "distance", "review_score", "review_count", "description", "original_price", "price"])
    # write the header row, then one row per scraped item
    writer.writeheader()
    writer.writerows(items)
```
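CSV suits flat, spreadsheet-style analysis, but if you plan to post-process the results in code, JSON can be more convenient. A drop-in alternative (or addition) using only the standard library:

```python
import json

# export the same items list to a JSON file
with open("properties.json", mode="w", encoding="utf-8") as file:
    json.dump(items, file, indent=2, ensure_ascii=False)
```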
Step 9️⃣: Put It All Together
Review the complete `scraper.py`:
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException, NoSuchElementException
import csv


def handle_no_such_element_exception(data_extraction_task):
    # run the given extraction task, returning None if the target element is missing
    try:
        return data_extraction_task()
    except NoSuchElementException:
        return None


# create a Chrome web driver instance
driver = webdriver.Chrome(service=Service())

# connect to the target page
driver.get("https://www.booking.com/searchresults.html?ss=New+York&ssne=New+York&ssne_untouched=New+York&label=gen173nr-1FCAEoggI46AdIM1gEaHGIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4Aof767kGwAIB0gIkNGE2MTI1MjgtZjJlNC00YWM4LWFlMmQtOGIxZjM3NWIyNDlm2AIF4AIB&sid=b91524e727f20006ae00489afb379d3a&aid=304142&lang=en-us&sb=1&src_elem=sb&src=index&dest_id=20088325&dest_type=city&checkin=2025-11-18&checkout=2025-12-18&group_adults=2&no_rooms=1&group_children=0")

# handle the sign-in alert
try:
    # wait up to 20 seconds for the sign-in dialog to appear
    close_button = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "[role='dialog'] button[aria-label='Dismiss sign-in info.']"))
    )
    # click the close button
    close_button.click()
except TimeoutException:
    print("Sign-in modal did not appear, continuing...")

# where to store the scraped data
items = []

# select all property cards on the page
property_items = driver.find_elements(By.CSS_SELECTOR, "[data-testid='property-card']")

# iterate over the property cards and extract data from them
for property_item in property_items:
    url = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "a[data-testid='property-card-desktop-single-image']").get_attribute("href"))
    image = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "img[data-testid='image']").get_attribute("src"))
    title = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='title']").text)
    address = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='address']").text)
    distance = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='distance']").text)

    # parse the review score and count out of the review text
    review_score = None
    review_count = None
    review_text = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='review-score']").text)
    if review_text is not None:
        # split the review string by newline and process each part
        for part in review_text.split("\n"):
            part = part.strip()
            # a bare number is the review score (e.g., "8.4")
            if part.replace(".", "", 1).isdigit():
                review_score = float(part)
            # the part containing "reviews" holds the count (e.g., "1,234 reviews")
            elif "reviews" in part:
                review_count = int(part.split(" ")[0].replace(",", ""))

    description = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='recommended-units']").text)

    # initialize prices to None so the item dict is valid even when no rate is shown
    original_price = None
    price = None
    price_element = handle_no_such_element_exception(lambda: property_item.find_element(By.CSS_SELECTOR, "[data-testid='availability-rate-information']"))
    if price_element is not None:
        original_price = handle_no_such_element_exception(lambda: price_element.find_element(By.CSS_SELECTOR, "[aria-hidden='true']:not([data-testid])").text.replace(",", ""))
        price = handle_no_such_element_exception(lambda: price_element.find_element(By.CSS_SELECTOR, "[data-testid='price-and-discounted-price']").text.replace(",", ""))

    # populate a new item with the scraped data
    item = {
        "url": url,
        "image": image,
        "title": title,
        "address": address,
        "distance": distance,
        "review_score": review_score,
        "review_count": review_count,
        "description": description,
        "original_price": original_price,
        "price": price
    }
    # add the new item to the list of scraped items
    items.append(item)

# export the items list to a CSV file
output_file = "properties.csv"
with open(output_file, mode="w", newline="", encoding="utf-8") as file:
    # create a CSV writer object
    writer = csv.DictWriter(file, fieldnames=["url", "image", "title", "address", "distance", "review_score", "review_count", "description", "original_price", "price"])
    # write the header row, then each item as a row
    writer.writeheader()
    writer.writerows(items)

# close the web driver and release its resources
driver.quit()
```
Run the script with `python scraper.py`, and you’ll find the results in `properties.csv`.
Taking Your Scraper Further
This tutorial has demonstrated how to build a Booking.com scraper with Python. The basic script covers the fundamentals, but anti-scraping measures (rate limiting, IP bans, bot detection) and dynamic content handling can make scraping at scale challenging.
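On the dynamic-content side, one frequent issue is lazy loading: cards and images that only render once they scroll into view. A minimal sketch of the usual workaround, to run before selecting the property cards (whether and how Booking.com lazy-loads results can change, so verify against the live page):

```python
import time

# scroll to the bottom a few times to trigger any lazy-loaded content
for _ in range(3):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(2)  # give the page a moment to render newly loaded cards
```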
To keep the scraping process robust and reliable, consider using ProxyTee. ProxyTee provides unlimited residential proxies that rotate IPs, reducing the risk of being blocked. ProxyTee offers:
- Unlimited bandwidth, allowing intensive data operations without worrying about overages.
- Global IP coverage to access specific regions for location-based tasks.
- Auto rotation, which rotates IPs frequently to prevent detection and bans from websites, including Booking.com. You can also customize the rotation interval to suit your needs.
- Simple API integration, supporting automation and compatibility with different workflows (see the Selenium proxy sketch after this list).
- A user-friendly interface.
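As a sketch of how a rotating proxy could plug into this scraper, you can point Chrome at a proxy gateway with the `--proxy-server` flag. The hostname and port below are placeholders rather than real ProxyTee endpoints; take the actual gateway details from your ProxyTee dashboard. Note that this flag alone does not handle username/password authentication, so IP whitelisting is the simplest route with plain Selenium:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
# placeholder gateway address -- replace with the endpoint from your ProxyTee dashboard
options.add_argument("--proxy-server=http://proxy.example-proxytee.com:8000")

# all traffic from this browser session now goes through the proxy
driver = webdriver.Chrome(service=Service(), options=options)
driver.get("https://www.booking.com/")
# ... scraping logic ...
driver.quit()
```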
With rotating residential proxies behind your scraper, your web scraping tasks stay efficient, effective, and reliable. Start your data gathering journey with ProxyTee today.