How to Scrape Websites with Puppeteer: A 2025 Beginner’s Guide

Scraping websites with Puppeteer efficiently, using modern techniques, is a skill worth having for developers, SEO professionals, and data analysts. Puppeteer, a Node.js library developed by Google, has become one of the go-to solutions for browser automation and web scraping in recent years. Whether you are scraping data for competitive analysis, price monitoring, or SEO audits, learning how to scrape with Puppeteer can significantly enhance your workflow. In this guide, we will walk you through what Puppeteer is, how to set it up, practical use cases, and smart strategies for getting clean, structured data from complex websites.
What is Puppeteer and Why Use It to Scrape Website Content?
Puppeteer is a Node.js library maintained by the Chrome DevTools team. It allows you to control a headless (or full) instance of Chromium, which makes it ideal for rendering JavaScript-heavy sites that traditional scrapers struggle with. This capability to handle modern web technologies makes it one of the most reliable tools when learning how to scrape websites with Puppeteer.
Unlike basic HTML parsers, Puppeteer can interact with every part of a website just like a user. It can click buttons, fill forms, take screenshots, and wait for elements to load, offering far more flexibility when you scrape with Puppeteer compared to other tools.
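For example, here is a minimal sketch of those interactions, assuming a hypothetical login page whose form uses `#username`, `#password`, and `#submit` selectors:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/login'); // hypothetical URL
  // Fill in the form fields the way a user would
  await page.type('#username', 'demo-user');
  await page.type('#password', 'demo-pass');
  // Click submit and wait for the resulting navigation to finish
  await Promise.all([page.waitForNavigation(), page.click('#submit')]);
  // Wait for an element that only appears after login
  await page.waitForSelector('.dashboard');
  await browser.close();
})();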
Installation and Setup for Web Scraping with Puppeteer
Prerequisites
- Node.js (comes with npm)
- A code editor (e.g., VS Code)
Steps to Install Puppeteer
- Install Node.js: Download Node.js from its official website.
- Initialize a Project:
npm init -y
This command generates a package.json file to manage project dependencies.
- Install Puppeteer:
npm install puppeteer
Puppeteer downloads a compatible Chromium version by default, ensuring seamless integration.
Getting Started with Puppeteer
Puppeteer’s API is asynchronous, so all of our examples use `async`/`await` syntax. If you want to route your traffic through ProxyTee proxies for smoother operation, check out the simple API from ProxyTee.
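As a quick, hedged sketch (the proxy host, port, and credentials below are placeholders, not real ProxyTee values), a proxy can be wired into Puppeteer at launch time:
const puppeteer = require('puppeteer');
(async () => {
  // --proxy-server routes all browser traffic through the given proxy
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://proxy.example.com:8080'], // placeholder address
  });
  const page = await browser.newPage();
  // If the proxy requires authentication, supply credentials per page
  await page.authenticate({ username: 'USER', password: 'PASS' });
  await page.goto('https://proxytee.com/');
  await browser.close();
})();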
Simple Example of Using Puppeteer
Create an `example1.js` file and add the code below:
const puppeteer = require('puppeteer');
(async () => {
// Add code here
})();
The `require` call loads the Puppeteer library, and the immediately invoked async function gives us a place to put the asynchronous code.
Next, launch the browser with this line; by default, it starts in headless mode.
const browser = await puppeteer.launch();
If a visible UI is needed, pass it as an option, like below:
const browser = await puppeteer.launch({ headless: false }); // default is true
Now create a page, which represents a browser tab, using the line below:
const page = await browser.newPage();
A website can be loaded with the function `goto()`:
await page.goto('https://proxytee.com/');
Once the page is loaded, take a screenshot with:
await page.screenshot({ path: 'proxytee_1080.png' });
By default, screenshots are taken at the 800×600 viewport size. To change that, use the `setViewport` method before taking the screenshot:
await page.setViewport({ width: 1920, height: 1080 });
Finally, close the browser when your work is done.
await browser.close();
Here is the complete script for taking a screenshot:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.goto('https://proxytee.com/');
await page.screenshot({ path: 'proxytee_1080.png' });
await browser.close();
})();
Run this script with:
node example1.js
This generates a new file named `proxytee_1080.png` in the same folder.
Bonus Tip: To generate a PDF file, use `pdf()`:
await page.pdf({ path: 'proxytee.pdf', format: 'A4' });
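A complete, runnable version of the PDF example might look like this (note that `pdf()` generally works only in headless mode):
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch(); // headless by default, as pdf() requires
  const page = await browser.newPage();
  await page.goto('https://proxytee.com/');
  await page.pdf({ path: 'proxytee.pdf', format: 'A4' });
  await browser.close();
})();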
Scraping an Element from a Page
Puppeteer loads the complete DOM, allowing you to extract any element. The `evaluate()` method runs JavaScript inside the page’s context and lets you extract any data. Use `document.querySelector()` to target specific elements.
Let’s extract the title of the Wikipedia page about web scraping. Use `Inspect` in your browser’s developer tools to find that the heading element’s id is `firstHeading`. In the `Console` tab of the developer tools, run this line:
document.querySelector('#firstHeading')
You can get the element’s text content with plain JavaScript:
document.querySelector('#firstHeading').textContent
To do the same via the `evaluate()` method, wrap it like this:
await page.evaluate(() => {
return document.querySelector("#firstHeading").textContent;
});
Here is the full code:
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://en.wikipedia.org/wiki/Web_scraping");
const title = await page.evaluate(() => {
return document.querySelector("#firstHeading").textContent.trim();
});
console.log(title);
await browser.close();
})();
Scraping Multiple Elements
Extracting multiple elements follows these steps:
- Use `querySelectorAll` to select all matching elements:
const headings_elements = document.querySelectorAll("h2 .mw-headline");
- Convert the `NodeList` into an array:
const headings_array = Array.from(headings_elements);
- Map each element to its text content:
return headings_array.map(heading => heading.textContent);
Below is the full script for extracting multiple items from a website:
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://en.wikipedia.org/wiki/Web_scraping");
const headings = await page.evaluate(() => {
const headings_elements = document.querySelectorAll("h2 .mw-headline");
const headings_array = Array.from(headings_elements);
return headings_array.map(heading => heading.textContent);
});
console.log(headings);
await browser.close();
})();
Bonus Tip: You can also pass the mapping function directly to `Array.from()`; which form you use is a matter of preference:
const headings = await page.evaluate(() => {
return Array.from(document.querySelectorAll("h2 .mw-headline"), heading => heading.innerText.trim());
});
Scraping a Hotel Listing Page
This section demonstrates how to scrape a listing page into JSON output. You can apply the same approach to many kinds of listings. We’ll use an Airbnb page with 20 hotels.
Note: Website structures change often, so you need to recheck selectors every time.
The container elements of the hotel listing cards can be selected like this:
const root = Array.from(document.querySelectorAll('div[data-testid="card-container"]'));
This will return 20 elements, which we’ll iterate over with the `map()` function. Within the `map()` callback we’ll extract each hotel’s name and image.
const hotels = root.map(hotel => ({
// code here
}));
You can get the hotel name with:
hotel.querySelector('div[data-testid="listing-card-title"]').textContent
The core idea here is chaining query selectors. For the first hotel, you can locate the element with:
document.querySelectorAll('div[data-testid="card-container"]')[0].querySelector('div[data-testid="listing-card-title"]').textContent
The image URL of each hotel can be located with:
hotel.querySelector("img").getAttribute("src")
Each hotel object is constructed with this shape:
Hotel = {
Name: 'x',
Photo: 'y'
}
Below is the complete script. Save it as `bnb.js`.
const puppeteer = require("puppeteer");
(async () => {
let url ="https://www.airbnb.com/s/homes?refinement_paths%5B%5D=%2Fhomes&search_type=section_navigation&property_type_id%5B%5D=8";
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const data = await page.evaluate(() => {
const root = Array.from(document.querySelectorAll('div[data-testid="card-container"]'));
const hotels = root.map(hotel => ({
Name: hotel.querySelector('div[data-testid="listing-card-title"]').textContent,
Photo: hotel.querySelector("img").getAttribute("src")
}));
return hotels;
});
console.log(data);
await browser.close();
})();
Run this using:
node bnb.js
An array of hotel objects will be printed to the console as JSON.
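To persist the output instead of just printing it, you could write the array to disk with Node’s built-in `fs` module, replacing the `console.log(data)` line:
const fs = require('fs');
// Pretty-print the scraped array and save it next to the script
fs.writeFileSync('hotels.json', JSON.stringify(data, null, 2));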
Visualizing the Scraping Process
Here’s a simplified flow of how to scrape with Puppeteer:
- Launch browser
- Navigate to target URL
- Wait for data elements
- Extract using page.evaluate
- Close browser and save data
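In code, that flow reduces to a skeleton like this (a minimal sketch; the URL, `.data-item` selector, and extraction logic are placeholders to fill in per target):
const puppeteer = require('puppeteer');
const fs = require('fs');
(async () => {
  const browser = await puppeteer.launch(); // 1. Launch browser
  const page = await browser.newPage();
  await page.goto('https://example.com/'); // 2. Navigate to target URL
  await page.waitForSelector('.data-item'); // 3. Wait for data elements
  const data = await page.evaluate(() => // 4. Extract using page.evaluate
    Array.from(document.querySelectorAll('.data-item'), el => el.textContent.trim())
  );
  await browser.close(); // 5. Close browser and save data
  fs.writeFileSync('data.json', JSON.stringify(data, null, 2));
})();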
Visual tools such as Flowchart.js or even basic flow diagrams in whiteboard sessions help developers and analysts map their scraping logic clearly.
Why Many Developers Prefer to Scrape Websites with Puppeteer
Among several scraping tools available in 2025, Puppeteer continues to be favored because:
- It mimics human browsing and works on JavaScript-heavy pages
- It integrates smoothly into CI/CD pipelines and Node.js projects
- It can be easily extended with plugins and proxies
For developers and SEO professionals who need more than simple HTML scraping, Puppeteer brings powerful browser capabilities into programmable logic.
Real-World Insights: Case Study Using Puppeteer for Price Comparison
One digital marketing agency used Puppeteer to scrape the websites of three leading retailers. They tracked over 500 products daily and fed the data into a dashboard that alerted them to price shifts. By using `waitForSelector` and screenshot capture, they ensured all content was current and verifiable. The results improved their client’s pricing strategy and competitive reaction time.
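A hedged sketch of that pattern (the URL and `.price` selector are hypothetical stand-ins for a real retailer page):
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://shop.example.com/product/123'); // hypothetical product page
  // Wait until the dynamically rendered price is actually present
  await page.waitForSelector('.price');
  const price = await page.evaluate(() => document.querySelector('.price').textContent.trim());
  // The screenshot serves as a verifiable record of what was scraped
  await page.screenshot({ path: 'product-123.png' });
  console.log(price);
  await browser.close();
})();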
How to Use Puppeteer Without Getting Blocked
When you scrape websites with Puppeteer, anti-bot systems may flag repeated actions. To minimize this, consider these strategies:
- Rotate user-agents and proxy IPs regularly
- Introduce random sleep intervals between requests
- Use Puppeteer in headful mode occasionally
- Leverage residential proxy networks for more human-like browsing
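Here is a minimal sketch of the first two ideas; the user-agent strings are just examples and the delay range is arbitrary:
const puppeteer = require('puppeteer');
// Example pool of user-agent strings to rotate through
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];
// Sleep for a random interval between min and max milliseconds
const sleep = (min, max) => new Promise(r => setTimeout(r, min + Math.random() * (max - min)));
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (const url of ['https://example.com/a', 'https://example.com/b']) { // placeholder URLs
    await page.setUserAgent(userAgents[Math.floor(Math.random() * userAgents.length)]);
    await page.goto(url);
    // ... extract data here ...
    await sleep(2000, 5000); // random pause between requests
  }
  await browser.close();
})();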
These techniques, when implemented carefully, keep your scraping routines sustainable and far less likely to be blocked on most targets.
Top Use Cases When You Scrape Websites with Puppeteer
Understanding real-world scenarios can help you better grasp how to scrape websites with Puppeteer effectively. Here are practical use cases:
- Price monitoring for eCommerce: Puppeteer can log in, render dynamic content, and extract prices that static scrapers miss.
- SEO metadata collection: Collect page titles, descriptions, and canonical tags from multiple domains using custom scripts.
- Job board data extraction: Automate navigation across paginated listings and extract job titles, descriptions, and company info.
- Competitor intelligence: Extract product features and marketing copy to monitor how others position their brand.
- Automated screenshots for reporting: Take visual snapshots of specific sections for analytics or marketing use.
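As one illustration, the SEO metadata use case might be sketched like this (the domain list is a placeholder):
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const results = [];
  for (const url of ['https://example.com/', 'https://example.org/']) { // placeholder domains
    await page.goto(url);
    results.push(await page.evaluate(() => ({
      title: document.title,
      description: document.querySelector('meta[name="description"]')?.content ?? null,
      canonical: document.querySelector('link[rel="canonical"]')?.href ?? null,
    })));
  }
  console.log(results);
  await browser.close();
})();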
Tips to Efficiently Scrape with Puppeteer
When learning how to scrape websites with Puppeteer, the following techniques can make your scripts more stable and scalable:
- Use `waitForSelector`: This ensures Puppeteer waits for dynamic content to fully load before extracting data.
- Limit concurrency: Avoid getting blocked by running fewer simultaneous scrapers or adding randomized delays.
- Handle pagination logically: Use loops and selectors to scrape across multiple pages by detecting “next” buttons (see the sketch after this list).
- Use stealth mode: Integrate `puppeteer-extra-plugin-stealth` to reduce detection by anti-bot systems.
- Save outputs smartly: Store your results in CSV or JSON formats for use in other analytics tools.
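To make the pagination tip concrete, here is a hedged sketch; the listing URL and the `.job-title` and `a.next` selectors are hypothetical:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/listings'); // placeholder listing URL
  const items = [];
  while (true) {
    // Collect items from the current page
    items.push(...await page.evaluate(() =>
      Array.from(document.querySelectorAll('.job-title'), el => el.textContent.trim())
    ));
    // Detect the “next” button; stop when it no longer exists
    const next = await page.$('a.next');
    if (!next) break;
    await Promise.all([page.waitForNavigation(), next.click()]);
  }
  console.log(items);
  await browser.close();
})();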
Next Steps to Scrape Websites with Puppeteer More Effectively
Now that you know how to scrape websites with Puppeteer, the next steps involve refining your scripts for performance and legality. Always check the terms of service of any website you target. Consider logging every run and tracking changes in HTML structure using diff-checkers. And most importantly, update your scripts as websites evolve. Puppeteer is a powerful tool, and when paired with best practices, it becomes an indispensable part of your data workflow.