How to Scrape Websites with Puppeteer: A 2025 Beginner’s Guide

Scraping websites with Puppeteer efficiently, using modern techniques, is a skill worth having for developers, SEO professionals, and data analysts. Puppeteer, a Node.js library developed by Google, has become one of the go-to solutions for browser automation and web scraping in recent years. Whether you are scraping data for competitive analysis, price monitoring, or SEO audits, learning how to scrape with Puppeteer can significantly enhance your workflow. In this guide, we will walk you through what Puppeteer is, how to set it up, practical use cases, and smart strategies for getting clean, structured data from complex websites.
What is Puppeteer and Why Use It to Scrape Website Content?
Puppeteer is a Node.js library maintained by the Chrome DevTools team. It allows you to control a headless (or full) instance of Chromium, which makes it ideal for rendering JavaScript-heavy sites that traditional scrapers struggle with. This capability to handle modern web technologies makes it one of the most reliable tools when learning how to scrape websites with Puppeteer.
Unlike basic HTML parsers, Puppeteer can interact with every part of a website just like a user. It can click buttons, fill forms, take screenshots, and wait for elements to load, offering far more flexibility when you scrape with Puppeteer compared to other tools.
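For example, here is a minimal sketch of those interactions, assuming a hypothetical login page whose form uses `#username`, `#password`, and `#submit` selectors:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/login'); // hypothetical URL
  // Fill in the form fields the way a user would
  await page.type('#username', 'demo-user');
  await page.type('#password', 'demo-pass');
  // Click submit and wait for the resulting navigation to finish
  await Promise.all([page.waitForNavigation(), page.click('#submit')]);
  // Wait for an element that only appears after login
  await page.waitForSelector('.dashboard');
  await browser.close();
})();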
Installation and Setup for Web Scraping with Puppeteer
Prerequisites
- Node.js (comes with npm)
- A code editor (e.g., VS Code)
Steps to Install Puppeteer
- Install Node.js: Download Node.js from its official website.
- Initialize a Project:
npm init -y
This command generates a package.json file to manage project dependencies.
- Install Puppeteer:
npm install puppeteer
Puppeteer downloads a compatible Chromium version by default, ensuring seamless integration.
Getting Started with Puppeteer
Puppeteer’s API is asynchronous, so all of our examples use `async`/`await` syntax. If you want to route your traffic through ProxyTee proxies for smoother operation, check out the simple API from ProxyTee.
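As a quick, hedged sketch (the proxy host, port, and credentials below are placeholders, not real ProxyTee values), a proxy can be wired into Puppeteer at launch time:
const puppeteer = require('puppeteer');
(async () => {
  // --proxy-server routes all browser traffic through the given proxy
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://proxy.example.com:8080'], // placeholder address
  });
  const page = await browser.newPage();
  // If the proxy requires authentication, supply credentials per page
  await page.authenticate({ username: 'USER', password: 'PASS' });
  await page.goto('https://proxytee.com/');
  await browser.close();
})();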
Simple Example of Using Puppeteer
Create an `example1.js` file and add the code below:
const puppeteer = require('puppeteer');
(async () => {
// Add code here
})();
The `require` call loads the Puppeteer library, and the immediately invoked async function gives us a place to put the asynchronous code.
Next, launch the browser with this line; by default, it starts in headless mode.
const browser = await puppeteer.launch();
If a visible UI is needed, pass it as an option, like below:
const browser = await puppeteer.launch({ headless: false }); // default is true
Now create a page, which represents a browser tab, using the line below:
const page = await browser.newPage();
A website can be loaded with the function `goto()`:
await page.goto('https://proxytee.com/');
Once the page is loaded, take a screenshot with:
await page.screenshot({ path: 'proxytee_1080.png' });
By default, screenshots are taken at the 800×600 viewport size. To change that, use the `setViewport` method before taking the screenshot:
await page.setViewport({ width: 1920, height: 1080 });
Finally, close the browser when your work is done.
await browser.close();
Here is the complete script for taking a screenshot:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.goto('https://proxytee.com/');
await page.screenshot({ path: 'proxytee_1080.png' });
await browser.close();
})();
Run this script with:
node example1.js
This generates a new file named `proxytee_1080.png` in the same folder.
Bonus Tip: To generate a PDF file, use `pdf()`:
await page.pdf({ path: 'proxytee.pdf', format: 'A4' });
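A complete, runnable version of the PDF example might look like this (note that `pdf()` generally works only in headless mode):
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch(); // headless by default, as pdf() requires
  const page = await browser.newPage();
  await page.goto('https://proxytee.com/');
  await page.pdf({ path: 'proxytee.pdf', format: 'A4' });
  await browser.close();
})();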
Scraping an Element from a Page
Puppeteer loads the complete DOM, allowing you to extract any element. The `evaluate()` method runs JavaScript inside the page’s context and lets you extract any data. Use `document.querySelector()` to target specific elements.
Let’s extract the title of the Wikipedia page about web scraping. Use `Inspect` in your browser’s developer tools to find that the heading element’s id is `firstHeading`. In the `Console` tab of the developer tools, run this line:
document.querySelector('#firstHeading')
You can get the element’s text content with plain JavaScript:
document.querySelector('#firstHeading').textContent
To do the same via the `evaluate()` method, wrap it like this:
await page.evaluate(() => {
return document.querySelector("#firstHeading").textContent;
});
Here is the full code:
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://en.wikipedia.org/wiki/Web_scraping");
const title = await page.evaluate(() => {
return document.querySelector("#firstHeading").textContent.trim();
});
console.log(title);
await browser.close();
})();
Scraping Multiple Elements
Extracting multiple elements follows these steps:
- Use `querySelectorAll` to select all matching elements:
const headings_elements = document.querySelectorAll("h2 .mw-headline");
- Convert the `NodeList` into an array:
const headings_array = Array.from(headings_elements);
- Map each element to its text content:
return headings_array.map(heading => heading.textContent);
Below is the full script for extracting multiple items from a website:
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://en.wikipedia.org/wiki/Web_scraping");
const headings = await page.evaluate(() => {
const headings_elements = document.querySelectorAll("h2 .mw-headline");
const headings_array = Array.from(headings_elements);
return headings_array.map(heading => heading.textContent);
});
console.log(headings);
await browser.close();
})();
Bonus Tip: You can also pass the mapping function directly to `Array.from()`; which form you use is a matter of preference:
const headings = await page.evaluate(() => {
return Array.from(document.querySelectorAll("h2 .mw-headline"), heading => heading.innerText.trim());
});
Scraping a Hotel Listing Page
This section demonstrates how to scrape a listing page into JSON output. You can apply the same approach to many kinds of listings. We’ll use an Airbnb page with 20 hotels.
Note: Website structures change often, so you need to recheck selectors every time.
The container elements of the hotel listing cards can be selected like this:
const root = Array.from(document.querySelectorAll('div[data-testid="card-container"]'));
This will return 20 elements, which we’ll iterate over with the `map()` function. Within the `map()` callback we’ll extract each hotel’s name and image.
const hotels = root.map(hotel => ({
// code here
}));
You can get the hotel name with:
hotel.querySelector('div[data-testid="listing-card-title"]').textContent
The core idea here is chaining query selectors. For the first hotel, you can locate the element with:
document.querySelectorAll('div[data-testid="card-container"]')[0].querySelector('div[data-testid="listing-card-title"]').textContent
The image URL of each hotel can be located with:
hotel.querySelector("img").getAttribute("src")
Each hotel object is constructed with this shape:
Hotel = {
Name: 'x',
Photo: 'y'
}
Below is the complete script. Save it as `bnb.js`.
const puppeteer = require("puppeteer");
(async () => {
let url ="https://www.airbnb.com/s/homes?refinement_paths%5B%5D=%2Fhomes&search_type=section_navigation&property_type_id%5B%5D=8";
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const data = await page.evaluate(() => {
const root = Array.from(document.querySelectorAll('div[data-testid="card-container"]'));
const hotels = root.map(hotel => ({
Name: hotel.querySelector('div[data-testid="listing-card-title"]').textContent,
Photo: hotel.querySelector("img").getAttribute("src")
}));
return hotels;
});
console.log(data);
await browser.close();
})();
Run this using:
node bnb.js
An array of hotel objects will be printed to the console as JSON.
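To persist the output instead of just printing it, you could write the array to disk with Node’s built-in `fs` module, replacing the `console.log(data)` line:
const fs = require('fs');
// Pretty-print the scraped array and save it next to the script
fs.writeFileSync('hotels.json', JSON.stringify(data, null, 2));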
Visualizing the Scraping Process
Here’s a simplified flow of how to scrape with Puppeteer:
- Launch browser
- Navigate to target URL
- Wait for data elements
- Extract using page.evaluate
- Close browser and save data
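In code, that flow reduces to a skeleton like this (a minimal sketch; the URL, `.data-item` selector, and extraction logic are placeholders to fill in per target):
const puppeteer = require('puppeteer');
const fs = require('fs');
(async () => {
  const browser = await puppeteer.launch(); // 1. Launch browser
  const page = await browser.newPage();
  await page.goto('https://example.com/'); // 2. Navigate to target URL
  await page.waitForSelector('.data-item'); // 3. Wait for data elements
  const data = await page.evaluate(() => // 4. Extract using page.evaluate
    Array.from(document.querySelectorAll('.data-item'), el => el.textContent.trim())
  );
  await browser.close(); // 5. Close browser and save data
  fs.writeFileSync('data.json', JSON.stringify(data, null, 2));
})();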
Visual tools such as Flowchart.js or even basic flow diagrams in whiteboard sessions help developers and analysts map their scraping logic clearly.
Why Many Developers Prefer to Scrape Websites with Puppeteer
Among several scraping tools available in 2025, Puppeteer continues to be favored because:
- It mimics human browsing and works on JavaScript-heavy pages
- It integrates smoothly into CI/CD pipelines and Node.js projects
- It can be easily extended with plugins and proxies
For developers and SEO professionals who need more than simple HTML scraping, Puppeteer brings powerful browser capabilities into programmable logic.
Real-World Insights: Case Study Using Puppeteer for Price Comparison
One digital marketing agency used Puppeteer to scrape the websites of three leading retailers. They tracked over 500 products daily and fed the data into a dashboard that alerted them to price shifts. By using `waitForSelector` and screenshot capture, they ensured all content was current and verifiable. The results improved their client’s pricing strategy and competitive reaction time.
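A hedged sketch of that pattern (the URL and `.price` selector are hypothetical stand-ins for a real retailer page):
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://shop.example.com/product/123'); // hypothetical product page
  // Wait until the dynamically rendered price is actually present
  await page.waitForSelector('.price');
  const price = await page.evaluate(() => document.querySelector('.price').textContent.trim());
  // The screenshot serves as a verifiable record of what was scraped
  await page.screenshot({ path: 'product-123.png' });
  console.log(price);
  await browser.close();
})();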
How to Use Puppeteer Without Getting Blocked
When you scrape websites with Puppeteer, anti-bot systems may flag repeated actions. To minimize this, consider these strategies:
- Rotate user-agents and proxy IPs regularly
- Introduce random sleep intervals between requests
- Use Puppeteer in headful mode occasionally
- Leverage residential proxy networks for more human-like browsing
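Here is a minimal sketch of the first two ideas; the user-agent strings are just examples and the delay range is arbitrary:
const puppeteer = require('puppeteer');
// Example pool of user-agent strings to rotate through
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];
// Sleep for a random interval between min and max milliseconds
const sleep = (min, max) => new Promise(r => setTimeout(r, min + Math.random() * (max - min)));
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (const url of ['https://example.com/a', 'https://example.com/b']) { // placeholder URLs
    await page.setUserAgent(userAgents[Math.floor(Math.random() * userAgents.length)]);
    await page.goto(url);
    // ... extract data here ...
    await sleep(2000, 5000); // random pause between requests
  }
  await browser.close();
})();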
These techniques, when implemented carefully, keep your scraping routines sustainable and far less likely to be blocked on most targets.
Top Use Cases When You Scrape Websites with Puppeteer
Understanding real-world scenarios can help you better grasp how to scrape websites with Puppeteer effectively. Here are practical use cases:
- Price monitoring for eCommerce: Puppeteer can log in, render dynamic content, and extract prices that static scrapers miss.
- SEO metadata collection: Collect page titles, descriptions, and canonical tags from multiple domains using custom scripts.
- Job board data extraction: Automate navigation across paginated listings and extract job titles, descriptions, and company info.
- Competitor intelligence: Extract product features and marketing copy to monitor how others position their brand.
- Automated screenshots for reporting: Take visual snapshots of specific sections for analytics or marketing use.
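As one illustration, the SEO metadata use case might be sketched like this (the domain list is a placeholder):
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const results = [];
  for (const url of ['https://example.com/', 'https://example.org/']) { // placeholder domains
    await page.goto(url);
    results.push(await page.evaluate(() => ({
      title: document.title,
      description: document.querySelector('meta[name="description"]')?.content ?? null,
      canonical: document.querySelector('link[rel="canonical"]')?.href ?? null,
    })));
  }
  console.log(results);
  await browser.close();
})();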
Tips to Efficiently Scrape with Puppeteer
When learning how to scrape websites with Puppeteer, the following techniques can make your scripts more stable and scalable:
- Use `waitForSelector`: This ensures Puppeteer waits for dynamic content to fully load before extracting data.
- Limit concurrency: Avoid getting blocked by running fewer simultaneous scrapers or adding randomized delays.
- Handle pagination logically: Use loops and selectors to scrape across multiple pages by detecting “next” buttons (see the sketch after this list).
- Use stealth mode: Integrate `puppeteer-extra-plugin-stealth` to reduce detection by anti-bot systems.
- Save outputs smartly: Store your results in CSV or JSON formats for use in other analytics tools.
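To make the pagination tip concrete, here is a hedged sketch; the listing URL and the `.job-title` and `a.next` selectors are hypothetical:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/listings'); // placeholder listing URL
  const items = [];
  while (true) {
    // Collect items from the current page
    items.push(...await page.evaluate(() =>
      Array.from(document.querySelectorAll('.job-title'), el => el.textContent.trim())
    ));
    // Detect the “next” button; stop when it no longer exists
    const next = await page.$('a.next');
    if (!next) break;
    await Promise.all([page.waitForNavigation(), next.click()]);
  }
  console.log(items);
  await browser.close();
})();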
Next Steps to Scrape Websites with Puppeteer More Effectively
Now that you know how to scrape websites with Puppeteer, the next steps involve refining your scripts for performance and legality. Always check the terms of service of any website you target. Consider logging every run and tracking changes in HTML structure using diff-checkers. And most importantly, update your scripts as websites evolve. Puppeteer is a powerful tool, and when paired with best practices, it becomes an indispensable part of your data workflow.