
    How to Scrape Websites with Puppeteer: A 2025 Beginner’s Guide

    May 19, 2025 Mike

    Scraping websites with Puppeteer is an efficient, modern approach well suited to developers, SEO professionals, and data analysts. Puppeteer, a Node.js library developed by Google, has become one of the go-to solutions for browser automation and web scraping in recent years. Whether you are scraping data for competitive analysis, price monitoring, or SEO audits, learning how to scrape with Puppeteer can significantly enhance your workflow. In this guide, we will walk you through what Puppeteer is, how to set it up, practical use cases, and smart strategies to get clean, structured data from complex websites.

    What is Puppeteer and Why Use It to Scrape Website Content?

    Puppeteer is a Node.js library maintained by the Chrome DevTools team. It allows you to control a headless (or full) instance of Chromium, which makes it ideal for rendering JavaScript-heavy sites that traditional scrapers struggle with. This capability to handle modern web technologies makes it one of the most reliable tools when learning how to scrape websites with Puppeteer.

    Unlike basic HTML parsers, Puppeteer can interact with every part of a website just like a user. It can click buttons, fill forms, take screenshots, and wait for elements to load, offering far more flexibility when you scrape with Puppeteer compared to other tools.
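    To make those interactions concrete, here is a minimal sketch of a helper that fills a form and submits it using Puppeteer's page API (`waitForSelector`, `type`, `click`). The helper itself and the selectors in the usage comment are illustrative placeholders, not part of any real site or the official API.

```javascript
// A minimal sketch: fill form fields and submit them, the way a user
// would, using Puppeteer's page API. The selectors passed in are
// placeholders you would replace with your target page's own.
async function fillAndSubmit(page, fields, submitSelector) {
  for (const [selector, value] of Object.entries(fields)) {
    await page.waitForSelector(selector); // wait for the input to appear
    await page.type(selector, value);     // type into it like a user
  }
  await page.click(submitSelector);       // click the submit button
}

// Usage inside an async function with an open page (hypothetical selectors):
// await fillAndSubmit(page, { '#search': 'proxies' }, 'button[type="submit"]');
```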

    Installation and Setup for Web Scraping with Puppeteer

    Prerequisites

    • Node.js (comes with npm)
    • A code editor (e.g., VS Code)

    Steps to Install Puppeteer

    • Install Node.js: Download Node.js from its official website.
    • Initialize a Project:
    npm init -y
    

    This command generates a package.json file to manage project dependencies.

    • Install Puppeteer:
    npm install puppeteer
    

    Puppeteer downloads a compatible Chromium version by default, ensuring seamless integration.

    Getting Started with Puppeteer

    Puppeteer's API is asynchronous, so all of our examples use `async`/`await` syntax. If you want to route requests through ProxyTee proxies, check out the simple API from ProxyTee.

    Simple Example of Using Puppeteer

    Create an `example1.js` file and add the code below:

    const puppeteer = require('puppeteer');
    
    (async () => {
      // Add code here
    })();
    

    The `require` call loads the Puppeteer library, and the async IIFE provides a place for our asynchronous code.

    Next, launch the browser. By default, it starts in headless mode:

    const browser = await puppeteer.launch();
    

    If a visible browser window is needed, pass the `headless` option:

    const browser = await puppeteer.launch({ headless: false }); // default is true
    

    Now create a page, which represents a browser tab:

    const page = await browser.newPage();
    

    A website can be loaded with the function `goto()`:

    await page.goto('https://proxytee.com/');
    

    Once the page has loaded, take a screenshot:

    await page.screenshot({ path: 'proxytee_1080.png' });
    

    By default, screenshots are taken at 800×600. To change that, use the `setViewport` method:

    await page.setViewport({ width: 1920, height: 1080 });
    

    Finally, close the browser when your work is done:

    await browser.close();
    

    Here is the complete script for taking a screenshot:

    const puppeteer = require('puppeteer');
    
    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.setViewport({ width: 1920, height: 1080 });
      await page.goto('https://proxytee.com/');
      await page.screenshot({ path: 'proxytee_1080.png' });
      await browser.close();
    })();
    

    Run this script with:

    node example1.js
    

    This generates a new file named `proxytee_1080.png` in the same folder.

    Bonus Tip: To generate a PDF file, use `pdf()`:

    await page.pdf({ path: 'proxytee.pdf', format: 'A4' });
    

    Scraping an Element from a Page

    Puppeteer loads the complete DOM, allowing you to extract any element. The `evaluate()` method runs JavaScript inside the page’s context, so you can extract any data using standard DOM APIs such as `document.querySelector()`.

    Let’s extract the title of the Wikipedia article about web scraping. Use `Inspect` in your browser’s developer tools to find that the heading element’s id is `firstHeading`. In the `Console` tab of the developer tools, try this line:

    document.querySelector('#firstHeading')
    

    You can get the element’s text content with:

    document.querySelector('#firstHeading').textContent
    

    To do the same via the `evaluate()` method, wrap it like this:

    await page.evaluate(() => {
        return document.querySelector("#firstHeading").textContent;
    });
    

    Here is the full code:

    const puppeteer = require("puppeteer");
    
    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto("https://en.wikipedia.org/wiki/Web_scraping");
    
      const title = await page.evaluate(() => {
        return document.querySelector("#firstHeading").textContent.trim();
      });
    
      console.log(title);
      await browser.close();
    })();
    

    Scraping Multiple Elements

    Extracting multiple elements follows these steps:

    1. Use `querySelectorAll` to select all matching elements:
    const headings_elements = document.querySelectorAll("h2 .mw-headline");
    

    2. Convert the `NodeList` into an array:
    const headings_array = Array.from(headings_elements);
    

    3. Map each element to its text content:
    return headings_array.map(heading => heading.textContent);
    

    Below is the full script for extracting multiple items from a website:

    const puppeteer = require("puppeteer");
    
    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto("https://en.wikipedia.org/wiki/Web_scraping");
    
      const headings = await page.evaluate(() => {
        const headings_elements = document.querySelectorAll("h2 .mw-headline");
        const headings_array = Array.from(headings_elements);
        return headings_array.map(heading => heading.textContent);
      });
    
      console.log(headings);
      await browser.close();
    })();
    

    Bonus Tip: You can also pass the mapping function directly to `Array.from()`; which style you use is a matter of preference:

    const headings = await page.evaluate(() => {
      return Array.from(document.querySelectorAll("h2 .mw-headline"), heading => heading.innerText.trim());
    });
    

    Scraping a Hotel Listing Page

    This section demonstrates how to scrape a listing page into JSON output. You can apply the same approach to many kinds of listing pages. We’ll use an Airbnb search page that shows 20 hotel cards.

    Note: Website structures change often, so you need to recheck selectors every time.

    The selector for the container elements of the hotel listing cards looks like this:

    root = Array.from(document.querySelectorAll('div[data-testid="card-container"]'));
    

    This returns 20 elements, which we’ll pass to the `map()` function. Within `map()`, we’ll extract the hotel name and image URL.

    hotels = root.map(hotel => ({
     // code here
    }));
    

    You can get the hotel name with:

    hotel.querySelector('div[data-testid="listing-card-title"]').textContent
    

    The core idea here is chaining `querySelector` calls. For the first hotel, you can locate the element with:

    document.querySelectorAll('div[data-testid="card-container"]')[0].querySelector('div[data-testid="listing-card-title"]').textContent
    

    Image URL of hotels can be located with:

    hotel.querySelector("img").getAttribute("src")
    

    Each hotel object will have the following shape:

    Hotel = {
      Name: 'x',
      Photo: 'y'
    }
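    Since selectors break when a site’s markup changes (as noted above), a defensive variant of this extraction logic can use optional chaining so a missing element yields `null` instead of crashing the whole run. This is a sketch using the same selectors as above:

```javascript
// Sketch: defensive extraction for one hotel card. Optional chaining
// (?.) returns undefined when a selector no longer matches, instead
// of throwing, so one changed card doesn't crash the whole run.
function extractHotel(hotel) {
  return {
    Name: hotel.querySelector('div[data-testid="listing-card-title"]')?.textContent ?? null,
    Photo: hotel.querySelector('img')?.getAttribute('src') ?? null,
  };
}
```

    Note that code passed to `evaluate()` runs in the browser context, so you would define this function inside the `evaluate()` callback and call `root.map(extractHotel)` in place of the inline object literal.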
    

    Below is the complete script. Save it as `bnb.js`.

    const puppeteer = require("puppeteer");
    
    (async () => {
      const url = "https://www.airbnb.com/s/homes?refinement_paths%5B%5D=%2Fhomes&search_type=section_navigation&property_type_id%5B%5D=8";
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto(url);
    
      const data = await page.evaluate(() => {
        const root = Array.from(document.querySelectorAll('div[data-testid="card-container"]'));
        const hotels = root.map(hotel => ({
          Name: hotel.querySelector('div[data-testid="listing-card-title"]').textContent,
          Photo: hotel.querySelector("img").getAttribute("src")
        }));
        return hotels;
      });
    
      console.log(data);
      await browser.close();
    })();
    

    Run this using:

    node bnb.js
    

    An array of hotel objects will be printed to the console.

    Visualizing the Scraping Process

    Here’s a simplified flow of how to scrape with Puppeteer:

    • Launch browser
    • Navigate to target URL
    • Wait for data elements
    • Extract using page.evaluate
    • Close browser and save data

    Visual tools such as Flowchart.js or even basic flow diagrams in whiteboard sessions help developers and analysts map their scraping logic clearly.

    Why Many Developers Prefer to Scrape Websites with Puppeteer

    Among several scraping tools available in 2025, Puppeteer continues to be favored because:

    • It mimics human browsing and works on JavaScript-heavy pages
    • It integrates smoothly into CI/CD pipelines and Node.js projects
    • It can be easily extended with plugins and proxies

    For developers and SEO professionals who need more than simple HTML scraping, Puppeteer brings powerful browser capabilities into programmable logic.
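    One common way to extend Puppeteer with proxies is Chromium’s standard `--proxy-server` launch flag. Below is a sketch of a helper that builds the launch options; the proxy URL in the usage comment is a placeholder, not a real endpoint.

```javascript
// Sketch: build Puppeteer launch options that route traffic through a
// proxy. '--proxy-server' is a standard Chromium flag; the proxy URL
// is supplied by the caller.
function buildLaunchOptions(proxyUrl, headless = true) {
  return {
    headless,
    args: [`--proxy-server=${proxyUrl}`],
  };
}

// Usage (placeholder proxy URL):
// const browser = await puppeteer.launch(buildLaunchOptions('http://proxy.example.com:8000'));
// If the proxy requires credentials, call page.authenticate({ username, password })
// after opening a page.
```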

    Real-World Insights: Case Study Using Puppeteer for Price Comparison

    One digital marketing agency used Puppeteer to scrape websites of three leading retailers. They tracked over 500 products daily and fed the data into a dashboard that alerted them to price shifts. By using waitForSelector and screenshot capture, they ensured all content was current and verifiable. The results improved their client’s pricing strategy and competitive reaction time.

    How to Use Puppeteer Without Getting Blocked


    When you scrape websites with Puppeteer, anti-bot systems may flag repeated actions. To minimize this, consider these strategies:

    • Rotate user-agents and proxy IPs regularly
    • Introduce random sleep intervals between requests
    • Use Puppeteer in headful mode occasionally
    • Leverage residential proxy networks for more human-like browsing

    These techniques, when implemented carefully, keep your scraping routines sustainable and far less likely to be blocked.
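    The first two strategies can be sketched as small helpers. These are assumptions for illustration: the user-agent strings are examples, not a curated list, and the delay bounds are arbitrary.

```javascript
// Illustrative user-agent strings only — maintain your own list in practice.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
];

// Pick a random user-agent to pass to page.setUserAgent().
function pickUserAgent(agents = USER_AGENTS) {
  return agents[Math.floor(Math.random() * agents.length)];
}

// Sleep a random interval between minMs and maxMs between requests.
function randomDelay(minMs, maxMs) {
  const ms = minMs + Math.random() * (maxMs - minMs);
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage inside your scraping loop:
// await page.setUserAgent(pickUserAgent());
// await randomDelay(1000, 3000);
```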

    Top Use Cases When You Scrape Websites with Puppeteer

    Understanding real-world scenarios can help you better grasp how to scrape websites with Puppeteer effectively. Here are practical use cases:

    • Price monitoring for eCommerce: Puppeteer can log in, render dynamic content, and extract price tags.
    • SEO metadata collection: Collect page titles, descriptions, and canonical tags from multiple domains using custom scripts.
    • Job board data extraction: Automate navigation across paginated listings and extract job titles, descriptions, and company info.
    • Competitor intelligence: Extract product features and marketing copy to monitor how others position their brand.
    • Automated screenshots for reporting: Take visual snapshots of specific sections for analytics or marketing use.
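    As a sketch of the SEO metadata use case, here is a function intended to run inside `page.evaluate()`. It takes the document as a parameter (defaulting to the page’s global `document`) so it can also be exercised against a stub outside a browser; the selectors target standard HTML metadata locations.

```javascript
// Sketch: collect common SEO metadata from a page. Designed to be
// passed to page.evaluate(), where `document` is the page's DOM; the
// parameter default keeps it testable outside a browser too.
function collectSeoMetadata(doc = document) {
  return {
    title: doc.querySelector('title')?.textContent ?? null,
    description: doc.querySelector('meta[name="description"]')?.getAttribute('content') ?? null,
    canonical: doc.querySelector('link[rel="canonical"]')?.getAttribute('href') ?? null,
  };
}

// Usage: const meta = await page.evaluate(collectSeoMetadata);
```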

    Tips to Efficiently Scrape with Puppeteer

    When learning how to scrape websites with Puppeteer, the following techniques can make your scripts more stable and scalable:

    • Use waitForSelector: This ensures Puppeteer waits for dynamic content to fully load before extracting data.
    • Limit concurrency: Avoid getting blocked by running fewer simultaneous scrapers or adding randomized delays.
    • Handle pagination logically: Use loops and selectors to scrape across multiple pages by detecting “next” buttons.
    • Use stealth mode: Integrate puppeteer-extra-plugin-stealth to reduce detection on anti-bot systems.
    • Save outputs smartly: Store your results in CSV or JSON formats for use in other analytics tools.
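    The pagination tip can be sketched as a loop that keeps clicking a “next” selector until it disappears. `page.$()` returns `null` when nothing matches, which serves as the stopping condition. The selector strings and the `extractPage` callback are placeholders, and `waitForNavigation` assumes a full page reload; single-page apps may need `waitForSelector` instead.

```javascript
// Sketch: scrape across paginated listings by detecting a "next"
// button. extractPage is a caller-supplied async function that
// scrapes the current page; maxPages caps the loop as a safety net.
async function scrapeAllPages(page, nextSelector, extractPage, maxPages = 10) {
  const results = [];
  for (let i = 0; i < maxPages; i++) {
    results.push(...await extractPage(page));  // scrape the current page
    const next = await page.$(nextSelector);   // null when no "next" button
    if (!next) break;                          // last page reached
    await Promise.all([page.waitForNavigation(), next.click()]);
  }
  return results;
}
```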

    Next Steps to Scrape Websites with Puppeteer More Effectively

    Now that you know how to scrape websites with Puppeteer, the next steps involve refining your scripts for performance and legality. Always check the terms of service of any website you target. Consider logging every run and tracking changes in HTML structure using diff-checkers. And most importantly, update your scripts as websites evolve. Puppeteer is a powerful tool, and when paired with best practices, it becomes an indispensable part of your data workflow.

    • Puppeteer
    • Web Scraping
