How to Use Proxy Servers in Node.js with ProxyTee


Web scraping is an essential technique for collecting data from the web, but it often comes with challenges such as IP bans, geo-restrictions, and privacy concerns. Proxy servers help overcome these hurdles by acting as intermediaries between your system and the internet. By routing requests through different IP addresses, proxies enable you to bypass restrictions, access geo-blocked content, and maintain anonymity.

A reliable proxy service is crucial for efficient and uninterrupted web scraping. ProxyTee offers a powerful solution with rotating residential proxies that provide seamless access to content worldwide. With features like unlimited bandwidth, automatic IP rotation, and robust security, ProxyTee ensures an optimized and secure web scraping experience.


Why Choose ProxyTee for Your Proxy Needs?

ProxyTee provides an advanced proxy management solution with residential, static residential, datacenter, and mobile proxies. These help you access any website from multiple geographic locations, emulate user agents, and ensure anonymity.

ProxyTee also includes IP rotation, which improves both the efficiency and the anonymity of your web scraping by automatically switching between different proxies, so no single IP is used long enough to get banned.
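ProxyTee handles this rotation on its side, but the underlying idea is easy to sketch in plain JavaScript: cycle through a pool of proxy endpoints so consecutive requests leave from different addresses. The sketch below is purely illustrative; the proxy URLs are hypothetical.

```javascript
// Sketch of round-robin IP rotation: each call hands out the next
// proxy in the pool, wrapping around at the end. ProxyTee does this
// automatically server-side; the URLs here are made up.
function makeRotator(proxies) {
  let index = 0;
  return function nextProxy() {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length; // wrap back to the start
    return proxy;
  };
}

const nextProxy = makeRotator([
  "http://proxy-a.example.com:8080",
  "http://proxy-b.example.com:8080",
  "http://proxy-c.example.com:8080",
]);

// Three consecutive requests would each use a different exit address:
console.log(nextProxy()); // http://proxy-a.example.com:8080
console.log(nextProxy()); // http://proxy-b.example.com:8080
console.log(nextProxy()); // http://proxy-c.example.com:8080
console.log(nextProxy()); // wraps around to proxy-a again
```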

Here’s why ProxyTee is a superior choice:

  • Unlimited Bandwidth: ProxyTee ensures you don’t have to worry about data overages. This is particularly crucial for high-traffic tasks like web scraping or streaming.
  • Global IP Coverage: With over 20 million IP addresses from more than 100 countries, ProxyTee offers a wide range of geographic locations. Perfect for targeting specific regions or performing location-based tasks.
  • Multiple Protocol Support: ProxyTee supports both HTTP and SOCKS5 protocols, guaranteeing compatibility with a variety of applications and tools.
  • User-Friendly Interface: The clean and intuitive graphical user interface (GUI) makes ProxyTee incredibly user-friendly, allowing you to get started quickly without needing technical expertise.
  • Auto Rotation: The auto-rotation feature changes IP addresses automatically at intervals of 3 to 60 minutes. Customize this feature to suit your scraping needs, protecting you from detection and bans.
  • API Integration: The simple API allows for seamless integration with various applications and workflows, which is perfect for automating proxy tasks.

The main product is Unlimited Residential Proxies, offering features like unlimited bandwidth, rotating IPs, precise geo-targeting, and cost-effectiveness compared to competitors. This makes it ideal for diverse needs, including web scraping, streaming, and data collection.


Getting Started: Setting Up a Proxy in Node.js

Prerequisites

Before starting, you should be familiar with JavaScript and Node.js. If Node.js is not installed on your computer, please install it. You also need a suitable text editor; this guide uses Visual Studio Code (VS Code), known for its user-friendly interface and coding features.

To begin, create a new directory named web-scraping-proxy and initialize your Node.js project inside it. Open your terminal or shell and run these commands:

mkdir web-scraping-proxy
cd web-scraping-proxy
npm init -y

Next, install required Node.js packages for handling HTTP requests and parsing HTML. Run this command in your project directory:

npm install axios node-fetch playwright puppeteer http-proxy-agent
npx playwright install

Axios and node-fetch are used to make HTTP requests, while Playwright and Puppeteer automate browser interactions for scraping dynamic websites. The http-proxy-agent library creates proxy agents so those HTTP requests can be routed through a proxy, and npx playwright install downloads the browser binaries that Playwright needs.

Set Up a Local Proxy for Web Scraping

Setting up a proxy server is key for web scraping. For this tutorial, you’ll use the open-source tool mitmproxy.

Download version 10.1.6 of mitmproxy for your operating system from the mitmproxy download page. If you need guidance during installation, refer to the mitmproxy installation guide.

Once installed, start mitmproxy with this command in your terminal:

mitmproxy

This command opens an interactive interface in your terminal; mitmproxy listens on port 8080 by default. To test that your proxy works correctly, open a new terminal and run:

curl --proxy http://localhost:8080 "http://wttr.in/Paris?0"

This fetches the weather report for Paris. The output should show weather data. In the mitmproxy window, you’ll see the captured request, indicating that your local proxy is running correctly.


Implement a Proxy in Node.js for Web Scraping

Now, let's start the practical part of web scraping with Node.js. Below are examples of how to use Fetch, Playwright, and Puppeteer in combination with a proxy.

Scrape a Website Using the Fetch Method

Create a new file called fetchScraping.js in your project root and add this code. This script uses the fetch method to send requests via the local proxy to https://toscrape.com/:

const fetch = require("node-fetch"); // use node-fetch v2; v3 is ESM-only
const { HttpProxyAgent } = require("http-proxy-agent");

async function fetchData(url) {
  try {
    // Route the request through the local mitmproxy instance
    const proxyAgent = new HttpProxyAgent("http://localhost:8080");
    const response = await fetch(url, { agent: proxyAgent });
    const data = await response.text();
    console.log(data); // Outputs the fetched data
  } catch (error) {
    console.error("Error fetching data:", error);
  }
}

fetchData("http://toscrape.com/");

This code defines an async function fetchData, which calls fetch with a proxy agent, then retrieves and displays the response body. To execute it, run this command in your terminal:

node fetchScraping.js

The output in your terminal will be the HTML content of http://toscrape.com. In your mitmproxy window, you’ll see the request being logged.
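Dumping the whole page is rarely the end goal. As a minimal illustration of a next step (this helper is not part of the tutorial's dependencies; a real project would typically use an HTML parser such as cheerio), here is one way to pull the page title out of the string that fetchData logs:

```javascript
// Extract the contents of the first <title> tag from an HTML string.
// A regex is enough for this sketch; use a real HTML parser for
// anything more involved.
function extractTitle(html) {
  const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return match ? match[1].trim() : null;
}

const sampleHtml =
  "<html><head><title>Scraping Sandbox</title></head><body></body></html>";
console.log(extractTitle(sampleHtml)); // Scraping Sandbox
```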

Scrape a Website Using Playwright

Playwright allows more dynamic interaction with web pages. Create a file named playwrightScraping.js and use the code below:

const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: "http://localhost:8080",
    },
  });
  const page = await browser.newPage();
  await page.goto("http://toscrape.com/");

  // Extract and log the entire HTML content
  const content = await page.content();
  console.log(content);

  await browser.close();
})();

This code launches Playwright with a proxy configuration to access and extract the content of http://toscrape.com. Run the script with:

node playwrightScraping.js

You’ll see the same HTML content, and the request logged in the mitmproxy window.

Scrape a Website Using Puppeteer

Puppeteer provides a high level of control over headless Chrome. Create puppeteerScraping.js and add the following code:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://localhost:8080']
  });
  const page = await browser.newPage();
  await page.goto('http://toscrape.com/');
  const content = await page.content();
  console.log(content); // Outputs the page HTML
  await browser.close();
})();

This initializes Puppeteer with the proxy server and fetches the content of http://toscrape.com. Run this with the command:

node puppeteerScraping.js

You'll see the HTML output in the terminal, and the log in mitmproxy.


Implement ProxyTee Proxy in a Node.js Project

To integrate ProxyTee, sign up for a free trial, navigate to the Residential Proxies product page, and generate proxy credentials (host, port, username, and password) in the dashboard.

Then, create a file scrapingWithProxyTee.js with the following code, making sure to replace placeholder text with your ProxyTee credentials:

const axios = require('axios');

async function fetchDataWithProxyTee(url) {
  const proxyOptions = {
    proxy: {
      host: 'YOUR_PROXYTEE_PROXY_HOST',
      port: YOUR_PROXYTEE_PROXY_PORT,
      auth: {
        username: 'YOUR_PROXYTEE_USERNAME',
        password: 'YOUR_PROXYTEE_PASSWORD'
      }
    }
  };
  try {
    const response = await axios.get(url, proxyOptions);
    console.log(response.data); // Outputs the fetched data
  } catch (error) {
    console.error('Error:', error);
  }
}

fetchDataWithProxyTee('http://lumtest.com/myip.json');

This script configures axios to route requests through ProxyTee and fetches http://lumtest.com/myip.json, which echoes back details about the IP making the request. Execute the script using this command in your terminal:

node scrapingWithProxyTee.js

You’ll see details of the IP address that ProxyTee’s proxy server used for your request. Run the script multiple times and you’ll notice the IP varies on each run, confirming that ProxyTee rotates IPs to avoid bans or blocks.
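To make that rotation check less eyeball-driven, you could collect the JSON from several runs and count the distinct IPs. The helper below is an illustration, not part of ProxyTee's API, and the responses are mock data shaped like the lumtest.com output:

```javascript
// Count how many distinct exit IPs appeared across several responses.
function countDistinctIps(responses) {
  return new Set(responses.map((r) => r.ip)).size;
}

// Mock data in the shape returned by http://lumtest.com/myip.json
const sampleRuns = [
  { ip: "203.0.113.10", country: "US" },
  { ip: "198.51.100.7", country: "DE" },
  { ip: "203.0.113.10", country: "US" },
];

console.log(countDistinctIps(sampleRuns)); // 2
```

If every run of the script reports the same IP, the proxy is not rotating as expected.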

The user-friendly interface of ProxyTee makes it easy for anyone to manage proxies effectively.


Conclusion

This article has explained how to use proxy servers with Node.js. Without robust solutions such as ProxyTee, web scraping efforts are easily hindered by IP bans. You have also seen the benefits of using ProxyTee in your scraping process. ProxyTee is built to improve the robustness and versatility of your data collection activities.

Remember the importance of responsible scraping, adhering to website terms, and respecting data privacy laws. With these newly acquired skills and the reliability of ProxyTee, you can succeed in ethical web scraping. Happy scraping!