Web Scraping with C#: Efficient, Scalable, and ProxyTee-Powered

Web scraping with C# has become essential for gathering data from the internet. This guide will demonstrate how to build a web scraper using C#, focusing on leveraging the power of ProxyTee for enhanced data privacy and efficiency. We’ll cover scraping both static and dynamic content, showing you how to extract valuable data and manage potential challenges.
Why Choose ProxyTee for Web Scraping?
ProxyTee is a leading provider of residential proxies that offer robust solutions for web scraping and other online activities requiring anonymity. Here’s why ProxyTee is an excellent choice:
- Unlimited Bandwidth: ProxyTee’s unlimited bandwidth ensures your data-intensive scraping tasks run smoothly without overage concerns.
- Extensive Global Coverage: With over 20 million IPs across 100+ countries, ProxyTee allows you to target specific geographic regions, which is ideal for location-based tasks.
- Multiple Protocol Support: Supporting both HTTP and SOCKS5, ProxyTee integrates seamlessly with various applications and tools, suitable for diverse scraping scenarios.
- User-Friendly Interface: The intuitive GUI allows for quick and easy setup, even for beginners. See Simple & Clean GUI.
- Auto-Rotation: Auto-rotation changes IPs automatically at intervals of 3 to 60 minutes, preventing IP bans when you conduct multiple scraping requests.
- Simple API Integration: ProxyTee’s API supports all features, making it perfect for developers automating their scraping tasks.
ProxyTee’s Unlimited Residential Proxies are particularly well suited for web scraping: they feature geo-targeting and cost up to 50% less than comparable offerings from competitors.
Top C# Web Scraping Libraries
Before we dive in, let’s look at the essential NuGet scraping libraries for C#:
- HtmlAgilityPack: A popular library for downloading and parsing HTML content, selecting elements, and extracting data.
- HttpClient: Useful for performing asynchronous HTTP requests and downloading web pages efficiently.
- Selenium WebDriver: An automated testing library also excellent for scraping dynamic web pages by controlling web browsers.
- Puppeteer Sharp: The C# port of Puppeteer, enabling headless browser capabilities for handling dynamic content.
In this guide, we’ll use HtmlAgilityPack for static sites and Selenium for dynamic ones—covering both sides of web scraping with C#.
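If you prefer to manage the HTTP request yourself, a common pattern is to download the page with HttpClient and hand the raw HTML to HtmlAgilityPack. Here is a minimal sketch of that approach; the example.com URL is just a placeholder:

using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class HttpClientExample {
    static async Task Main() {
        // Download the raw HTML asynchronously with HttpClient.
        using var client = new HttpClient();
        string html = await client.GetStringAsync("https://example.com");

        // Parse the downloaded markup with HtmlAgilityPack and print the page title.
        var document = new HtmlDocument();
        document.LoadHtml(html);
        Console.WriteLine(document.DocumentNode.SelectSingleNode("//title")?.InnerText);
    }
}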
Prerequisites for Web Scraping With C#
- Visual Studio: The free Community edition of Visual Studio 2022 (or later) is sufficient.
- .NET 6+: Any LTS version of .NET greater than or equal to 6.
If you haven’t already, download the required tools and proceed with their setup.
Setting Up a Project in Visual Studio
- Open Visual Studio and choose “Create a new project”.
- Select “C#” from the dropdown and choose the “Console App” template, then click “Next”.
- Name your project (e.g., StaticWebScraping) and select a suitable .NET version. Click “Create”.
Your App.cs file is where you’ll write the C# web scraping logic.
Scraping Static Content Websites in C#
Static content sites deliver content directly within HTML documents without needing JavaScript for rendering. Here’s the process:
- Install HtmlAgilityPack using NuGet Package Manager.
- Load the HTML using HtmlWeb.
- Select elements of interest with XPath.
- Extract the desired data.
Let’s see how to retrieve data from the “List of SpongeBob SquarePants episodes” Wikipedia page, which is a static content site.
Step 1️⃣: Install HtmlAgilityPack
Right-click on “Dependencies”, select “Manage NuGet Packages”, and search for “HtmlAgilityPack”. Install it.
Then, add using HtmlAgilityPack; at the top of your App.cs file.
Step 2️⃣: Load an HTML Web Page
string url = "https://en.wikipedia.org/wiki/List_of_SpongeBob_SquarePants_episodes";
var web = new HtmlWeb();
var document = web.Load(url);
This retrieves and parses the HTML content of the target URL using the HtmlWeb class, giving you access to the HTML document through an HtmlDocument instance.
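Some servers reject requests that don’t look like they come from a real browser. HtmlWeb also lets you set the request’s user agent through its UserAgent property before calling Load(); the user-agent string below is only an example:

var web = new HtmlWeb();
// Present a browser-like user agent; some sites block the default .NET client string.
web.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
var document = web.Load(url);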
Step 3️⃣: Select HTML Elements
We will use XPath to target the table rows containing episode information:
var nodes = document.DocumentNode.SelectNodes("//*[@id='mw-content-text']/div[1]/table[position()>1 and position()<15]/tbody/tr[position()>1]");
XPath lets you select elements in the DOM. This particular selector grabs every data row (skipping each table’s header row) from the episode tables on the Wikipedia page.
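One caveat: SelectNodes() returns null rather than an empty collection when the XPath matches nothing, so it’s worth guarding against that before iterating, for example in case Wikipedia changes its page layout:

// SelectNodes() yields null when no nodes match the XPath expression.
if (nodes == null)
{
    Console.WriteLine("No episode rows found; check the XPath selector.");
    return;
}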
Step 4️⃣: Extract Data From HTML Elements
First, let’s define an Episode class in an Episode.cs file:
namespace StaticWebScraping {
    public class Episode {
        public string OverallNumber { get; set; }
        public string Title { get; set; }
        public string Directors { get; set; }
        public string WrittenBy { get; set; }
        public string Released { get; set; }
    }
}
Now we add the scraping logic in the App.cs file:
using HtmlAgilityPack;
using System;
using System.Collections.Generic;

namespace StaticWebScraping {
    public class Program {
        public static void Main() {
            string url = "https://en.wikipedia.org/wiki/List_of_SpongeBob_SquarePants_episodes";
            var web = new HtmlWeb();
            var document = web.Load(url);
            var nodes = document.DocumentNode.SelectNodes("//*[@id='mw-content-text']/div[1]/table[position()>1 and position()<15]/tbody/tr[position()>1]");

            List<Episode> episodes = new List<Episode>();
            foreach (var node in nodes) {
                episodes.Add(new Episode() {
                    OverallNumber = HtmlEntity.DeEntitize(node.SelectSingleNode("th[1]").InnerText),
                    Title = HtmlEntity.DeEntitize(node.SelectSingleNode("td[2]").InnerText),
                    Directors = HtmlEntity.DeEntitize(node.SelectSingleNode("td[3]").InnerText),
                    WrittenBy = HtmlEntity.DeEntitize(node.SelectSingleNode("td[4]").InnerText),
                    Released = HtmlEntity.DeEntitize(node.SelectSingleNode("td[5]").InnerText)
                });
            }

            // ... convert data to CSV, save data, or call an API ...
        }
    }
}
This code iterates over all found HTML nodes and extracts data such as episode numbers, titles, and more by calling the SelectSingleNode() method on each node.
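Before exporting anything, a quick way to confirm the scraper works is to print a few fields to the console; this check is optional:

// Print each scraped episode as a quick sanity check.
foreach (var episode in episodes)
{
    Console.WriteLine($"{episode.OverallNumber} | {episode.Title} | {episode.Released}");
}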
Step 5️⃣: Export the Scraped Data to CSV
Install the CsvHelper NuGet package to simplify CSV writing. After installing it, you can export the data as follows:
using CsvHelper;
using System.IO;
using System.Globalization;

// ... scraping logic here ...

using (var writer = new StreamWriter("output.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    csv.WriteRecords(episodes);
}
This generates an output.csv file containing all the extracted episode data.
Scraping Dynamic Content Websites in C#
Dynamic content is retrieved using JavaScript and rendered in real-time. This requires a browser for rendering, which is why headless browsers come into play. Here’s how to do it with Selenium:
- Install Selenium via NuGet.
- Create an instance of ChromeDriver with headless options.
- Use XPath selectors for data extraction within the rendered web page.
Now we’ll start by creating a project called DynamicWebScraping.
Step 1️⃣: Install Selenium
Install Selenium.WebDriver in your new DynamicWebScraping project.
Then, add these lines at the top of the App.cs file:
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
Step 2️⃣: Connect to the Target Website
string url = "https://en.wikipedia.org/wiki/List_of_SpongeBob_SquarePants_episodes";
var chromeOptions = new ChromeOptions();
chromeOptions.AddArguments("headless");
var driver = new ChromeDriver(chromeOptions);
driver.Navigate().GoToUrl(url);
This sets up a Chrome WebDriver with headless mode enabled, then navigates to the Wikipedia page.
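This Wikipedia page ships its tables in the initial HTML, but on pages that build content with JavaScript after load you typically need to wait for the elements to appear before scraping. A sketch using Selenium’s WebDriverWait (depending on your Selenium version, this class may live in the separate Selenium.Support package):

using OpenQA.Selenium.Support.UI;

// Wait up to 10 seconds for at least one table row to be rendered.
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
wait.Until(d => d.FindElements(By.XPath("//table/tbody/tr")).Count > 0);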
Step 3️⃣: Scrape Data From HTML Elements
You can use similar XPath expressions to extract the data. Note that with Selenium you call FindElements() for multiple elements and FindElement() for a single element. Reuse the Episode class defined earlier by adding it to this project as well:
using System;
using System.Collections.Generic;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace DynamicWebScraping {
    public class Program {
        public static void Main() {
            string url = "https://en.wikipedia.org/wiki/List_of_SpongeBob_SquarePants_episodes";
            var chromeOptions = new ChromeOptions();
            chromeOptions.AddArguments("headless");
            var driver = new ChromeDriver(chromeOptions);
            driver.Navigate().GoToUrl(url);

            var nodes = driver.FindElements(By.XPath("//*[@id='mw-content-text']/div[1]/table[position()>1 and position()<15]/tbody/tr[position()>1]"));

            List<Episode> episodes = new();
            foreach (var node in nodes)
            {
                episodes.Add(new Episode()
                {
                    OverallNumber = node.FindElement(By.XPath("th[1]")).Text,
                    Title = node.FindElement(By.XPath("td[2]")).Text,
                    Directors = node.FindElement(By.XPath("td[3]")).Text,
                    WrittenBy = node.FindElement(By.XPath("td[4]")).Text,
                    Released = node.FindElement(By.XPath("td[5]")).Text
                });
            }

            // Close the browser to release the headless Chrome process.
            driver.Quit();

            // ... convert data to CSV, save data, or call an API ...
        }
    }
}
With Selenium, the page is fully rendered before the data is extracted. Using the same logic as before, this script populates the episodes list, ready for further processing.
What To Do With the Scraped Data
After gathering data, consider these common actions:
- Storing in a database for easy querying.
- Converting to JSON for API integration (see the sketch after this list).
- Exporting to CSV for easy sharing.
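For instance, converting the episodes list to JSON with the built-in System.Text.Json serializer could look roughly like this; the output file name is arbitrary:

using System.IO;
using System.Text.Json;

// Serialize the scraped episodes to an indented JSON file.
var options = new JsonSerializerOptions { WriteIndented = true };
File.WriteAllText("episodes.json", JsonSerializer.Serialize(episodes, options));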
Data Privacy With Proxies
To avoid IP blocks and enhance anonymity during web scraping with C#, use proxies. ProxyTee provides residential proxies that can help you avoid bans and access geographically restricted content with these key benefits:
- Avoid IP bans: With rotating residential proxies, target servers never see your actual IP address, only the proxy’s.
- Rotating IP Addresses: ProxyTee’s rotating IP feature changes IPs regularly from its massive IP pool, keeping your requests untraceable.
- Regional Scraping: You can select the location of the exit IP, which lets you perform global research and access localized content.
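To tie this back to the code above, here is a minimal sketch of routing requests through a proxy using HttpClient, WebProxy, and HtmlAgilityPack. The proxy host, port, and credentials are placeholders; substitute the values from your ProxyTee dashboard:

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class ProxyScrapingExample {
    static async Task Main() {
        // Placeholder proxy endpoint and credentials; replace with your ProxyTee details.
        var proxy = new WebProxy("http://proxy.example.com:12345")
        {
            Credentials = new NetworkCredential("your-username", "your-password")
        };
        var handler = new HttpClientHandler { Proxy = proxy, UseProxy = true };
        using var client = new HttpClient(handler);

        // The request now exits through the proxy, so the target site only sees the proxy's IP.
        string html = await client.GetStringAsync("https://en.wikipedia.org/wiki/List_of_SpongeBob_SquarePants_episodes");

        var document = new HtmlDocument();
        document.LoadHtml(html);
        Console.WriteLine(document.DocumentNode.SelectSingleNode("//title")?.InnerText);
    }
}

For the Selenium scraper, Chrome accepts a proxy through chromeOptions.AddArguments("--proxy-server=http://proxy.example.com:12345"), although proxies that require authentication usually need extra handling beyond that flag.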