    Tutorial

    Getting Started with Web Scraping Using Python and Beautiful Soup

    April 30, 2025 Mike

    Web scraping can seem complex, but Python makes it approachable with user-friendly libraries. One such library is Beautiful Soup, designed for parsing HTML and XML documents. This tutorial shows how to use Beautiful Soup to parse a sample HTML file: navigating HTML tags, extracting content, finding elements by ID, extracting text, and exporting data to CSV. Let’s dive in and sharpen your understanding of data extraction with Python. If you need reliable Residential Proxies for your scraping projects, ProxyTee has you covered.


    Understanding Data Parsing and Beautiful Soup

    Data parsing converts data into a format that is easy to read and analyze; a parser filters and combines information based on specific criteria. Beautiful Soup is a Python package that builds a parse tree from HTML and XML documents, making it easy to extract, navigate, and modify their contents. It works with Python 3.6 and above and is a staple of web scraping workflows. Combined with ProxyTee, it helps streamline the data collection and parsing process.
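    To make the idea of a parse tree concrete, here is a minimal sketch: Beautiful Soup turns an HTML string into a tree of Tag objects you can navigate by attribute access.

```python
from bs4 import BeautifulSoup

# Parse a small HTML snippet into a tree of Tag objects
soup = BeautifulSoup("<p>Hello, <b>world</b>!</p>", "html.parser")

print(soup.p.b.text)    # navigate the tree: <p> -> <b> -> its text
print(soup.get_text())  # flatten the whole tree to plain text
```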


    Installation and Setup

    Before using Beautiful Soup, ensure Python and an IDE such as PyCharm are set up. When installing Python, check the option to add Python to PATH so your OS recognizes commands like pip and python from any terminal. Then install Beautiful Soup with this command:

    pip install beautifulsoup4

    This tutorial works with a basic sample HTML document; make sure you’re familiar with its structure for effective data parsing.
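    To confirm the installation succeeded, a quick check is to import the package and print its version:

```python
# If this runs without an ImportError, Beautiful Soup is installed
import bs4
print(bs4.__version__)
```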


    A Sample HTML Structure

    Here’s a simple HTML document we will use:

    <!DOCTYPE html>
    <html>
    	<head>
    		<title>What is a Proxy?</title>
    		<meta charset="utf-8">
    	</head>
    	<body>
    		<h2>Proxy types</h2>
    		<p>
    		  There are many different ways to categorize proxies.
    		 However, two of the most popular types are residential and data center proxies. Here is a list of the most common types.
    		</p>
    		<ul id="proxytypes">
    			<li>Residential proxies</li>
    			<li>Datacenter proxies</li>
    			<li>Shared proxies</li>
    			<li>Semi-dedicated proxies</li>
    			<li>Private proxies</li>
    		</ul>
    	</body>
    </html>

    Copy this to a text editor and save as ‘index.html’, or create a new HTML file in your IDE and copy/paste it. Now, you’re all set to explore and get some practice.


    Parsing HTML with Beautiful Soup

    1️⃣ Finding HTML Tags

    Use soup.descendants to iterate over every element in the document. The code below does the following:

    • Imports the Beautiful Soup Library.
    • Opens and reads the HTML document.
    • Creates a Beautiful Soup object to parse the document.
    • Iterates through the elements and prints the tag names.
    from bs4 import BeautifulSoup
    
    with open('index.html', 'r') as f:
        contents = f.read()
        soup = BeautifulSoup(contents, "html.parser")
    
    for child in soup.descendants:
        if child.name:
            print(child.name)
    

    This script will output:

    html
    head
    title
    meta
    body
    h2
    p
    ul
    li
    li
    li
    li
    li
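    Note that soup.descendants also yields plain text nodes, which is why the loop filters on child.name. If you only want tags, find_all(True) matches every tag directly; a small alternative sketch on a trimmed document:

```python
from bs4 import BeautifulSoup

html = "<html><head><title>t</title></head><body><p>hi</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find_all(True) matches every tag and skips text nodes
print([tag.name for tag in soup.find_all(True)])
```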

    2️⃣ Extracting Full Content From HTML Tags

    You can directly extract tag contents using the tag name:

    from bs4 import BeautifulSoup
    
    with open('index.html', 'r') as f:
        contents = f.read()
        soup = BeautifulSoup(contents, "html.parser")
    
    print(soup.h2)
    print(soup.p)
    print(soup.li)

    Outputting the complete tag with the content:

    <h2>Proxy types</h2>
    <p>
      There are many different ways to categorize proxies.
     However, two of the most popular types are residential and data center proxies. Here is a list of the most common types.
    </p>
    <li>Residential proxies</li>

    To get only the text, add .text:

    print(soup.li.text)

    Result:

    Residential proxies

    3️⃣ Finding Elements by ID

    Find HTML elements using their IDs:

    print(soup.find('ul', attrs={'id': 'proxytypes'}))
    print(soup.find('ul', id='proxytypes'))

    Both of the above will yield the same result in your console:

    <ul id="proxytypes">
    <li>Residential proxies</li>
    <li>Datacenter proxies</li>
    <li>Shared proxies</li>
    <li>Semi-dedicated proxies</li>
    <li>Private proxies</li>
    </ul>
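    The same lookup can also be written as a CSS id selector: soup.select_one('#proxytypes') is equivalent to the find calls above. A sketch using a trimmed copy of the sample list:

```python
from bs4 import BeautifulSoup

html = '<ul id="proxytypes"><li>Residential proxies</li><li>Datacenter proxies</li></ul>'
soup = BeautifulSoup(html, "html.parser")

# '#proxytypes' is CSS shorthand for id="proxytypes"
print(soup.select_one('#proxytypes').li.text)
```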

    4️⃣ Finding All Instances of a Tag

    To find every instance of a tag in the document, use the find_all method:

    for tag in soup.find_all('li'):
        print(tag.text)

    The above will return all list items from the HTML:

    Residential proxies
    Datacenter proxies
    Shared proxies
    Semi-dedicated proxies
    Private proxies
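    Since find_all returns a regular Python list of Tag objects, a list comprehension collects the texts in one pass, ready for further processing or export. A self-contained sketch with a shortened copy of the sample list:

```python
from bs4 import BeautifulSoup

html = """<ul>
<li>Residential proxies</li>
<li>Datacenter proxies</li>
<li>Shared proxies</li>
</ul>"""
soup = BeautifulSoup(html, "html.parser")

# Gather the text of every <li> into a plain list
names = [li.text for li in soup.find_all('li')]
print(names)
```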

    5️⃣ Parsing by CSS Selectors

    Beautiful Soup uses CSS selectors via the ‘select’ and ‘select_one’ methods:

    select_one returns the first element matching a selector. To get the first list item:

    print(soup.select_one('body ul li'))

    select returns a list of every match. To select the title:

    print(soup.select('html head title'))

    To extract the third list item, use the :nth-of-type pseudo-class:

    print(soup.select_one('body ul li:nth-of-type(3)'))
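    The key difference between the two methods: select_one returns a single Tag (or None when nothing matches), while select always returns a list. A quick sketch on a minimal list:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>a</li><li>b</li><li>c</li></ul>", "html.parser")

# select() returns all matches as a list
print(len(soup.select('ul li')))
# select_one() returns only the first match
print(soup.select_one('li:nth-of-type(2)').text)
```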

    Handling Dynamic Content

    For content loaded dynamically with JavaScript, requests and Beautiful Soup alone aren’t enough, because they only see the initial HTML. Use a browser-automation library such as Selenium to render the page and interact with the DOM:

    • Install Selenium:
    pip install selenium
    • Here’s the basic script:
    from selenium import webdriver
    from bs4 import BeautifulSoup
    
    driver = webdriver.Chrome()
    driver.get("http://quotes.toscrape.com/js/")
    js_content = driver.page_source  # HTML after JavaScript has executed
    driver.quit()
    
    soup = BeautifulSoup(js_content, "html.parser")
    quote = soup.find("span", class_="text")
    print(quote.text)

    This fetches the page from a dummy website with quotes and then selects the first quote using BeautifulSoup.
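    The parsing half of that script works on any HTML string, so you can develop and test it without launching a browser. A sketch with a hardcoded snippet standing in for driver.page_source (the quote text here is illustrative, not real site output):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source after JavaScript has run
js_content = '<div class="quote"><span class="text">"An example quote."</span></div>'

soup = BeautifulSoup(js_content, "html.parser")
quote = soup.find("span", class_="text")
print(quote.text)
```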

    To bypass bot detection, consider using ProxyTee’s rotating residential proxies to mask your IP address.


    Exporting to a CSV file

    Exporting to a CSV file is crucial for further analysis. Use pandas to do this:

    • Install pandas using pip:
    pip install pandas

    Here is how to create a CSV file for the proxy list from the HTML example used here:

    from bs4 import BeautifulSoup
    import pandas as pd
    
    with open('index.html', 'r') as f:
        contents = f.read()
    
    soup = BeautifulSoup(contents, "html.parser")
    # Extract the text of each list item; storing the raw Tag objects
    # would write the full '<li>...</li>' markup into the CSV
    results = [li.text for li in soup.find_all('li')]
    
    df = pd.DataFrame({'Names': results})
    df.to_csv('names.csv', index=False, encoding='utf-8')

    This creates a file named `names.csv` containing your extracted proxy list.
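    Reading the file back with pandas is a quick sanity check that the export worked. A self-contained sketch using a short hardcoded list in place of the scraped results:

```python
import pandas as pd

# Write a small list to CSV, then read it back to verify the round trip
names = ['Residential proxies', 'Datacenter proxies']
pd.DataFrame({'Names': names}).to_csv('names.csv', index=False, encoding='utf-8')

df = pd.read_csv('names.csv')
print(df['Names'].tolist())
```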

    • Beautiful Soup
    • Programming
    • Python
    • Web Scraping
