Tutorial

How to Parse XML with Python

April 22, 2025 Mike

Parse XML with Python is a common task developers encounter when dealing with structured data from APIs, config files, or large data dumps. This guide dives deep into how to parse XML with Python using the built-in ElementTree module and XPath expressions. We will walk through real-world examples, explain each line of code, and cover practical scenarios such as searching nested tags, handling attributes, and working with namespaces. Whether you are building backend systems or working with web scraping and automation, parsing XML is a fundamental skill that ensures reliable data handling and manipulation.

What is XML and Why It Still Matters

Before learning how to parse XML with Python, it’s important to understand what XML is and why it remains relevant. XML, or eXtensible Markup Language, is a flexible text format used for structured data storage and exchange. Despite the rise of JSON, XML is still widely used in domains such as SOAP APIs, enterprise systems, document formats like RSS and SVG, and legacy data pipelines.

XML is hierarchical and supports nested structures
It includes attributes and namespaces for additional context
It is both human-readable and machine-parseable

Python offers several libraries to work with XML, but for most needs, the built-in ElementTree module provides a simple and effective interface. When combined with XPath, it becomes a powerful solution for extracting precise pieces of data.

How to Parse XML with Python Using ElementTree

To parse XML with Python effectively, start by using the xml.etree.ElementTree module. It provides a tree structure that mirrors the hierarchical nature of XML. Below is a step-by-step example of how to parse a simple XML document, access elements, and print values.

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
  </book>
</catalog>

Now let’s parse this data using ElementTree:

# Installation not required as xml.etree.ElementTree is built-in
import xml.etree.ElementTree as ET

tree = ET.parse('books.xml')  # Load from a file
root = tree.getroot()         # Get the root element

# Print all book titles
for book in root.findall('book'):
    title = book.find('title').text
    print(title)

In this example, parse() loads the XML file, and getroot() accesses the root node. Then we use findall() to iterate through all book elements and print their titles. This pattern is extremely useful when parsing large documents and performing data extraction.

How to Parse XML with Python from a String

In many use cases, such as fetching XML data from APIs, you will receive XML as a string. Here is how to parse XML with Python from a string instead of a file.

xml_data = '''<catalog> <book id="bk101"> <author>Gambardella, Matthew</author> <title>XML Developer's Guide</title> </book> </catalog>'''

root = ET.fromstring(xml_data)  # Parse string directly

# Extract and print author
author = root.find('book/author').text
print(author)

Using fromstring() lets you convert raw XML content directly into a usable tree structure. This is perfect for web scraping, proxy-based data collection, or API integrations that return XML instead of JSON.

Advanced Filtering with XPath in ElementTree

One of the most powerful tools when learning to parse XML with Python is XPath. XPath is a query language used to navigate XML trees. Python’s ElementTree supports a limited XPath subset for performance reasons, but it covers most developer needs.

# Use XPath to find all book elements with genre='Fantasy'
for book in root.findall(".//book[genre='Fantasy']"):
    title = book.find('title').text
    print(f"Fantasy book: {title}")

Here we use .//book[genre='Fantasy'] to find any book node with a child genre node containing ‘Fantasy’. This syntax is concise and avoids manually filtering data in Python after parsing. It is highly recommended when working with large XML trees or when querying nested elements.

Handling XML Attributes and Namespaces

XML attributes are parsed as dictionary values. Understanding how to access them is crucial when working with config files or data feeds. Namespaces, on the other hand, are used to avoid naming collisions in complex XML schemas.

# Accessing attributes
for book in root.findall('book'):
    book_id = book.attrib['id']
    print(f"Book ID: {book_id}")

To parse XML with Python when namespaces are present, you must define them explicitly in your queries:

# Sample XML with namespaces
xml_ns = '''<ns:catalog xmlns:ns="http://example.com">
    <ns:book id="bk103">
        <ns:title>XML Fundamentals</ns:title>
    </ns:book>
</ns:catalog>'''
ns_root = ET.fromstring(xml_ns)
ns = {'ns': 'http://example.com'}  # Using namespace prefix in XPath
title = ns_root.find('ns:book/ns:title', ns).text
print(title)

This method allows full parsing of XML documents that follow modern schema-based structures. Ignoring namespaces often results in None returns when using find() or findall(), so proper handling is critical in enterprise workflows.

Use Case: Parsing XML API Responses in Python

Many public APIs still return XML. Here’s a complete example of how to parse XML with Python in a web context:

import requests
import xml.etree.ElementTree as ET

response = requests.get('https://www.w3schools.com/xml/note.xml')
xml_response = response.content
root = ET.fromstring(xml_response)

to = root.find('to').text
from_ = root.find('from').text
body = root.find('body').text

print(f"Message from {from_} to {to}: {body}")

This use case illustrates a real-world backend integration where parsing XML is necessary. You can easily expand this logic to paginate through XML feeds or handle SOAP-based APIs using similar patterns.

When to Use Other Libraries Beyond ElementTree

While ElementTree is great for most XML tasks, more complex workflows may require:

lxml for full XPath 1.0 and XSLT support
minidom for fine-grained DOM manipulation
xmltodict for converting XML to Python dictionaries

If you need strict schema validation or advanced transformations, consider integrating lxml. But for 80 percent of typical backend or automation use cases, ElementTree with XPath will meet all requirements efficiently.

What You Can Do Next with XML Parsing in Python

Now that you know how to parse XML with Python, you can apply it to a variety of developer tasks. Automate config readers, extract data from legacy APIs, clean up RSS feeds, or integrate with financial or government data services that still rely on XML. Mastery of XML parsing opens up integrations across many industries that have yet to fully transition to JSON.

Make sure to explore how XPath expressions can simplify complex searches, and consider combining parsed XML data with Pandas or SQL for deeper analysis. The techniques covered here will save you hours of manual parsing or error-prone string manipulation.

To keep building, try combining this XML parsing logic with other tools such as proxy support, request throttling, or multi-threaded processing for scraping large XML-based sites or feeds.

Explore, build, and refine your XML workflows with confidence knowing that Python gives you all the tools you need.

How to Parse XML with Python

What is XML and Why It Still Matters

How to Parse XML with Python Using ElementTree

How to Parse XML with Python from a String

Advanced Filtering with XPath in ElementTree

Handling XML Attributes and Namespaces

Use Case: Parsing XML API Responses in Python

When to Use Other Libraries Beyond ElementTree

What You Can Do Next with XML Parsing in Python

We help ambitious businesses achieve more

Products

Tools

Legal

Support

Contact sales

How to Parse XML with Python

What is XML and Why It Still Matters

How to Parse XML with Python Using ElementTree

How to Parse XML with Python from a String

Advanced Filtering with XPath in ElementTree

Handling XML Attributes and Namespaces

Use Case: Parsing XML API Responses in Python

When to Use Other Libraries Beyond ElementTree

What You Can Do Next with XML Parsing in Python

Related Posts

Beginner’s Guide to Web Crawling with Python and Scrapy

Learn Web Scraping with Beautiful Soup

How to Scrape Yelp Data in Python for Local Business Insights

We help ambitious businesses achieve more

Products

Tools

Legal

Support