How to Parse XML with Python

Parse XML with Python is a common task developers encounter when dealing with structured data from APIs, config files, or large data dumps. This guide dives deep into how to parse XML with Python using the built-in ElementTree module and XPath expressions. We will walk through real-world examples, explain each line of code, and cover practical scenarios such as searching nested tags, handling attributes, and working with namespaces. Whether you are building backend systems or working with web scraping and automation, parsing XML is a fundamental skill that ensures reliable data handling and manipulation.
What is XML and Why It Still Matters
Before learning how to parse XML with Python, it’s important to understand what XML is and why it remains relevant. XML, or eXtensible Markup Language, is a flexible text format used for structured data storage and exchange. Despite the rise of JSON, XML is still widely used in domains such as SOAP APIs, enterprise systems, document formats like RSS and SVG, and legacy data pipelines.
- XML is hierarchical and supports nested structures
- It includes attributes and namespaces for additional context
- It is both human-readable and machine-parseable
Python offers several libraries to work with XML, but for most needs, the built-in ElementTree module provides a simple and effective interface. When combined with XPath, it becomes a powerful solution for extracting precise pieces of data.
How to Parse XML with Python Using ElementTree
To parse XML with Python effectively, start by using the xml.etree.ElementTree module. It provides a tree structure that mirrors the hierarchical nature of XML. Below is a step-by-step example of how to parse a simple XML document, access elements, and print values.
<?xml version="1.0"?> <catalog> <book id="bk101"> <author>Gambardella, Matthew</author> <title>XML Developer's Guide</title> <genre>Computer</genre> </book> <book id="bk102"> <author>Ralls, Kim</author> <title>Midnight Rain</title> <genre>Fantasy</genre> </book> </catalog>
Now let’s parse this data using ElementTree:
# Installation not required as xml.etree.ElementTree is built-in import xml.etree.ElementTree as ET tree = ET.parse('books.xml') # Load from a file root = tree.getroot() # Get the root element # Print all book titles for book in root.findall('book'): title = book.find('title').text print(title)
In this example, parse()
loads the XML file, and getroot()
accesses the root node. Then we use findall()
to iterate through all book elements and print their titles. This pattern is extremely useful when parsing large documents and performing data extraction.
How to Parse XML with Python from a String
In many use cases, such as fetching XML data from APIs, you will receive XML as a string. Here is how to parse XML with Python from a string instead of a file.
xml_data = '''<catalog> <book id="bk101"> <author>Gambardella, Matthew</author> <title>XML Developer's Guide</title> </book> </catalog>''' root = ET.fromstring(xml_data) # Parse string directly # Extract and print author author = root.find('book/author').text print(author)
Using fromstring()
lets you convert raw XML content directly into a usable tree structure. This is perfect for web scraping, proxy-based data collection, or API integrations that return XML instead of JSON.
Advanced Filtering with XPath in ElementTree
One of the most powerful tools when learning to parse XML with Python is XPath. XPath is a query language used to navigate XML trees. Python’s ElementTree supports a limited XPath subset for performance reasons, but it covers most developer needs.
# Use XPath to find all book elements with genre='Fantasy' for book in root.findall(".//book[genre='Fantasy']"): title = book.find('title').text print(f"Fantasy book: {title}")
Here we use .//book[genre='Fantasy']
to find any book node with a child genre node containing ‘Fantasy’. This syntax is concise and avoids manually filtering data in Python after parsing. It is highly recommended when working with large XML trees or when querying nested elements.
Handling XML Attributes and Namespaces
XML attributes are parsed as dictionary values. Understanding how to access them is crucial when working with config files or data feeds. Namespaces, on the other hand, are used to avoid naming collisions in complex XML schemas.
# Accessing attributes for book in root.findall('book'): book_id = book.attrib['id'] print(f"Book ID: {book_id}")
To parse XML with Python when namespaces are present, you must define them explicitly in your queries:
# Sample XML with namespaces xml_ns = '''<ns:catalog xmlns:ns="http://example.com"> <ns:book id="bk103"> <ns:title>XML Fundamentals</ns:title> </ns:book> </ns:catalog>''' ns_root = ET.fromstring(xml_ns) ns = {'ns': 'http://example.com'} # Using namespace prefix in XPath title = ns_root.find('ns:book/ns:title', ns).text print(title)
This method allows full parsing of XML documents that follow modern schema-based structures. Ignoring namespaces often results in None
returns when using find()
or findall()
, so proper handling is critical in enterprise workflows.
Use Case: Parsing XML API Responses in Python
Many public APIs still return XML. Here’s a complete example of how to parse XML with Python in a web context:
import requests import xml.etree.ElementTree as ET response = requests.get('https://www.w3schools.com/xml/note.xml') xml_response = response.content root = ET.fromstring(xml_response) to = root.find('to').text from_ = root.find('from').text body = root.find('body').text print(f"Message from {from_} to {to}: {body}")
This use case illustrates a real-world backend integration where parsing XML is necessary. You can easily expand this logic to paginate through XML feeds or handle SOAP-based APIs using similar patterns.
When to Use Other Libraries Beyond ElementTree
While ElementTree is great for most XML tasks, more complex workflows may require:
- lxml for full XPath 1.0 and XSLT support
- minidom for fine-grained DOM manipulation
- xmltodict for converting XML to Python dictionaries
If you need strict schema validation or advanced transformations, consider integrating lxml. But for 80 percent of typical backend or automation use cases, ElementTree with XPath will meet all requirements efficiently.
What You Can Do Next with XML Parsing in Python
Now that you know how to parse XML with Python, you can apply it to a variety of developer tasks. Automate config readers, extract data from legacy APIs, clean up RSS feeds, or integrate with financial or government data services that still rely on XML. Mastery of XML parsing opens up integrations across many industries that have yet to fully transition to JSON.
Make sure to explore how XPath expressions can simplify complex searches, and consider combining parsed XML data with Pandas or SQL for deeper analysis. The techniques covered here will save you hours of manual parsing or error-prone string manipulation.
To keep building, try combining this XML parsing logic with other tools such as proxy support, request throttling, or multi-threaded processing for scraping large XML-based sites or feeds.
Explore, build, and refine your XML workflows with confidence knowing that Python gives you all the tools you need.