19.7. xml.etree.ElementTree — The ElementTree XML API
New in version 2.5.
Source code: Lib/xml/etree/ElementTree.py
The Element type is a flexible container object, designed to store hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary.
Warning
The xml.etree.ElementTree module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data, see XML vulnerabilities.
Each element has a number of properties associated with it:
- a tag, which is a string identifying what kind of data this element represents (the element type, in other words).
- a number of attributes, stored in a Python dictionary.
- a text string.
- an optional tail string.
- a number of child elements, stored in a Python sequence.
To create an element instance, use the Element constructor or the SubElement() factory function.
The ElementTree class can be used to wrap an element structure, and convert it from and to XML.
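As a minimal sketch of the properties listed above, the following builds a small tree by hand and wraps it in an ElementTree. The tag names, attribute values and the in-memory buffer are made up for illustration; they are not prescribed by the module.

```python
import io
import xml.etree.ElementTree as ET

# Build an element by hand; 'country' and 'name' are illustrative values.
country = ET.Element('country', {'name': 'Liechtenstein'})
country.text = 'intro '            # text before the first child

# SubElement() creates a child and appends it to the parent in one step.
rank = ET.SubElement(country, 'rank')
rank.text = '1'
rank.tail = ' outro'               # text that follows the child's end tag

# Wrap the structure in an ElementTree and serialize it to a binary buffer.
buf = io.BytesIO()
ET.ElementTree(country).write(buf)
xml_bytes = buf.getvalue()
```

Here the list-like side of Element shows up in len(country) and country[0], and the dictionary-like side in country.attrib.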
A C implementation of this API is available as xml.etree.cElementTree.
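A common way to prefer the C implementation while still running where it cannot be imported is a guarded import. This is a sketch of that idiom, not something the module requires; the fallback branch simply binds the pure-Python module to the same name.

```python
# Prefer the C implementation when it can be imported; otherwise fall
# back to the module documented here. Either way the API is reached as ET.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Illustrative input to confirm that the chosen module parses XML.
root = ET.fromstring('<data><item/></data>')
```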
See http://effbot.org/zone/element-index.htm for tutorials and links to other docs. Fredrik Lundh’s page is also the location of the development version of the xml.etree.ElementTree.
Changed in version 2.7: The ElementTree API is updated to 1.3. For more information, see Introducing ElementTree 1.3.
19.7.1. Tutorial
This is a short tutorial for using xml.etree.ElementTree (ET for short). The goal is to demonstrate some of the building blocks and basic concepts of the module.
19.7.1.1. XML tree and elements
XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two classes for this purpose - ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.
19.7.1.2. Parsing XML
We’ll be using the following XML document as the sample data for this section:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
We have a number of ways to import the data. Reading the file from disk:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
Reading the data from a string:
root = ET.fromstring(country_data_as_string)
fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree. Other parsing functions may create an ElementTree. Check the documentation to be sure.
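The difference in return types can be checked directly. The sketch below feeds the same illustrative XML to parse(), through an in-memory file object standing in for a file on disk, and to fromstring(); the sample bytes and variable names are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

sample = b'<data><country name="Panama"/></data>'  # illustrative input

# parse() reads from a filename or file object and returns an ElementTree.
tree = ET.parse(io.BytesIO(sample))
root_from_tree = tree.getroot()

# fromstring() returns the root Element directly, with no tree wrapper.
root_from_string = ET.fromstring(sample)

is_tree = isinstance(tree, ET.ElementTree)
is_element = ET.iselement(root_from_string)
```

Calling getroot() on the parsed tree yields an Element equivalent to the one fromstring() returns.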
As an Element, root has a tag and a dictionary of attributes:
>>> root.tag
'data'
>>> root.attrib
{}
It also has child nodes over which we can iterate:
>>> for child in root:
...     print child.tag, child.attrib
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
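The same iteration can collect values instead of printing them. This sketch embeds a shortened copy of the sample document so that it runs on its own; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document used in this section.
country_data_as_string = """<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
    <country name="Panama"><rank>68</rank></country>
</data>"""

root = ET.fromstring(country_data_as_string)

# Iterating over an Element yields its direct children in document order.
names = [child.attrib['name'] for child in root]
tags = set(child.tag for child in root)
```

Only direct children are visited, so the nested rank elements do not appear in tags.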