19.7. xml.etree.ElementTree — The ElementTree XML API
Source code: Lib/xml/etree/ElementTree.py
The Element type is a flexible container object, designed to store hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary.
Warning
The xml.etree.ElementTree module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data, see XML vulnerabilities.
Each element has a number of properties associated with it:
- a tag, which is a string identifying what kind of data this element represents (the element type, in other words).
- a number of attributes, stored in a Python dictionary.
- a text string.
- an optional tail string.
- a number of child elements, stored in a Python sequence.
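Here is a minimal sketch of these properties. It uses a small made-up fragment rather than the tutorial's sample data. The comments show what each property holds for that fragment.

```python
import xml.etree.ElementTree as ET

# A made-up fragment: <a> has one attribute, some text,
# and one child <b> that is followed by tail text.
elem = ET.fromstring('<a x="1">hello<b>inner</b>after b</a>')

tag = elem.tag            # 'a': the element type
attrib = elem.attrib      # {'x': '1'}: attributes as a dictionary
text = elem.text          # 'hello': text before the first child
children = list(elem)     # child elements, in document order
child = elem[0]           # indexing works as on a sequence
child_tail = child.tail   # 'after b': text after </b> is stored on <b>
```

Note that the tail belongs to the child element, not to the parent.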
To create an element instance, use the Element constructor or the SubElement() factory function.
The ElementTree class can be used to wrap an element structure and convert it from and to XML.
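The following sketch of that wrapping is not part of the original tutorial. It builds a one-country element by hand and wraps it in ET.ElementTree. It then writes the tree to an in-memory io.BytesIO buffer instead of a file and parses the bytes back. The element content is illustrative only.

```python
import io
import xml.etree.ElementTree as ET

# Build a tiny element structure by hand.
root = ET.Element('data')
ET.SubElement(root, 'country', name='Singapore')

# Wrap it in an ElementTree to get document-level operations.
tree = ET.ElementTree(root)

# Serialize to an in-memory binary buffer instead of a file on disk.
buf = io.BytesIO()
tree.write(buf)
xml_bytes = buf.getvalue()   # b'<data><country name="Singapore" /></data>'

# Parse the bytes back into a new ElementTree.
reparsed = ET.parse(io.BytesIO(xml_bytes))
```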
A C implementation of this API is available as xml.etree.cElementTree.
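A common import idiom tries the C implementation first and falls back to the pure-Python module. It is shown here as a sketch, not as an official recommendation. The fallback matters on newer interpreters. xml.etree.cElementTree was deprecated in Python 3.3, where xml.etree.ElementTree uses the C accelerator automatically, and it was removed in Python 3.9.

```python
# Prefer the C implementation where the module exists; otherwise fall back
# to the pure-Python module (the branch taken on Python 3.9 and later).
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Both modules expose the same API, so the rest of the code is unchanged.
root = ET.fromstring('<data><country name="Panama"/></data>')
name = root.find('country').get('name')   # 'Panama'
```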
See http://effbot.org/zone/element-index.htm for tutorials and links to other docs. Fredrik Lundh’s page is also the location of the development version of xml.etree.ElementTree.
Changed in version 2.7: The ElementTree API is updated to 1.3. For more information, see Introducing ElementTree 1.3.
19.7.1. Tutorial
This is a short tutorial for using xml.etree.ElementTree (ET for short). The goal is to demonstrate some of the building blocks and basic concepts of the module.
19.7.1.1. XML tree and elements
XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two classes for this purpose: ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.
19.7.1.2. Parsing XML
We’ll be using the following XML document as the sample data for this section:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
There are several ways to import the data. ET.parse() reads an XML file from disk and returns an ElementTree, whose getroot() method gives the root Element. ET.fromstring() reads the data from a string.
fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree. Other parsing functions, such as parse(), may create an ElementTree instead, so check the documentation of each function.
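The sketch below compares the two entry points side by side. It uses an abridged copy of the sample document (the first country, with rank and year only). It writes that copy to a temporary file from tempfile so that parse() has a real path to read. The file name is therefore arbitrary.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Abridged copy of the sample document (first country only).
country_data = b"""<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
    </country>
</data>
"""

# Write the sample to a temporary file so that parse() has something to read.
fd, path = tempfile.mkstemp(suffix='.xml')
with os.fdopen(fd, 'wb') as f:
    f.write(country_data)

try:
    tree = ET.parse(path)      # parse() returns an ElementTree
    root = tree.getroot()      # the root Element of that tree
finally:
    os.remove(path)

root_from_string = ET.fromstring(country_data)  # fromstring() returns an Element
```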
As an Element, root has a tag and a dictionary of attributes. For the sample document, root.tag is 'data' and root.attrib is the empty dictionary {}.
It also has child nodes over which we can iterate:
>>> for child in root:
...     print child.tag, child.attrib
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
Children are nested, and we can access specific child nodes by index. For example, root[0][1].text is '2008': root[0] is the first <country> element, and its second child is <year>.
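The sketch below puts these accessors together. It runs on an abridged, whitespace-free copy of the sample data (two countries, with rank and year only). The names root, names and first_year are illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample document (two countries, no neighbors).
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank><year>2008</year></country>'
    '<country name="Singapore"><rank>4</rank><year>2011</year></country>'
    '</data>'
)

root_tag = root.tag        # 'data'
root_attrib = root.attrib  # {} because <data> has no attributes

# Iterating over an Element visits its direct children only.
names = [child.get('name') for child in root]

# Index twice: first <country>, then its second child, <year>.
first_year = root[0][1].text
```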
19.7.1.3. Finding interesting elements
Element has some useful methods that help iterate recursively over the whole sub-tree below it (its children, their children, and so on). For example, Element.iter():
>>> for neighbor in root.iter('neighbor'):
...     print neighbor.attrib
...
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}
Element.findall() finds only elements with a given tag that are direct children of the current element. Element.find() finds the first child with a particular tag, and Element.text accesses the element’s text content. Element.get() accesses the element’s attributes:
>>> for country in root.findall('country'):
...     rank = country.find('rank').text
...     name = country.get('name')
...     print name, rank
...
Liechtenstein 1
Singapore 4
Panama 68
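The difference between findall() and iter() is easy to miss. The small sketch below runs on an abridged copy of the sample data in which every <neighbor> is a grandchild of <data>.

```python
import xml.etree.ElementTree as ET

# Abridged sample: <neighbor> elements sit two levels below <data>.
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein">'
    '<neighbor name="Austria" direction="E"/>'
    '<neighbor name="Switzerland" direction="W"/>'
    '</country>'
    '<country name="Singapore">'
    '<neighbor name="Malaysia" direction="N"/>'
    '</country>'
    '</data>'
)

# findall() with a plain tag looks only at root's direct children: no match.
direct = root.findall('neighbor')

# iter() walks the whole sub-tree, so it finds every <neighbor>.
everywhere = [n.get('name') for n in root.iter('neighbor')]

# With no tag, iter() yields the element itself plus all its descendants.
all_tags = [e.tag for e in root.iter()]
```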
More sophisticated specification of which elements to look for is possible by using XPath: find() and findall() accept a limited subset of XPath expressions as well as plain tag names.
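The sketch below shows a few of the XPath forms that ElementTree supports:

- `.//` for descendants at any depth
- `[@attr='value']` attribute predicates
- multi-step paths

The data is an abridged copy of the sample (Liechtenstein and Panama only). ElementTree implements only a subset of XPath, so more complex expressions may not be accepted.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample data (two countries).
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank>'
    '<neighbor name="Austria" direction="E"/>'
    '<neighbor name="Switzerland" direction="W"/></country>'
    '<country name="Panama"><rank>68</rank>'
    '<neighbor name="Costa Rica" direction="W"/>'
    '<neighbor name="Colombia" direction="E"/></country>'
    '</data>'
)

# .// selects matching elements at any depth below root.
all_neighbors = [n.get('name') for n in root.findall('.//neighbor')]

# An attribute predicate on one step, followed by a child step.
panama_rank = root.find("country[@name='Panama']/rank").text

# Combine descendant search with an attribute predicate.
west = [n.get('name') for n in root.findall(".//neighbor[@direction='W']")]
```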
19.7.1.4. Modifying an XML File
ElementTree provides a simple way to build XML documents and write them to files. The ElementTree.write() method serves this purpose.
Once created, an Element object may be manipulated by directly changing its fields (such as Element.text), adding and modifying attributes (Element.set() method), as well as adding new children (for example with Element.append()).
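The short sketch below shows the three kinds of change on a single, abridged Singapore element: replacing text, setting an attribute, and appending a new child. The <year> value comes from the sample data, but the element is built by hand for illustration.

```python
import xml.etree.ElementTree as ET

country = ET.fromstring('<country name="Singapore"><rank>4</rank></country>')

rank = country.find('rank')
rank.text = '5'               # change an element's text
rank.set('updated', 'yes')    # add (or overwrite) an attribute

# Build a new child and attach it after the existing children.
year = ET.Element('year')
year.text = '2011'
country.append(year)
```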
Let’s say we want to add one to each country’s rank, and add an updated attribute to the rank element:
>>> for rank in root.iter('rank'):
...     new_rank = int(rank.text) + 1
...     rank.text = str(new_rank)
...     rank.set('updated', 'yes')
...
>>> tree.write('output.xml')
Our XML now looks like this:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
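ElementTree.write() also accepts a file object, so the result can be inspected without creating output.xml. The sketch below repeats the rank update on an abridged copy of the data (two countries) and writes to an io.BytesIO buffer. It passes xml_declaration=True because, by default, write() omits the XML declaration when the encoding is us-ascii or utf-8.

```python
import io
import xml.etree.ElementTree as ET

# Abridged copy of the sample data, wrapped in an ElementTree.
tree = ET.ElementTree(ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Singapore"><rank>4</rank></country>'
    '</data>'
))

# Same update as above: bump each rank and mark it as updated.
for rank in tree.getroot().iter('rank'):
    rank.text = str(int(rank.text) + 1)
    rank.set('updated', 'yes')

# Write to memory with an explicit encoding and XML declaration.
buf = io.BytesIO()
tree.write(buf, encoding='utf-8', xml_declaration=True)
output = buf.getvalue()
```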
We can remove elements using Element.remove(). Let’s say we want to remove all countries with a rank higher than 50:
>>> for country in root.findall('country'):
...     rank = int(country.find('rank').text)
...     if rank > 50:
...         root.remove(country)
...
>>> tree.write('output.xml')
Our XML now looks like this:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
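The removal loop above is safe because findall() returns a new list, separate from root's own children. Calling root.remove() therefore does not disturb the iteration. Removing children while looping over root itself changes the sequence being iterated. That can cause problems, just as modifying a Python list inside a for loop over it can. Here is a self-contained sketch of the safe pattern, using an abridged copy of the already-updated data (name and rank only).

```python
import xml.etree.ElementTree as ET

# Abridged copy of the data after the rank update.
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>2</rank></country>'
    '<country name="Singapore"><rank>5</rank></country>'
    '<country name="Panama"><rank>69</rank></country>'
    '</data>'
)

# Loop over the list returned by findall(), not over root itself,
# so root.remove() does not change what the loop is iterating over.
for country in root.findall('country'):
    if int(country.find('rank').text) > 50:
        root.remove(country)

remaining = [c.get('name') for c in root]
```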
19.7.1.5. Building XML documents
The SubElement() function also provides a convenient way to create new sub-elements for a given element:
>>> a = ET.Element('a')
>>> b = ET.SubElement(a, 'b')
>>> c = ET.SubElement(a, 'c')
>>> d = ET.SubElement(c, 'd')
>>> ET.dump(a)
<a><b /><c><d /></c></a>
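SubElement() also takes an attribute dictionary and keyword attributes, following the signature SubElement(parent, tag, attrib={}, **extra). The sketch below uses it to build one country from the sample data, sets text afterwards, and serializes the result with tostring(). Attribute order in the serialized output can differ between Python versions, so comparing parsed results is safer than comparing raw strings.

```python
import xml.etree.ElementTree as ET

data = ET.Element('data')

# Attributes passed as a dictionary...
country = ET.SubElement(data, 'country', {'name': 'Liechtenstein'})

# ...or with text assigned after the element is created...
rank = ET.SubElement(country, 'rank')
rank.text = '1'

# ...or as keyword arguments.
ET.SubElement(country, 'neighbor', name='Austria', direction='E')

xml_bytes = ET.tostring(data)

# Parse the serialized form back to check the structure.
parsed = ET.fromstring(xml_bytes)
```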
19.7.1.6. Parsing XML with Namespaces
If the XML input has namespaces, tags and attributes with prefixes in the form prefix:sometag get expanded to {uri}sometag, where the prefix is replaced by the full URI. Also, if there is a default namespace, that full URI gets prepended to all of the non-prefixed tags.
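The sketch below shows this expansion on a made-up fragment. The fragment declares the same two namespace URIs used in this section's actors example, and the code inspects the resulting tags.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: one prefixed namespace and one default namespace.
root = ET.fromstring(
    '<actors xmlns:fictional="http://characters.example.com"'
    ' xmlns="http://people.example.com">'
    '<actor><fictional:character>Lancelot</fictional:character></actor>'
    '</actors>'
)

root_tag = root.tag             # default namespace URI applied to <actors>
actor_tag = root[0].tag         # ...and to the non-prefixed <actor>
character_tag = root[0][0].tag  # 'fictional:' prefix replaced by its URI
```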
Here is an XML example that incorporates two namespaces, one with the prefix “fictional” and the other serving as the default namespace:
<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>
One way to search and explore this XML example is to manually add the URI to every tag or attribute in the XPath of a find() or findall():
from xml.etree.ElementTree import fromstring
# xml_text holds the actors document shown above
root = fromstring(xml_text)
for actor in root.findall('{http://people.example.com}actor'):
    name = actor.find('{http://people.example.com}name')
    print name.text
    for char in actor.findall('{http://characters.example.com}character'):
        print ' |-->', char.text
A better way to search the namespaced XML example is to create a dictionary with your own prefixes and use those in the search functions:
ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}
for actor in root.findall('real_person:actor', ns):
    name = actor.find('real_person:name', ns)
    print name.text
    for char in actor.findall('role:character', ns):
        print ' |-->', char.text
These two approaches both output:
John Cleese
 |--> Lancelot
 |--> Archie Leach
Eric Idle
 |--> Sir Robin
 |--> Gunther
 |--> Commander Clement
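Below is a self-contained, runnable sketch of the prefix-dictionary approach. It embeds the actors document as xml_text and collects each actor's name and roles into a list instead of printing them. The prefixes real_person and role are our own choice; they do not have to match the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

# The actors document from this section, embedded as a string.
xml_text = '''<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>
'''

# Our own prefixes, mapped to the namespace URIs used in the document.
ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}

root = ET.fromstring(xml_text)
result = []
for actor in root.findall('real_person:actor', ns):
    name = actor.find('real_person:name', ns).text
    roles = [c.text for c in actor.findall('role:character', ns)]
    result.append((name, roles))
```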
19.7.1.7. Additional resources
See http://effbot.org/zone/element-index.htm for tutorials and links to other docs.