This page follows the ElementTree tutorial in the Python documentation: https://docs.python.org/2/library/xml.etree.elementtree.html#module-xml.etree.ElementTree
If you are not familiar with XML, here is a five-minute introduction: http://www.diveintopython3.net/xml.html#xml-intro
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
Several XML parsers are available for Python; this page uses ElementTree. The examples assume the XML above is saved as country_data.xml.
>>> import xml.etree.ElementTree as etree
>>> tree = etree.parse('country_data.xml')
>>> root = tree.getroot()
>>> root.tag
'data'
>>> root.attrib
{}
>>> for ch in root:
...     print(ch.tag)
...
country
country
country
>>> for ch in root:
...     print(ch.attrib)
...
{'name': 'Liechtenstein'}
{'name': 'Singapore'}
{'name': 'Panama'}
Child elements can be accessed by index:
>>> root[0][1].text
'2008'
>>> root[2][2].text
'13600'
>>> root[2][2].tag
'gdppc'
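Note that `.text` returns a string, so numeric values have to be converted explicitly. The following is a minimal sketch of that, assuming a shortened copy of the sample data is embedded as a string instead of being read from country_data.xml; the names `XML` and `years` are only illustrative.

```python
import xml.etree.ElementTree as etree

# Shortened copy of the sample data, embedded so the snippet runs on its own.
XML = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
    </country>
</data>"""

root = etree.fromstring(XML)

# root[i] is the i-th <country>; root[i][1] is its second child, <year>.
years = {}
for i in range(len(root)):
    country = root[i]
    years[country.attrib["name"]] = int(country[1].text)  # str -> int

print(years)  # {'Liechtenstein': 2008, 'Singapore': 2011}
```

Index access depends on the order of the child elements, so it only works when the layout of the document is known and fixed.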
You can iterate over elements by tag:
>>> for neighbor in root.iter('neighbor'):
...     print(neighbor.attrib)
...
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}
>>> for cntry in root.iter('country'):
...     print(cntry.attrib["name"])
...
Liechtenstein
Singapore
Panama
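The iter() method looks for the given tag in the current element and, recursively, in all of its descendants. findall(), by contrast, finds only direct children with the given tag; find() returns the first such child, and get() reads the value of an attribute: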
>>> for cntry in root.findall("country"):
...     print(cntry.find("rank").text)
...
1
4
68
>>> for cntry in root.findall("country"):
...     print(cntry.get("name"))
...
Liechtenstein
Singapore
Panama
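To see how iter() and findall() differ in search depth, here is a small sketch that runs both on the same tree. It assumes a shortened copy of the sample data embedded as a string; the variable names are only illustrative.

```python
import xml.etree.ElementTree as etree

# Shortened copy of the sample data, embedded so the snippet runs on its own.
XML = """<data>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""

root = etree.fromstring(XML)

# <neighbor> elements are grandchildren of the root, so findall() called on
# the root does not see them, while iter() does.
direct = root.findall("neighbor")
nested = [n.get("name") for n in root.iter("neighbor")]

print(len(direct))  # 0
print(nested)       # ['Malaysia', 'Costa Rica', 'Colombia']

# Calling findall() on each <country> reaches that country's own neighbors.
per_country = {c.get("name"): len(c.findall("neighbor"))
               for c in root.findall("country")}
print(per_country)  # {'Singapore': 1, 'Panama': 2}
```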
Processing XML pages from the web
import urllib.request
import xml.etree.ElementTree as etree

# Download the page; read() returns the raw bytes of the response.
page = urllib.request.urlopen("http://www.thomas-bayer.com/sqlrest/CUSTOMER/3/")
content = page.read()
content_string = content.decode("utf-8")

# fromstring() parses XML text and returns the root element directly.
root = etree.fromstring(content_string)
for child in root:
    print(child.tag)
The server returns the following XML:
<CUSTOMER xmlns:xlink="http://www.w3.org/1999/xlink">
    <ID>3</ID>
    <FIRSTNAME>Michael</FIRSTNAME>
    <LASTNAME>Clancy</LASTNAME>
    <STREET>542 Upland Pl.</STREET>
    <CITY>San Francisco</CITY>
</CUSTOMER>
The script prints the tag of each child element:
ID
FIRSTNAME
LASTNAME
STREET
CITY
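Once the response is parsed, the child elements can be collected into a dictionary. This is a minimal sketch that works on the CUSTOMER response shown above, pasted in as a string so that it runs without network access; the names `RESPONSE` and `customer` are only illustrative.

```python
import xml.etree.ElementTree as etree

# Copy of the response shown above, used here instead of a live request.
RESPONSE = """<CUSTOMER xmlns:xlink="http://www.w3.org/1999/xlink">
    <ID>3</ID>
    <FIRSTNAME>Michael</FIRSTNAME>
    <LASTNAME>Clancy</LASTNAME>
    <STREET>542 Upland Pl.</STREET>
    <CITY>San Francisco</CITY>
</CUSTOMER>"""

root = etree.fromstring(RESPONSE)

# Map each child's tag to its text content.
customer = {child.tag: child.text for child in root}

print(customer["FIRSTNAME"], customer["LASTNAME"])  # Michael Clancy
print(root.find("CITY").text)                       # San Francisco
```

A dictionary is convenient when each tag occurs only once, as it does here; with repeated tags, later values would overwrite earlier ones.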