Client access to HTTP requests is provided by the http.client module, although the higher-level urllib package's modules, urllib.parse, urllib.request, urllib.response, urllib.error, and urllib.robotparser, provide easier and more convenient access to URLs. Grabbing a file from the Internet is as simple as:
import urllib.request
fh = urllib.request.urlopen("http://www.python.org/index.html")
html = fh.read().decode("utf8")
The urllib.request.urlopen() function returns an object that behaves much like a file object opened in read binary mode. Here we retrieve the Python Web site's index.html file (as a bytes object), and store it as a string in the html variable. It is also possible to grab files and store them in local files with the urllib.request.urlretrieve() function.
HTML and XHTML documents can be parsed using the html.parser module, URLs can be parsed and created using the urllib.parse module, and robots.txt files can be parsed with the urllib.robotparser module. Data that is represented using JSON (JavaScript Object Notation) can be read and written using the json module.
In addition to HTTP server and client support, the library provides XML-RPC (Remote Procedure Call) support with the xmlrpc.client and xmlrpc.server modules. Additional client functionality is provided for FTP (File Transfer Protocol) by the ftplib module, for NNTP (Network News Transfer Protocol) by the nntplib module, and for TELNET with the telnetlib module.
The smtpd module provides an SMTP (Simple Mail Transfer Protocol) server, and the email client modules are smtplib for SMTP, imaplib for IMAP4 (Internet Message Access Protocol), and poplib for POP3 (Post Office Protocol). Mailboxes in various formats can be accessed using the mailbox module. Individual messages (including multipart messages) can be created and manipulated using the email module.
If the standard library's packages and modules are insufficient in this area, Twisted (www.twistedmatrix.com) provides a comprehensive third-party networking library. Many third-party web programming libraries are also available, including Django (www.djangoproject.com) and Turbogears (www.turbogears.org) for creating web applications, and Plone (www.plone.org) and Zope (www.zope.org), which provide complete web frameworks and content management systems. All of these libraries are written in Python.
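To give a feel for two of the modules mentioned above, here is a minimal sketch that takes a URL apart with urllib.parse and round-trips the pieces through json. The query string appended to the URL and the keys of the record dictionary are made up for illustration; nothing here touches the network.

```python
import json
import urllib.parse

# The query string is hypothetical, added only so there is something to parse.
url = "http://www.python.org/index.html?lang=en&page=2"

# Split the URL into its components (scheme, netloc, path, query, ...).
parts = urllib.parse.urlparse(url)

# Turn the query string into a dictionary that maps names to lists of values.
query = urllib.parse.parse_qs(parts.query)

# Reassemble the components into a URL string.
rebuilt = urllib.parse.urlunparse(parts)

# Write the pieces out as JSON text, then read them back in.
record = {"host": parts.netloc, "path": parts.path, "query": query}
text = json.dumps(record, sort_keys=True)
restored = json.loads(text)
```

After this runs, parts.netloc holds the host name, query maps each parameter name to a list of strings, and restored compares equal to record.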
There are two widely used approaches to parsing XML documents. One is the DOM (Document Object Model) and the other is SAX (Simple API for XML). Two DOM parsers are provided, one by the xml.dom module and the other by the xml.dom.minidom module. A SAX parser is provided by the xml.sax module. We have already used the xml.sax.saxutils module for its xml.sax.saxutils.escape() function (to XML-escape &, <, and >). There is also an xml.sax.saxutils.quoteattr() function that does the same thing but additionally escapes quotes (to make the text suitable for a tag's attribute), and xml.sax.saxutils.unescape() to do the opposite conversion.
Two other parsers are available. The xml.parsers.expat module can be used to parse XML documents with expat, providing the expat library is available, and the xml.etree.ElementTree module can be used to parse XML documents using a kind of dictionary/list interface. (By default, the DOM and element tree parsers themselves use the expat parser under the hood.)
Writing XML manually and writing XML using DOM and element trees, and parsing XML using the DOM, SAX, and element tree parsers, is covered in Chapter 7.
There is also a third-party library, lxml (www.codespeak.net/lxml), that claims to be the most feature-rich and easy-to-use library for working with XML and HTML in the Python language. This library provides an interface that is essentially a superset of what the element tree module provides, as well as many additional features such as support for XPath, XSLT, and many other XML technologies.
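The three xml.sax.saxutils helper functions described above can be tried out with a short sketch like the following. The sample text is invented for illustration. Note that quoteattr() returns the value already wrapped in quotes, so it can be dropped straight into a tag.

```python
from xml.sax.saxutils import escape, quoteattr, unescape

# Hypothetical sample text containing all the characters of interest.
text = 'He said "it\'s" <ok> & fine'

# escape() replaces &, <, and > with their XML entities.
escaped = escape(text)

# quoteattr() escapes the same characters and also returns the value
# wrapped in quotes, escaping embedded quotes where it has to.
attr = quoteattr(text)
tag = "<note title={0}/>".format(attr)

# unescape() reverses what escape() did.
restored = unescape(escaped)
```

Here restored is equal to the original text, and tag is a complete empty element whose title attribute is safely quoted.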
Example: The xml.etree.ElementTree Module
Python's DOM and SAX parsers provide the APIs that experienced XML programmers are used to, and the xml.etree.ElementTree module offers a more Pythonic approach to parsing and writing XML. The element tree module is a fairly recent addition to the standard library, and so may not be familiar to some readers. In view of this, we will present a very short example here to give a flavor of it. Chapter 7 provides a more substantial example and provides comparative code using DOM and SAX.
The US government's NOAA (National Oceanic and Atmospheric Administration) Web site provides a wide variety of data, including an XML file that lists the US weather stations. The file is more than 20 000 lines long and contains details of around two thousand stations. Here is a typical entry:
<station>
    <station_id>KBOS</station_id>
    <state>MA</state>
    <station_name>Boston, Logan International Airport</station_name>
    <xml_url>http://weather.gov/data/current_obs/KBOS.xml</xml_url>
</station>
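As a rough sketch of how a single entry like this could be read with the element tree module, the code below parses the entry from a string rather than from the full NOAA file. The variable names are our own, and this is only one of several ways the module could be used.

```python
import xml.etree.ElementTree as ET

# One station entry, copied from the text, held in a string so that the
# sketch does not depend on downloading the full file.
xml_text = """<station>
    <station_id>KBOS</station_id>
    <state>MA</state>
    <station_name>Boston, Logan International Airport</station_name>
    <xml_url>http://weather.gov/data/current_obs/KBOS.xml</xml_url>
</station>"""

# fromstring() parses the text and returns the root element (<station>).
station = ET.fromstring(xml_text)

# Iterating over an element yields its child elements in document order.
info = {child.tag: child.text for child in station}

# findtext() returns the text of the first child with the given tag.
name = station.findtext("station_name")
```

The info dictionary then maps each child tag (station_id, state, and so on) to its text, which is the kind of dictionary/list interface mentioned earlier.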
