Python’s urllib2

April 20, 2009

I know it’s mostly my family that reads this blog, and most of them won’t care about this post, but I need to say some things about Python’s urllib2 library.

One of my responsibilities at work is to administer our Tamino databases. Tamino comes with APIs for Java and C# (and a really low-level one for C), and there are unsupported libraries for JavaScript and Perl. But I don’t want to use those languages; I want to use Ruby or Python. Well, it turns out that Tamino access is all built on HTTP and the underlying interface is really well documented, so I decided to write my own libraries.

I wrote the Ruby one a while back. It uses the standard Ruby Net::HTTP library to manage the protocol. A few weeks ago I started on the Python one and decided to use the standard urllib2 library in that language.

At first I was pretty impressed with the architecture of urllib2. The way it has a generic “opener” object with handlers for different protocols and errors is pretty slick. As I got further in, though, it became clear that it was written by looking at what browsers do and not by reading RFC 2616 and actually following the standard.

  • By default, it only does GET and POST (GET, unless you pass request data, in which case it sends a POST). Yes, that’s all that browsers do, but HTTP has other methods, such as PUT and DELETE. I found a workaround on Stack Overflow and it’s not too unreasonable, but I still prefer the way Ruby’s Net::HTTP has a different subclass of Request for each method. (Net::HTTP even has subclasses for WebDAV.)
  • The big thing was when I did a POST and urllib2 threw an exception because the status code was 201 (Created), not 200! In Python 2.5, urllib2 only treats 200 and 206 as success. RFC 2616 says that every 2xx code means the request succeeded, and that a client must treat any 2xx code it doesn’t recognize the same as 200, so this violates the RFC. It turns out that this has been fixed in Python 2.6, but our production environment will be Python 2.5, so I still need a workaround.
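For the first point, here is a minimal sketch of Ruby-style per-method request classes in Python. It relies on the fact that urllib2 asks a Request for its method by calling get_method(), which normally returns GET or POST. The names MethodRequest, PutRequest, and DeleteRequest are hypothetical, not part of urllib2, and the import falls back to urllib.request, where the Request class lives in Python 3.

```python
# urllib2 was split into urllib.request and urllib.error in Python 3.
try:
    import urllib2 as request_mod
except ImportError:
    import urllib.request as request_mod


class MethodRequest(request_mod.Request):
    """Request whose HTTP method is fixed by the subclass (hypothetical)."""

    method_name = "GET"

    def get_method(self):
        # The opener asks the request for its method just before sending,
        # so overriding get_method() is enough to send PUT, DELETE, etc.
        return self.method_name


class PutRequest(MethodRequest):
    method_name = "PUT"


class DeleteRequest(MethodRequest):
    method_name = "DELETE"
```

Sending one would then be opener.open(PutRequest(url, data=body)) with an opener from urllib2.build_opener(). For a one-off request, assigning req.get_method = lambda: 'PUT' on a plain Request has the same effect without a subclass.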

The workaround I came up with is to create a subclass of urllib2.HTTPErrorProcessor that has the code from the Python 2.6 version, and pass it to urllib2.build_opener() if the version of Python is less than 2.6. (I was already building an opener object so things like authorization and cookies would be database-specific.)
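Building a separate opener per database is where the handler design pays off. Below is a minimal sketch, assuming HTTP Basic authentication (the post doesn't say how Tamino authenticates); make_db_opener and its parameters are hypothetical. Each call gets its own cookie jar and password manager, and extra_handlers is where a class like the All200Good processor from the comments would go on Python 2.5.

```python
# urllib2 and cookielib became urllib.request and http.cookiejar in Python 3.
try:
    import urllib2 as request_mod
    import cookielib as cookiejar_mod
except ImportError:
    import urllib.request as request_mod
    import http.cookiejar as cookiejar_mod


def make_db_opener(base_url, user, password, extra_handlers=()):
    """Build an opener whose cookies and credentials belong to one database.

    Hypothetical helper: HTTP Basic auth is an assumption, not something
    the post says Tamino uses.
    """
    # A fresh jar per opener keeps one database's session cookies
    # out of requests sent to another database.
    jar = cookiejar_mod.CookieJar()

    # realm=None makes these credentials apply to any realm under base_url.
    passwords = request_mod.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)

    handlers = [request_mod.HTTPCookieProcessor(jar),
                request_mod.HTTPBasicAuthHandler(passwords)]
    # build_opener() accepts handler classes as well as instances.
    handlers.extend(extra_handlers)
    return request_mod.build_opener(*handlers), jar
```

On Python 2.5 the call might look like make_db_opener(url, user, pw, [All200Good]), and on 2.6 or later the last argument can simply be left out.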

But I still think that people who are writing libraries around a published specification should actually read the specification and follow it in the first place.


4 Responses to “Python’s urllib2”


  1. […] Since I already have a work-related post on the other blog, I’ll link to it: my rant on Python’s urllib2. […]

  2. Jon Says:

    I’ve come across the same problem with the 201 code, and I’m unable to upgrade the Python version because it’s on a shared hosting environment.

    Would it be possible to share the code showing how you resolved the issue (by creating the subclass), or provide a little more detail? Thanks!

  3. curtispew Says:

    I created a file called _good200.py that contains this:

    from urllib2 import HTTPErrorProcessor

    class All200Good(HTTPErrorProcessor):
        """
        Don't throw exceptions for any 2xx status code.

        Pass this class when calling C{urllib2.build_opener()} in Python
        2.5 or lower to get the correct response for all 2xx status codes.
        As of Python 2.6 the default C{HTTPErrorProcessor} works correctly.
        """

        def http_response(self, request, response):
            code, msg, hdrs = response.code, response.msg, response.info()

            # was: if code not in (200, 206):
            if not (200 <= code < 300):
                response = self.parent.error(
                    'http', request, response, code, msg, hdrs)

            return response

        https_response = http_response

    Then in my code I have:

    # need workaround for HTTP status codes 2xx if version < 2.6
    import sys
    import urllib2

    _lessthan26 = sys.version_info[:2] < (2, 6)
    if _lessthan26:
        from tamino._good200 import All200Good

    # ... later, in the method that sets up the connection
    # (self._session_handler is built elsewhere in the class):
    if _lessthan26:
        self._open_obj = urllib2.build_opener(self._session_handler,
                                              All200Good)
    else:
        self._open_obj = urllib2.build_opener(self._session_handler)

    Hope that helps.

  4. curtispew Says:

    Oh, you’ll have to figure out the proper indenting since WordPress got rid of all of it.

