April 20, 2009
I know it’s mostly my family that reads this and most of them won’t care about this, but I need to say some things about urllib2 in Python.
I wrote the Ruby one a while back. It uses the standard Ruby Net::HTTP library to manage the protocol. A few weeks back I started on Python, and decided to use the standard urllib2 library in that language.
At first I was pretty impressed with the architecture of urllib2. The way it has a generic “opener” object with handlers for different protocols and errors is pretty slick. As I got further in, though, it became clear that it was written by looking at what browsers do and not by reading RFC 2616 and actually following the standard.
- By default, it only does GET and POST. Yes, that’s all that browsers do, but HTTP has other methods. I found the workaround on Stack Overflow and it’s not too unreasonable, but I still prefer the way Ruby’s Net::HTTP has a different subclass of Request for each method. (Net::HTTP even has subclasses for WebDAV.)
- The big thing was when I did a POST and urllib2 raised an exception because the status code was 201, not 200! RFC 2616 defines every code from 200 to 299 as a success, and says a client should treat any 2xx code it doesn’t recognize as equivalent to 200, so this violates the RFC. It turns out that this has been fixed in Python 2.6, but our production environment will be Python 2.5, so I still need a workaround.
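For the first point, workarounds of this kind usually rely on urllib2 asking the Request object for its method through get_method(), which a subclass can override. What follows is only a minimal sketch of that idea, not necessarily the exact Stack Overflow answer: the class name MethodRequest and its http_method argument are made up for illustration, and the import fallback is there only so the same code also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 renamed the module


class MethodRequest(urllib2.Request):
    """Hypothetical Request subclass that can carry any HTTP method."""

    def __init__(self, url, data=None, headers=None, http_method=None):
        # urllib2.Request is an old-style class on Python 2, so the base
        # initializer is called explicitly rather than through super().
        urllib2.Request.__init__(self, url, data, headers or {})
        self._http_method = http_method

    def get_method(self):
        # Use the explicit method if one was given; otherwise keep the
        # stock behavior (POST when there is a body, GET when there isn't).
        if self._http_method is not None:
            return self._http_method
        return urllib2.Request.get_method(self)
```

A PUT would then look something like MethodRequest(url, data=body, http_method="PUT"), passed to the opener's open() like any other request.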
The workaround I came up with is to create a subclass of urllib2.HTTPErrorProcessor that has the code from the Python 2.6 version, and pass it to urllib2.build_opener() if the version of Python is less than 2.6. (I was already building an opener object so things like authorization and cookies would be database-specific.)
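A minimal sketch of that workaround, under some assumptions: the names RangeHTTPErrorProcessor and make_opener are invented here, the status check mirrors the 2xx range test that Python 2.6 uses rather than being copied from its source, and the import fallback only exists so the sketch also runs on Python 3.

```python
import sys

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 renamed the module


class RangeHTTPErrorProcessor(urllib2.HTTPErrorProcessor):
    """Hypothetical replacement that treats every 2xx status as success."""

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()
        # Anything outside 200-299 goes to the opener's error handlers;
        # 201, 204 and the rest of the 2xx range pass straight through.
        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response

    https_response = http_response


def make_opener(*handlers):
    # Only older Pythons need the patched processor. build_opener() leaves
    # out its default HTTPErrorProcessor when a subclass instance is passed.
    if sys.version_info < (2, 6):
        handlers = (RangeHTTPErrorProcessor(),) + handlers
    return urllib2.build_opener(*handlers)
```

Other handlers go in the same call, for example make_opener(urllib2.HTTPCookieProcessor()), so each opener keeps its own cookies.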
But I still think that people who are writing libraries around a published specification should actually read the specification and follow it in the first place.