Now for a break from Project Euler… In my projects, I find myself frequently retrieving URLs from various servers. Sometimes I need to call a REST API endpoint, and other times I need to scrape a site. And like a lot of programmers, I don't like to rewrite code. So originally, in Maltrieve, I wrote a function called get_URL() that wrapped calls to urllib2.urlopen() so I didn't have to repeat the error handling every time. It sucked.
Now, in a work project, I have the same basic requirement, so I brought over that function. But in daily usage, the terribad error handling kept biting me. Also, sometimes I need to set up the request with various parameters (like, say, an API key or a specific user agent string).
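Setting up a request with extra parameters means building a request object instead of passing a bare URL. Here is a minimal sketch in Python 3, where urllib2's functionality lives in urllib.request. The helper name build_request and the X-API-Key header are hypothetical: real APIs document their own authentication scheme, and some expect the key as a query parameter instead.

```python
import urllib.request


def build_request(url, api_key=None, user_agent=None):
    """Build a Request that carries optional per-call settings (hypothetical helper)."""
    headers = {}
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    if api_key is not None:
        # Hypothetical header name; check the target API's documentation.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(url, headers=headers)


req = build_request("https://example.com/api", api_key="secret", user_agent="my-fetcher/1.0")
print(req.full_url)
# Request normalizes header names with str.capitalize(), so look them up as "User-agent".
print(req.get_header("User-agent"))
```

Nothing is sent over the network here; the Request object just holds the URL and headers until something like urlopen() uses it.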
For the latter requirement, isinstance() does the trick. We compare the parameter to the Python type basestring, the common base class of str and unicode, so that byte strings, Unicode strings, and any of their subclasses all pass the check; this mostly matters when Unicode URLs are involved. Otherwise, we make sure the parameter is the proper type of object.
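The type check can be sketched like this. It is written for Python 3, which has no basestring (all text is str), and it assumes the "proper type of object" is a urllib.request.Request (urllib2.Request in Python 2). The name to_request and the DEFAULT_USER_AGENT fallback are hypothetical, added to show why normalizing both inputs to one type is handy.

```python
import urllib.request

# Hypothetical default; substitute whatever identifies your client.
DEFAULT_USER_AGENT = "example-fetcher/0.1"


def to_request(target):
    """Accept a URL string or a prepared Request; always return a Request."""
    if isinstance(target, str):
        # Python 3 stand-in for the basestring check.
        req = urllib.request.Request(target)
    elif isinstance(target, urllib.request.Request):
        # The caller already configured it (API key, custom headers...); keep it.
        req = target
    else:
        raise TypeError(
            "expected a URL string or urllib.request.Request, got %s"
            % type(target).__name__
        )
    # Request stores header names via str.capitalize(), hence "User-agent".
    if not req.has_header("User-agent"):
        req.add_header("User-Agent", DEFAULT_USER_AGENT)
    return req
```

Because both branches end with a Request, the code after the check only has to deal with one type, and a caller's own User-Agent is left untouched.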
Once we make the request, though, we need to handle possible errors. We should probably replace the direct error messages with calls to logging.error() or similar, but for my implementations to date this works fine. In urllib2's exception hierarchy, HTTPError is a subclass of URLError, which in turn is a subclass of IOError. So we need to handle the more specific cases first; otherwise the broader except clause catches them and we never see their details. And while we could access specific attributes like HTTPError.code, all we want to do is tell a human what went wrong, so the exception's own string representation is enough.
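Here is a minimal sketch of that except-clause ordering, written for Python 3: urllib.error replaces urllib2's exception classes, HTTPError still subclasses URLError, and URLError subclasses OSError (IOError is now an alias of OSError). It reports through a logger rather than printing. The opener parameter is a hypothetical hook so the error paths can be exercised without network access, and returning None on failure is an assumption about what callers want.

```python
import logging
import urllib.error
import urllib.request

log = logging.getLogger("get_url")


def get_url(target, opener=urllib.request.urlopen):
    """Fetch a URL string or Request; return the body bytes, or None on failure."""
    url = getattr(target, "full_url", target)  # readable name for log messages
    try:
        # urlopen accepts either a URL string or a Request object.
        with opener(target) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        # Most specific: the server replied, but with an error status.
        log.error("HTTP error fetching %s: %s", url, e)
    except urllib.error.URLError as e:
        # No usable HTTP response at all (DNS failure, refused connection, ...).
        log.error("URL error fetching %s: %s", url, e)
    except OSError as e:
        # Broadest: any other I/O failure. Listed last so it cannot
        # swallow the two subclasses above.
        log.error("I/O error fetching %s: %s", url, e)
    return None
```

If the OSError clause came first, every HTTPError and URLError would match it, and the more specific branches would never run.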
I suppose this could be rewritten as a class, but that seems like one level of abstraction too far. And procedural programming can never go out of style.
I need to backport this to Maltrieve soon, I think. So many projects… In the meantime, I have made this code snippet available as a Gist to serve as the canonical version.