Bulk downloading the 2011 Census

I need the Census data for some analysis. The ABS release schedule indicated the data would come out first on DVD, then on the web, so I duly (and reluctantly – what is this, the 90s?) rang up the ABS and paid $250 for the DVDs. The first release hasn’t shown up yet, and they’ve already put the data on the web – so that turns out to have been a waste of money. Apparently the post is slow from Canberra?

The ABS census datapack site requires registration, and once you get in there’s a matrix of download buttons. The buttons are hooked up to JavaScript, which deliberately hides the URLs and makes it difficult to download the datapacks en masse. Unfortunately this is also awful from a usability point of view – if you want several files, it’s annoying having to guess which file corresponds to which mouse click.

Being sufficiently annoyed by all this nonsense, I peeked under the covers. The JavaScript file which drives the whole mess has been minified, but simply changing the first function call from `eval()` to `console.log()` and then running the file through Node.js prints out the original source code.
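For anyone who hasn’t seen this trick before, here is a minimal sketch of the idea. It doesn’t use the ABS file: `packed` is a made-up toy script, and `swapEvalForLog` and `revealSource` are names I’ve invented for illustration. It assumes the packed file is a single outer `eval(...)` wrapped around an expression that rebuilds the source as a string, which is how this style of packing usually works.

```javascript
// Toy stand-in for a packed script (hypothetical; not the ABS file).
// The outer eval() wraps an expression that rebuilds the source text.
const packed =
  'eval(["function greet(name) {", "  return \'hi \' + name;", "}"].join("\\n"))';

// Replace only the first eval( with console.log( so the rebuilt source
// is printed instead of executed. A string pattern in replace() only
// touches the first match.
function swapEvalForLog(source) {
  return source.replace("eval(", "console.log(");
}

// Run the rewritten script with a capturing console and return what it
// printed. Note: this still executes the unpacking expression, so only
// use it on code you are willing to run.
function revealSource(source) {
  const lines = [];
  const fakeConsole = { log: (...args) => lines.push(args.join(" ")) };
  new Function("console", swapEvalForLog(source))(fakeConsole);
  return lines.join("\n");
}

console.log(revealSource(packed));
```

Capturing the output in a function is just to keep the sketch tidy; editing the file by hand and running it with `node` does the same job.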

Helpfully, this source code is heavily commented. I’ll leave it as an exercise for the reader to get the datapack URLs from there, but the comments in the file are worth a read just for amusement value. I ended up modifying the ZipName function to print to the JavaScript console, then pasting my modified function, along with appropriate jQuery selectors, into the Chrome debugging console. Hey presto, I could actually download the data without using their wretched interface.
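The general shape of "make a function print what it produces" can be sketched as below. This is a wrapper rather than an edit to the function body, so it is a slightly different technique from what I actually did, and everything in it is hypothetical: `withLogging` is an invented helper, and `zipNameStandIn` with its two arguments is a made-up placeholder, not the real ZipName.

```javascript
// Wrap a function so each call prints its arguments and return value.
// `sink` defaults to console.log; pass another function to capture lines.
function withLogging(fn, label, sink = console.log) {
  return function (...args) {
    const result = fn.apply(this, args);
    sink(`${label}(${args.map(String).join(", ")}) -> ${result}`);
    return result;
  };
}

// Hypothetical stand-in: the real function's arguments and output
// format are not shown in this post.
function zipNameStandIn(product, region) {
  return `${product}_${region}.zip`;
}

const logged = withLogging(zipNameStandIn, "zipNameStandIn");
logged("productA", "regionB");
```

In a browser console the same wrapper could be assigned over a page’s function, assuming that function is reachable as a global, which I haven’t shown here.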

A fun diversion, but you have to wonder – exactly why is the ABS taking such pains to make sure nobody downloads the data they’re providing to us? Distributing that data is presumably a primary function of the department, so it’s odd that they are so hostile to anyone actually obtaining it.
