In the first part of basic mapping, everything went on the map nice and easily, in real life this pretty much never happens (x10 for every hour nearer to deadline you get).
The most consistent way to get things in the right place on a map is to add in columns based on longitude and latitude (it’s also pretty essential if you’re planning to use Tableau to visualise it all).
You will probably also need to do this for data that doesn’t come in proper address formats, which is pretty much every FOI ever.
In my case, the worst one ever was the parking tickets, which involved geocoding about 120,000 parking ticket locations.
Recently I’ve been playing with data about the location of bookmakers in Wales, and have managed to pull addresses for about 400 from yell.com and other service listings.
Get the data here: Bookies-Locations-csv
Mapping them on to a Google map means you hit a slight problem with things not been located to Wales.
With so many obviously in the wrong place, and too many to check if they’re less obviously in the wrong place (like South Wales locations in North Wales), a better mapping method is needed.
The first task is to pull as many postcodes out of addresses as possible.
In the column next to your address column you’ll need to enter the formula =RIGHT(B2, 8), where B2 is the cell in the address column. The eight is the number of characters you want it to copy from the end of the string in the address column (eight will cover a seven character postcode).
Copy and paste down entire column.
As a lot of the addresses don’t have postcodes, you’ll need to get rid of those. Sort the list A-Z and then get rid of the space in front of six character postcodes by swapping the eight in the formula to a seven, then delete everything that’s not a postcode.
Re-sort the list so you’re left with the postcodes at the top. Now to find locations for them.
The best place I’ve found for locating postcodes is the batch geocoding tool at www.doogal.co.uk. Just copy and paste the column of postcodes into the left-hand box and press geocode to get a nice list of corresponding longitudes and latitudes.
Copy and paste these into your spreadsheet (in OpenOffice I can paste them straight in noting the text is comma separated, however with Excel I need to go via Word to turn the text into a table, via the convert text to table tool).
Check the postcodes match up in the two columns (it does sometimes miss some) then delete one of the postcode columns.
That cover’s about a third of the addresses, what about the rest.
The other tool I use a lot is the GPS Visualizer’s Geocoder, where you add addresses and it returns locations using Google or Yahoo.
First I’m going to add Wales, UK to the empty cells in the postcode column to try and cut down on the number of points ending up in the USA.
Next copy and paste the cells you want to geocode from those two columns into the box on the page.
You now have two options from the source box, Yahoo, which is quicker but I find makes more mistakes, or Google, which is easier to check for accuracy but more of a faff.
As you can see, the Yahoo version comes out in a nice copy and pasteable format, see box, but you’d need to check each point on the map to see where it’s actually put a location and whether it’s right…or if it’s somewhere on the Isle of Wight.
Google results don’t output as a nice text file. Once they’re done geocoding (takes a little while) you’ll need to click the Create GPX file button to get the output.
Copy and paste the contents of the box into a separate sheet in your spreadsheet. What you’ve got is a line with the latitude and longitude in, a name – which is the address you were looking for and a description, and a description (desc) – which is what Google found for you.
Where the name and description match, copy and past the latitude and longitude into the cell in your main spreadsheet next to the matching address (you can use ctrl to highlight several cells and speed things up a bit).
When you’re down you’ll have a nice list of location information with few addresses still to find.
Lets get rid of the extraneous information, just find (ctrl and f) the <wpt lat=” etc. parts and replace with nothing. Make sure to leave a space or find and replace one in between the latitude and the longitude.
Let’s get them split into two columns.
In Excel, highlight the cells you want to split, go to Data, Text to Columns, select Delimited in Step 1, in Step 2 pick the other checkbox and add a space in the box and hit finish (the location defaults to the current and neighbouring cells).
Those addresses still without locations – have to be done by hand, I’m afraid, Hopefully there’s not too many.
Back to doogal.co.uk and the find a postcode page, then it’s just a case of copying and pasting in the address to the box, hitting find and then copying the resulting latitude and longitude into the spread sheet – you may need to Google any really obscure addresses.
And that’s pretty much it, eventually you’ll make it to the end of your list of addresses and they will all have corresponding longitudes and latitudes, then just upload them to Google Fusion tables, select longitude and latitude as the location column and ta da:
For more tips and ideas on working with data, buy Get Started with Data Journalism