clairemiller.net

Data Journalism for Beginners: Where to find some data

Part of a series of posts looking at tips and ideas for getting started with data journalism

May 1 – why data journalism and getting started
May 2 – finding your data
May 3 – cleaning it up
May 4 – mapping and visualising

Want more? Buy Getting Started with Data Journalism, a complete beginner's guide to finding, cleaning, analysing and visualising data in a newsroom of any size.

Sources of Data

Data can be found in a variety of places.

As more and more is released, knowing how to find the most useful datasets and those that contain the most interesting stories will be a key skill.

Government Statistics

The ONS publication hub, a useful daily source for data

Governments release a lot of data and tend to do it on a regular basis.

Much of this will be collated data, where data is combined over a large area or longer period of time to give an overview of a subject, such as birth rate or burglaries.
These kinds of releases provide much of the day-to-day information for data stories.

They are the kind of stories most journalists will regularly cover, particularly anyone with a specialist beat such as crime, health or education. Most would probably not think of these stories as data journalism.

Working with these types of datasets tends to have a fairly straightforward workflow:

  • Look at the data when it is released, usually early in the morning
  • Analyse to pick out any possible changes – as the data tends to already be summarised, most stories will come from comparing different years or different areas to see changes over time, or to see areas where the numbers are particularly high or low.
  • Write the stories: get comments, ask experts to explain why the numbers might be as they are or, if the numbers highlight a particular issue, get political comment on that.
  • Put together visualisations and datasets to illustrate any stories.
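
The comparison step in this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the area names and figures below are made up, and percent_change and rank_changes are hypothetical helper names, not part of any tool mentioned in this post.

```python
# Hypothetical figures for illustration only, not real statistics.
burglaries = {
    "Area A": {2010: 200, 2011: 250},
    "Area B": {2010: 400, 2011: 380},
    "Area C": {2010: 100, 2011: 100},
}


def percent_change(old, new):
    """Percentage change from the old value to the new value."""
    return (new - old) / old * 100


def rank_changes(data, start_year, end_year):
    """Return (area, percent change) pairs, biggest rise first."""
    changes = [
        (area, percent_change(years[start_year], years[end_year]))
        for area, years in data.items()
    ]
    return sorted(changes, key=lambda pair: pair[1], reverse=True)


for area, change in rank_changes(burglaries, 2010, 2011):
    print(f"{area}: {change:+.1f}%")
```

Sorting by the size of the change puts the areas with the biggest rises and falls at either end of the list, which is usually where the story is.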

Data sources to get you started

  • ONS – The Office for National Statistics website holds all of the datasets the ONS has released, some going back to the 1960s.
  • ONS Publication Hub – data organised by date of release. This is the best way to keep up with official statistics as they are released, and a good way to plan stories around datasets that are due out in the future.
  • Data.gov.uk – another repository for UK data. This one is broader than the ONS and also includes datasets put together by individual departments and data released as open data.
  • Data.gov – the US equivalent of data.gov.uk – bringing together data from federal agencies. It also has links to open data sites for states and cities, as well as around the world.

Written answers

In many parliaments, there is an opportunity for opposition politicians to submit questions to those representing government departments. The answers to these questions are often published on the website for the parliament or government and in some cases will include tables of data.

It is often easy to search through the published answers by scrolling down and scanning for tables, and the data included can be interesting: the information on children in custody came from a parliamentary answer.

If you are looking for data on a specific topic or related to a particular area, you can use They Work For You to create RSS feeds or alerts for searches.

You have little control over what data is published, but it is still worth checking the published records of the relevant parliament. Even data that does not apply to the area you cover may give you ideas for FOI requests.

Searching

The internet is full of data, buried away on sites, waiting to be found and turned into stories.

Sometimes just looking on a website will not lead you to the most interesting data. Council websites, in particular, can be notoriously difficult to navigate.

Often a Google search will return everything but what you are looking for.

However, Google search has a number of options that make it easier to pinpoint datasets or find out what public bodies hold.

  • To restrict a search to just one site, type site: followed by the address of the site, without the http:// or the www
  • To restrict a search to just one file type, type filetype: followed by the file type you are looking for (e.g. filetype:xls)
  • Put phrases inside quotation marks (“  ”) in order to search for the exact phrase instead of separate words
  • Use a minus sign (-) in order to eliminate words or phrases from your search
  • Narrow down a time period or range of numbers by putting the two numbers at either end of the range separated by two full stops (e.g. 2009..2011)
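
The operators above can be combined in a single search. The sketch below is one way to assemble such a query string in Python. The function names (strip_address, quote_if_phrase, build_query) and the example.gov.uk address are hypothetical, and the order of the operators is simply a choice made here.

```python
def strip_address(address):
    """Drop the http://, https:// and www. prefixes before using site:."""
    for prefix in ("https://", "http://", "www."):
        if address.startswith(prefix):
            address = address[len(prefix):]
    return address.rstrip("/")


def quote_if_phrase(text):
    """Wrap multi-word text in quotation marks so it is treated as one phrase."""
    return f'"{text}"' if " " in text else text


def build_query(terms="", exact=None, site=None, filetype=None,
                exclude=(), number_range=None):
    """Combine the search operators listed above into one query string."""
    parts = []
    if terms:
        parts.append(terms)
    if exact:
        parts.append(f'"{exact}"')
    if site:
        parts.append(f"site:{strip_address(site)}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    for word in exclude:
        parts.append(f"-{quote_if_phrase(word)}")
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")
    return " ".join(parts)


query = build_query(
    terms="parking fines",
    site="http://www.example.gov.uk/",  # placeholder address
    filetype="xls",
    exclude=["consultation"],
    number_range=(2009, 2011),
)
print(query)
```

Here the printed query is: parking fines site:example.gov.uk filetype:xls -consultation 2009..2011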

Freedom of Information (FOI)

Freedom of Information acts can be powerful tools in obtaining data you are interested in. Many countries now have laws in place allowing people to request access to information.

What data are you looking for?

Envisage what the data will look like, then ask for it. Be specific – do you want a number per year per area? Per month? You do not just have to ask for figures – you can also ask for documents.

Better yet…

Seek data in its original form – ask the organisation to send the anonymised database exported to a spreadsheet.

This potentially gives you more options for finding stories, as you have the raw data to work with and can look for patterns and outliers in it.

But it is not always easy to convince authorities to do this.

You may run up against problems such as a lack of understanding about why you might be interested in the raw data, uncertainty about how to deal with big datasets, software that does not seem to allow export, concerns about personal data, and worries about the amount of time it will take to get the data.

When seeking a dataset, it may help to ask for the record layout, code sheet or schema, i.e. the categories under which the data is collected (e.g. date, time, description), to get a better idea of what the data looks like. Hopefully this will make it easier to work out what is personal data and should be excluded, and what can be released.

FOI Tips

  • You do not have to mention the Freedom of Information Act, but doing so helps to get your request to the right person.
  • You have to include a name and contact address (although this can be an email address, and you are not required to give a postal address). It helps to include several forms of contact details, in case someone needs to get in touch to clarify your request – telephone numbers are a good idea.
  • Some requesters use a fake name (if so, make sure it is something sensible), but generally this should be avoided. The Act says requesters must give their real name, and authorities can refuse requests from people they think are using a pseudonym. You may also run into problems if you want to make an appeal.
  • You do not have to say why you want the information and the fact you are a journalist is not relevant.
  • Be specific.
  • If you know what documents you want, ask for them by name, or keep requests limited to a small subject area.
  • If you know the technical terms, use them, e.g. children placed for adoption without their parents' consent are subject to a care and/or placement order made through the courts.
  • For police, crimes recorded are usually easier to find on the computer system than incidents reported.
  • Councils tend to mainly work to financial years, as that is usually how projects are budgeted.
  • With FOIs it is best to ask for facts and figures – how much was spent on something, what is the current policy – rather than opinions. You can, of course, ask for documents containing people’s views on things.
  • If you suspect there may be a lot of data for a topic, limit the time period you ask for. You can always ask for more years in a later request if you think having more information over a longer period is important. However, do not try to get around time limits by breaking your request into pieces and sending each one off as a separate request. This just annoys FOI officers, and they can combine them all into one request and refuse it on time limit grounds anyway.
  • Ask for confirmation that the request was received – it is always annoying to find out the reason you have not had a reply is because the spam filter blocked your request and you have to start over. Most authorities are pretty good at sending back something saying they are working on a response and giving an indication of when it will be due. If you do not get a confirmation, it is worth checking the request was received and has not gone astray.
  • Make a note of when a response is due back. Authorities have 20 working days to answer from when they receive a request. It is worth putting a note on the calendar, taking into account weekends and holidays and allowing a bit of leeway as it can take a couple of days for people to check their emails. Schools have up to 20 school days or 60 working days to account for the much longer holidays.
  • Information requests about the environment will often be dealt with under Environmental Information Regulations, which are generally treated in the same way as FOI requests. They have the same time limit of 20 working days for the authority to respond, but fewer exemptions for not providing the data – primarily personal data, national security, and whether a request is manifestly unreasonable.
  • If your request is for information about yourself, such as your medical records, you should make a subject access request under the Data Protection Act. You may be charged a £10 fee to have the authority compile this data.
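
The tip above about noting when a response is due can be turned into a small calculation. The Python sketch below counts forward 20 working days, skipping weekends and any holidays you supply. It assumes the count starts on the working day after the request is received, and response_due is a hypothetical helper name, so check the rules that apply to your authority before relying on the date.

```python
from datetime import date, timedelta


def response_due(received, working_days=20, holidays=()):
    """Return the date that falls the given number of working days after receipt.

    Assumption: day one is the first working day after the request arrives.
    Weekends and any dates listed in holidays are not counted.
    """
    current = received
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0-4 for Monday to Friday, 5-6 for the weekend
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


# Sample input: a request received on Monday 4 March 2024,
# with the Easter bank holidays passed in as dates to skip.
easter_2024 = {date(2024, 3, 29), date(2024, 4, 1)}
print(response_due(date(2024, 3, 4), holidays=easter_2024))
```

Passing the holidays in as an argument keeps the function usable in any country or year; you only need to supply the right dates. As suggested above, it is still worth allowing a couple of days of leeway on top of the calculated date.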

Need some ideas for FOI requests? Here's a great big list of possible requests, and here are some tips on making use of FOI more efficiently.

Scraping

Scraping is the process of creating a computer program to extract data from a webpage, data stream or PDF and pull it into a useable resource such as a database.
It can be an incredibly powerful technique for gathering data that is not in a useable format. However, many of the techniques require some understanding of computer code.

Very simple scraping

The food hygiene ratings are available as open data from the Food Standards Agency.

Open up Open Refine (formerly Google Refine). You may need to download the program first – just follow the installation instructions to install it and start it running.

Open Refine runs in the Command window (the window with the black background) while you interact with it via a webpage. It describes itself as a power tool for working with messy data.

Once you have the program running, click Create Project and then select Get data from Web Addresses (URLs).

Then you need to copy in the URL of the page where the data is stored as XML. Click Next.

You will then see a screen which shows the layout of the data in the XML. Here you need to highlight the level you want to import into the table.

With the food hygiene data, the level you need to start at is the Establishment Detail level, as it contains all of the data about each establishment that has been rated. Click on that level to highlight it in yellow.

Open Refine will then scrape the XML data into a table for you and display it on the next page so you can check it is scraping what you want. If it is, you can click Next to see your table, which you can either download to work on elsewhere or clean and analyse in Open Refine.
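
For readers who want to see roughly what this step involves, here is a minimal Python sketch that does something similar by hand: it finds every record at a chosen level of an XML document and turns each one into a table row. The sample XML, its element names (BusinessName, RatingValue) and the helper names are assumptions for illustration. The real feed may be laid out differently, and this is not how Open Refine itself is implemented.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample in an assumed layout; the real feed may differ.
SAMPLE_XML = """<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <RatingValue>5</RatingValue>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Sample Takeaway</BusinessName>
      <RatingValue>1</RatingValue>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""


def records_to_rows(xml_text, record_tag):
    """Turn every element named record_tag into a dict of its child fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        rows.append({child.tag: (child.text or "").strip() for child in record})
    return rows


def rows_to_csv(rows):
    """Write the rows out as CSV text, one column per field name seen."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = records_to_rows(SAMPLE_XML, "EstablishmentDetail")
print(rows_to_csv(rows))
```

Choosing the record_tag here plays the same role as highlighting a level in Open Refine: everything inside each matching element becomes one row of the table.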

You can also use the data to create things like this map of Welsh premises with the lowest food hygiene ratings.
