October 30, 2013
by Claire

The boundaries of data

Main Entrance to West Middlesex Hospital, Isleworth, London. What do you do about tracking data relating to an organisation or area over time, if the boundaries keep changing?

Recently, I’ve been doing more with English NHS data. This is a new one as I’ve mostly been dealing with Welsh NHS data! Which it turns out is relatively straightforward in comparison.

One of the problems is that over, even a short period of time, hospital trusts are created, merged, and cease to exist. This means if you’re looking to collate data over a longer period, you need to find a way to combine data to reflect the most recent incarnation of the trust.

It isn’t too difficult to sort things out if what’s happened is a straight merge and you have the raw numbers. So, for example, Bart’s. Healthcare is made of Barts and the Royal London, Newham, and Whipps Cross. By adding together the number of attendances at A&E, for example, you should get an reasonably accurate idea of what’s been going on at all the hospitals in the trust, even when they used to belong to other trusts.

However, as of October 1, South London NHS Healthcare Trust has ceased to exist. But it has straightforwardly merged with another trust. Instead the three hospitals that were managed by the trust have been taken over by different trusts – Princess Royal University Hospital, Bromley, became part of Kings College London Foundation Trust, Queen Elizabeth Hospital became part of the Lewisham Healthcare Trust (now Lewisham and Greenwich) and Queen Mary’s, Sidcup, was taken over by Oxleas NHS Foundation Trust (which wasn’t even an acute trust before this (see, I said this was complicated).

So you end up with data for three hospital trusts that you can’t compare to previous years.

One way around this might be to break everything done to the smallest unit, in this case hospitals, as they’re less likely to be subject to change, as they have a physical location, and it would be easier to keep track of how the data has changed over time.

August 21, 2013
by Claire

FOI and Datasets

Request Initiative, not-for-profit community interest company that helps charities, NGOs non-profits make use of FOI, is blogging top tips for using the FOI act.

This week’s topic, looking at asking for datasets, is timely, with the new FOI rules on datasets coming in from September 1.

The advice is good – datasets are always much better for analysing than the response you are likely to get if you let an FOI officer do the number-crunching for you (usually it’s the chance to maybe find several stories in the dataset, rather than just getting the answer to one question).

All good advice until I got to this bit, which just doesn’t match what I’ve found to be the case.

The best way to request a whole dataset is in CSV (comma separated values) format. Whatever proprietary software might be in use, every database can export to CSV and you can open it with Excel when you receive it.

Not the bit about asking for CSVs (ask for them, they’re great!), but about every database being able to export to CSV.

You’d think, right?

The problem, and experience has taught me this, is the that either no one knows how to get the programme to export to CSV (because it’s never come up before, because that’s not how the programme is used, because people generally only learn to do what they need to do, if they’ve never needed to do that then it probably isn’t something the programme can do) or actually it doesn’t export to CSV (or at least everyone in the organisation swears it doesn’t).

Yes, there is an entire story in how the public sector may have invested in proprietary software for organising data and processes, where if they ever decide to change software, they would have no easy way of exporting their old data out and importing it into a new system (except, probably, at huge cost). That, or the programme does have the option but it’s not enabled.

There was a great post yesterday from Andy Dickinson about trying to get data from the Home Office that suggests staff have to look through 75 screens to find information in just one database. Also yesterday, I had a conversation with South Wales Police, well known for their excellent records management, about how it would take them 2,000 plus hours to get some data out of a database (I did ask about exporting, you know, as opposed to typing it out by hand).

Which, brings me to: why I don’t think the new rules on datasets are going to make it any easier to get datasets out of public bodies.

The ICO has issued guidance on the new requirements. Reading through them, in theory, for most FOIs nothing has changed, small data tables put together to answer a question aren’t datasets covered by the new requirements so most FOI responses will go out much as they did before.

Apart from that, most responses concerning what are probably datasets under the requirement could be made compliant by not converting spreadsheets to a PDF (seriously, stop doing that), but instead by saving them to a CSV and then making them reuseable under the Open Government Licence.

Untitled-1 Do what Monmouthshire Council do and make the Open Government Licence your default setting.


However, I suspect the new dataset requirements are not going to lead to the bounty of data the Government is hoping it will.

The issue with requesting datasets now isn’t format (yes, PDFs are annoying when you get them but most are convertible), it’s getting the raw material out in the first place – the programmes that just don’t work like that, the information that isn’t held in that format.

It’s not going to solve the issue where the requester is incredulously pointing out that being able to export information in a simple, open format might be a basic and key requirement of any software purchase, while a council officer claims that, of course, the information can’t be exported out of the programme because why would anyone need to do that when everything is done in the programme and the FOI officer is stuck in the middle.

The new requirements may mean more attention starts to be paid to finding out how to export datasets, and may give FOI officers more power to get people to start asking questions of suppliers about export options but I suspect the caveat that data should be released in this form “so far as reasonably practicable” will win out and datasets will remain locked down.

On the subject of blogging about FOI and blogging weekly, I’m aiming to try (not committing myself there) to blog a bit more regularly (or at all). As I’ve finished my book, I have a little more time, so should probably put it to better use.

The plan is some more posts on FOI, data journalism (and I’m hoping the free time will allow for a bit more experimenting), and open data, at least semi-regularly. I have great hopes but then again, this blog is three-and-a-half years old and has less than 35 posts.

August 19, 2013
by Claire
1 Comment

Data journalism – necessity is the mother of invention

One of the things I like about data journalism (despite being very frustrating at times) is that it is very often an exercise in problem solving;  it makes me think.

Most of the problems that need solving are how to avoid doing something very tedious and time-consuming.

I’ve been working with the Land Registry price paid data recently. At first glance, the data seems fairly straightforward, if very big. Each CSV file for a month contains about 50 to 60,000 lines of data, so there are only a few places that will open more than one file, such as more recent versions of Excel

However, each monthly update contains changes to some of the previous data, either data to be deleted or changed. In order to avoid having duplicates of rows or data you shouldn’t have, the data needs tidying up.

First download all of the data you want and piece all the sheets back to together in one sheet. To keep the data somewhat manageable, I’ve only gone back a year. You may need to add in column headers (see the FAQ) to make keeping track of your data easier.

First, sort the record status column, so that all of the data will be in the same order, As then Cs then Ds, when you sort by transaction ID. Sort by transaction ID, this will ensure all the rows are ordered so that if you have more than one row with the same ID, they will now be next to each other.

Now, add a column next to the transaction ID and enter the formula =A2=A3, this will compare one transaction ID to the one below it and will help you identify duplicates. I then copied that column and used the Paste Values option to get rid of the formulas.

Adding the formula to the price paid data

I was then going to use Find (CTRL + F) to search for rows where the comparison was TRUE and then delete rows as necessary.

Why not just sort the comparison column and delete all the ones marked TRUE? Because some are rows that need to be deleted but some are rows that need to be replaced with a newer addition, when you sort, you can’t tell which is which.

By about row 1,200 (of 753,000), I realised this was not going to be an efficient method of cleaning the data. And if something feels like it is going to be a very tedious method of achieving something that is good reason to find a better way.

As I was working through the rows, I realised two things, because of the earlier sort, the rows with record status A always came first in the pairs of duplicates and the ones I wanted identify in order to delete were the ones where the record status was A above but the row below was D (you already know you want to delete all the rows where record status is D, they don’t need more identification).

So it would be possible to write a formula to identify these rows.

Add a new column at the end of your data and add the formula =IF(AND(B2=TRUE, P3=”D”),”YES”,”NO”)

The formula to find the duplicates

The formula will put a YES in the column if the row fits the criteria of being a duplicate where the matching row (the one below) is marked for deletion. Otherwise the formula will return NO.

Why are some words in quotes and not others? Text, otherwise known as strings, needs to be in quotes so Excel can recognise it. The TRUE (or FALSE) are not text, they’re Boolean values – items in this column can only have one of two values (true or false) and Excel recognises this automatically, so no need for the quotes.

Once you’ve copied the formula down, you can do a copy and paste values to get rid of the formulas. Next sort this column to gather all the rows identified as YES together so you can delete them. You can then delete all the rows with the record status D.

Repeat with the rows to be changed, replacing D with C in your formula. This time you will want to delete the rows identified by your formula but leave the Change rows which are the updates to the data you’ve just deleted.

A much quicker way to clean your data and stop duplicates making your calculations inaccurate.

There are probably still better ways to clean up the data, but it is amazing how much you can achieve with Excel formulas and a bit of thought.

For more tips and ideas on working with data, buy Get Started with Data Journalism

August 9, 2013
by Claire

How to get started with data journalism

Well, it’s finally here. I’ve published my book Getting Started with Data Journalism.

It is a comprehensive look at techniques for finding, cleaning, analysing and visualising data and hopefully help at least a few people learn more about data journalism (or just a few basic sums and a better understanding of Excel, pretty much essential skills for journalists today).

It’s taken about a year to write (and edit pictures, there are lots of pictures).

It is aimed at beginners and people who are just getting to grips with data journalism. There are lots of practical examples, with links to datasets you can practice on. This is kind of the main idea behind the book, to use real world examples from an actual newsroom, to help give people an idea of what works and how you can use data journalism to find great stories.

There is a sample of some of the chapters on the page so you can get an idea of the contents, plus the table of contents tells you what’s in the book.

June 26, 2013
by Claire

Behind the scenes on an award winning story

So this time last week, I was getting ready to catch a very early flight to Paris ahead of the Data Journalism Awards.

My story looking at where social services in Wales sent children for care placements and which councils from other parts of the UK were sending children to Wales won the award for data driven investigation, small media. So I set off for an exciting trip to Paris. It was an honour to be recognised alongside some awesome projects from around the world.

One of the things I had to do for the ceremony was put together a video about how the project was put together. Seeing the videos from the other winners and finding out more about the project was definitely the most interesting part of the ceremony. You can view all the videos here.

I’m currently working on a book on how to get started with data journalism, you can sign up for updates here, and this story is one of the projects featured in the book is the care placements for children – looking at ways of cleaning up data so you can use it to find stories.

Below is an extract from the Cleaning Data chapter of the book, which explains some of the techniques used to put to together the story.

The data for this story was gathered through FOI requests, which were replied to in a variety of formats.

When formatting data for use, you need to think about what question you were asking when you sought out the data and what information you were hoping to get back – this will help you lay out a table so that you can use it to look for answers.

In this example, the request asked for the information held by Welsh councils on care placements over the past three financial years.

To get a better understanding of the data, it made sense to split it into two tables, one for children being sent elsewhere by Welsh Councils and one for children being sent to areas in Wales.

To make the information understandable, you need to set out your column headings to organise it – placing council, area where placed and each of the financial years and then formatting the data provided in order to fit.


The master spreadsheet for care placements

In some cases, the data was already formatted in a way that would fit, with just the name of the council that provided the information needing adding.

In others, the data was provided as a table for each year, so these needed to be joined together so information for the same area could be read across all the year columns.

Mostly the data was provided as spreadsheets, making this an easy task to complete. Other forms such as Word documents and tables within the body of the email could also be easily copied into a spreadsheet and added to the table.

While putting together the data, it became clear that a summary table with totals for each council would need to be added, as not all the councils had provided a breakdown by area. Some had only provided a total, while some had provided a total plus information about where children were sent, but not individual totals for each area to avoid data protection issues.


Once the data was all in one table, it makes it easier to start analysing it, for example, adding up the totals for all the councils to see an overall total for the whole of Wales.

Once the data had been gathered together in a master spreadsheet, it became clear that all of the councils had different ways of naming councils, and so the data would need to be tidied and standardised in order to get a clear picture of how many children are being sent to different areas.

Tidier data also makes it easier to get the data into a format that can be used to create visualisations – as you will see in a later chapter, you sometimes need specific formats in order to work with particular tools.

The data on where children had been placed by Welsh councils came from 18 different councils and as such contained a number of different variations for council names.

To start cleaning, the data you need to open up Open Refine and create a new project by uploading a CSV of the data you want to tidy up.

Once you have your data open, select Facet > Text Facet from the dropdown menu on the column you want to clean.

A box will appear on the left-hand side of the screen showing all the different data items in the column and a count of the number of times they appear.

Click on Cluster to bring up the panel to start matching up names.


As you can see, the programme identifies data that is likely to be the same, either because they are the identical but use different capitalisation, because they look like shortened versions of a similar thing, or because they are a misspelling.

For example, NEATH PORT TALBOT, Neath Port Talbot, and Neath port talbot or Birmingham and Birgmingham.

For each identified possibility, there is a box to tick if you agree that they are a match. There is also a space to enter what you would like everything renamed to – you can change it from the default selection by entering your own text or by clicking on the link in the list for the version you prefer.

When you have been through the list picking out the ones you want to merge and what you want the merged group to be called, you need to click either Merge Selected and Re-cluster or Merge Selected and Close, depending on whether you think there are more similar data items to be found.

If the programme is not finding many similar items, you can change the way it makes comparisons by changing the method and keying function at the top of the panel – this essentially changes the number of similar words or phrases that will trigger a match.

The most effective way of working is to keep changing the criteria until either no more matches are found or none of the suggested similar items match.

In some cases, you may need to be quite liberal with your criteria. Here, to catch things such as Swansea County Council and Swansea City Council, you need to set a criterion that also tries to match Southampton and Northampton.


Using the cluster method, it was possible to go from 289 different items to 209 in about five minutes, which is much quicker than trying to do it manually.

Once you have used the cluster function, you may then need to go through the list to catch any ones you may have missed. As it has been sorted alphabetically, you should be able to see similar data listed together. You can also sort by count, which may help identify items that only appear once or twice, possibly because they are a misspelling or an uncommon variation.

To edit, hover over the item you want to change and then click the edit button and type in what you want to change the title to. As you edit, the programme will automatically recalculate the list to reflect the new count of items.


Once you have all the data tidied up, you can see more clearly which councils are seeing the most children sent to their area. You can also then use this to map the locations where children, in this case using Tableau.

January 9, 2013
by Claire

A Great Big List of FOI Ideas

Is one of your journalism resolutions this year to put in more Freedom of Information requests but not really sure where to start?

David Higgerson’s blog has a FOI Friday feature, collecting together stories from local and regional newspapers that have  been based on responses to FOI requests.

The list below collects together the topics of the requests featured, grouped by authority the information was requested from.

There are nearly 500 ideas below so it should keep you (and FOI officers across the country) busy for months to come.

Edited: Several people who work on the public authority side of FOI have pointed out that some of the information in the ideas listed below will already be available elsewhere (usually websites, accounts etc). FOI Man has a good guide on how to send a responsible and effective FOI.

Also the list is set up from the point of view of how Welsh public bodies are organised, unitary authorities and single-level health boards, this is very different from England where you seem to insist on making everything twice as complicated, so check which layer of bureaucracy is responsible for whichever area you want information about before you send your request off.

1. Councils

Councillors and meetings

  • Cost of tea and biscuits at meetings
  • Meeting attendance records for councillors
  • Councillor allowances and expenses
  • Council cash spent on councillors’ meals
  • Council tax reminders for councillors
  • Councillor’s first class travel
  • Training days for councillors
  • Council spending on political advisers

Council Staff

  • Cost of union officials, who spend most of their time on union work
  • Staff who live outside the council area
  • Gagging orders on staff
  • Amount charged to staff for making private calls
  • Jobs get cut – but the agency spending goes on
  • Relocation costs for staff
  • Discrimination payouts
  • Staff breaking computer use/social networking policies
  • Top salaries at councils
  • Money paid to officers to cover their commutes
  • Accidents in council buildings
  • Number of council officers with powers to enter homes
  • The cost of suspended staff
  • Bonuses paid to staff
  • Assaults on council staff
  • Redundancies – compulsory, cost, older or younger workers been hit
  • Cost of finding new top officers
  • The number of club class flights, luxury hotels and trips abroad made by public servants on the taxpayer
  • Cost of recruitment
  • Hotel stays by council staff
  • Calls to speaking clocks/premium lines
  • Websites visited by staff
  • Council workers who took redundancy – and then got new jobs at the same council – cost of redundancy
  • Council spending on mobile phones – the number of phones and the size of the bills and comparisons between councils
  • Council workers overpaid due to payroll errors.
  • Number of days council workers take off sick and how much it costs – including stress
  • Cars for top council executives paid for by the council
  • Amount paid (day or hourly rate) to temporary staff at councils.


  • Spending on council credit cards
  • Cost of council IT equipment
  • Cost of buying stationary
  • Compensation payments by the council – amount and reason for payout, including the lowest
  • Do local councils support local business by paying bills quickly
  • Cost of strikes
  • Money spent on consultants
  • Whether councils have a policy to buy local
  • Cost of cancelled projects
  • How much council tax is outstanding and how is it being chased, how much is written off?
  • Cost of council refurbishments
  • Money spent publicising cuts (or how council will cope with them)
  • Cuts in grants to charities/community groups
  • Council investments in dodgy companies (arms dealers etc.)
  • Impact of cuts on departments – number of staff, size of budget
  • Council spending on football clubs
  • Council spending on temporary staff
  • Council debts (rent, council tax) collected from the estates of dead people
  • Councils getting liability orders and using bailiffs – locations of orders
  • Council car mileage claim and rates that are higher than the 45p HMRC recommendation
  • Amount councils make from selling on electoral roll data
  • Land sold off by councils

Social Services

  • How many children have been rescued from child sex rings
  • Forced adoptions
  • Children put in care because they are obese
  • Elderly abuse
  • Complaints against social workers
  • The cost of closing care homes
  • Cost of social workers
  • Problem families relocated to the area (apparently covered under Multi Agency Public Protection Arrangements)
  • Opening and closing times of youth clubs – kids back on streets by 7pm?
  • Children in care sent to other council areas
  • Number of children going missing from care – including ages

Highways and Vehicles

  • Complaints about cabbies
  • Hidden criminal offences of taxi drivers
  • Crashes involving council vehicles and what the repairs cost
  • Street lights being switched off
  • How many potholes have been filled in, how many have been reported and cost of fixing them and paying compensation
  • Fixed penalty notices for littering, dog mess
  • Private gritting jobs
  • Cars seized by the council
  • Taxi vehicle checks


  • Complaints about traffic wardens
  • Overturned parking tickets
  • Unpaid parking fines
  • Ratio of parking spaces to parking permits or parking permits issued by road
  • Fines for using disabled parking spaces
  • Parking tickets – and how many issued incorrectly
  • Cost of resident permits
  • How much councils make off parking fines in top ticketed streets
  • Top earning car parks
  • Car park use falling in the recession?
  • Parking fines for council vehicles
  • Parking meters that don’t give change
  • Attacks on traffic wardens

Planning and Housing

  • Officer’s advice on planning decisions
  • List of approved and proposed phone masts
  • Empty houses – can also ask how many get second home relief and how much council tax is being lost
  • Housing built on back gardens or several houses on plot of one house – and council attitudes
  • Number of people complaining about landlords and letting agents – reason for complaints, and number of prosecutions and fines
  • Big claims for housing benefit
  • Numbers and locations of student homes (versus number of properties registered for council tax)
  • Affordable Homes – where and what are they
  • How much section 106 money councils have received and how much hasn’t been spent
  • Homeless people winning appeals against councils for refusing support
  • Cost of housing benefit cheats


  • Streets with the most flytipping incidents
  • Cost charged to people to buy a new wheelie bin
  • Fines for leaving the bins out
  • The number of bins private contractors for a council fail to collect each month
  • Cost of replacing wheely bins that are lost or stolen


  • Pest control in schools
  • Illegal parking outside schools
  • Schools which have dispensation from holding Christian assemblies
  • Is home schooling on the rise
  • CCTV cameras in schools
  • The cost of crime in schools
  • Estimated cost of repair backlog at schools
  • Racism incidents reported in schools
  • School bus cuts
  • Different languages spoken in schools – list of the languages


  • Compensation claims by children for incidents in schools…and on school trips – numbers and payouts
  • Rowdy kids on school buses
  • School pupils caught with porn on school computers
  • Largest number of violent incidents carried out by one pupil
  • Special needs tribunals – to appeal not being statemented
  • Racist schoolchildren
  • The reasons for exclusion from school
  • Children losing school places because parents lie on applications
  • Cost of taxis for schoolchildren
  • Education costs for traveller children
  • Truancy facts and figures – days lost, prosecutions, fines.
  • Number of pupils citing bullying as a reason when they look to switch schools
  • Children missing education – children who don’t turn up to school and whose families can’t be contacted
  • School place refusals
  • School clothing grant postcode lottery

Teachers and Staff

  • Disciplinary action against teachers
  • Teachers’ sick days
  • The cost of temporary teachers
  • People in the classroom without a CRB check
  • Attacks on teachers
  • Failing teachers allowed to keep on teaching after being disciplined
  • Spending for ‘off site’ training for teachers
  • Pay rises at colleges and schools
  • Cover supervisors taking lessons – should only do it for a short while using work from teachers
  • How much do headteachers earn
  • Teachers being made redundant

Libraries, Leisure Centres, Sport and Recreation

  • What goes missing from libraries?
  • People banned from libraries and leisure centres – and why they’re banned
  • Antisocial behaviour reported in libraries
  • Footfall at libraries
  • Library fines – broken down by library
  • Spending on library books, and other things found in libraries
  • Items not being taken out of libraries – list of items and the last time they were borrowed
  • Safety fears at a football ground – council safety advisory group which gives permission for clubs to hold matches
  • House seats at council venues
  • Spending on sport – by different sports
  • The wait for an allotment
  • Visits to council run sports centres


  • Number of lost and abandoned dogs found by councils – numbers reclaimed by owners, number put down.
  • Food hygiene inspection details – poorly rated premises
  • Cost of a plaque for ashes scattering
  • Data protection breaches
  • Amount of food thrown away by councils
  • Numbers of stray horses and cost of dealing with them
  • Neighbourhood Renewal Assessments – information on the state of areas
  • Cost of hiring venues for council events
  • Noise complaints and which areas see the most complaints
  • Spending on private detectives
  • Cost of burial and cremation
  • Available burial plots
  • Noise complaints – by ward/street and reasons
  • Pest control call outs
  • Smoking fines – how many?
  • Complaints about noisy animals
  • Body unclaimed in a mortuary for 10 years
  • CCTV camera numbers
  • Value of art
  • Use of Regulation of Investigatory Powers Act powers on staff or the public
  • Complaints about council services
  • Footfall surveys of shopping centres
  • Broken CCTV cameras
  • Spending on Christmas lights
  • Paupers funerals

2. Police

Crime – General

  • Number of dangerous dogs seized, number of attacks reported and the number of dogs destroyed
  • Levels of shoplifting
  • Metal thefts
  • Petrol station drive-offs
  • Police called to incidents at schools, colleges and universities, e.g. burglaries – and what was stolen, hate crimes
  • Crimes in hospitals, train stations, leisure centres, roads with the highest property values, the Welsh Assembly (you get the idea)
  • Number of burglaries that lead to convictions
  • Crimes committed in public buildings
  • Hate crimes against disabled people
  • Thefts from churches
  • Scale of domestic violence – at a police patrol level and number of prosecutions/convictions
  • Burglary hotspots
  • Ride and run crimes involving people getting a cab ride and running off at the end without paying.
  • Crimes involving Facebook, Twitter, eBay
  • How often kidnaps were reported, and what the motives were, and how long people were held for
  • Theft of numberplates
  • Crimes committed by under 10s
  • Number of crimes undetected
  • Petrol thefts
  • Railway cable thefts
  • Reports of sheep worrying – and whether animals were attacked or killed
  • Crimes commited by foreigners


  • OAP criminals
  • Prolific and other priority offenders and the number of crimes they’re responsible for


  • Number of guns which police had taken off the streets
  • Number of gun licences and who holds them
  • Children carrying offensive weapons and what weapons
  • Items seized as weapons by the police
  • Location of knife attacks

Roads and public transport

  • Highest speed clocked by speed cameras and where
  • Number of images taken by ANPR cameras
  • People using mobile phones while driving
  • Tip off fees for letting recovery firms know to collect vehicles after crashes
  • When are speed cameras turned off
  • Top-earning speed cameras and those which catch hardly anyone
  • The number of people caught ignoring traffic signals and nipping across a dangerous railway level crossing
  • Fined for riding a bike on the pavement
  • Attacks on speed cameras
  • Number of incidents of taking vehicles without owner’s consent – and age of youngest criminal
  • Top 10 streets with the most accidents.
  • Nationality of drink drivers
  • Speed tickets being overturned…
  • …and don’t forget traffic light cameras
  • Bus lane fines
  • Crime on buses
  • Attacks on public transport which aren’t resolved
  • Speed camera detection rates

Drink and Drugs

  • Teenage drug dealers or young people caught with drugs
  • People prosecuted for breaching the smoking ban
  • Women committing violent crimes while drunk
  • Fixed penalty notices issued to drinkers – age breakdown
  • Drink cited in the most crimes?
  • Cannabis factory finds
  • Call outs to pubs and clubs
  • Locations of drug warrant executions – warrants issued under the Police & Criminal Evidence Act, 1984 (PACE) or the Misuse of Drugs Act 1971 – whether properties have been raided more than once.
  • Number of times booze had been seized off people in the streets and for a number of other actions police can take to tackle the issue.
  • Youngest drink driver
  • Number of people stopped for drug possession – youngest, reason for stops

Sex Offences

  • Sex offenders on the run and how many may have fled abroad
  • Arrests for ‘extreme porn’
  • Rape complaints and the number of convictions, also the number not investigated
  • Use of Sara’s Law
  • Sex offences involving the under 16s
  • Where police have found couples involved in ‘indecent acts’
  • The number of child exploitation cases reported to police
  • Sex attacks carried out by relatives


  • Cold cases under investigation – and how many have been solved in recent years
  • Cost of individual investigations – both solved and unsolved
  • Top 10 most expensive police investigations
  • Cost of big police investigations – especially failed ones


  • Cautions used to deal with serious crimes…
  • …or worse serious crimes “screened out” so not investigated
  • Number of cases dealt with using restorative justice – and cases involving children
  • How many crimes are solved through other offence been taken into consideration, and the percentage of the total solved cases
  • Numbers of post-charge cautions being given out


  • Taser use
  • Stop and search on children – with ages
  • Number of Osman warnings dished out by the local police – warnings issued after police have received specific intelligence that individuals were at serious, and imminent, risk of being killed.
  • Where is crime most likely to be solved – detection rates by ward
  • Stuff found on police raids
  • Reports of ghosts, werewolves, witches, aliens, zombies and the like
  • Fugitives on the run and what crimes are they wanted for, and how long they’ve been on the run for
  • Number of stop and searches carried out
  • Spending on forensics
  • Number of crimes solved from information gained through force’s Facebook or Twitter accounts
  • The cost of policing the EDL (or similar) protests
  • Use of CS spray
  • How busy are the armed police?
  • Big Cats
  • Amount of money paid to police informers
  • Number of negative warrants carried out by police (where nothing is found) and cost of compensation as a result
  • Bomb alerts – number and response
  • Number of children having DNA swabs taken by the police – broken down by age
  • Number of children held in police cells – ages


  • Filling up police cars with the wrong type of fuel
  • Accidents involving police cars and cost of dealing with them
  • Cost of security for police buildings
  • How much forces hope to make by selling off police buildings
  • Spending on bottled water
  • What police helicopters are used for
  • Cost of fuel for police vehicles
  • Police charging organisations for policing events

Officers and Staff

  • Money handed out to police officers in bonuses
  • Claims of racial abuse made against police
  • Disciplinary proceedings against police officers for breaching rules on social networking sites
  • Coppers under investigation and why – information about disciplinary action
  • Complaints against the police – and how many upheld
  • How much different ranks earn
  • Police officers taking second jobs
  • Crimes committed by police officers
  • Credit card expenses for senior officers
  • Complaints against PCSOs (compared to police officers)
  • Gifts to the police
  • Missing laptops and mobile phones and data losses on the police force
  • How many officers had retired from the force with pensions only to return in civilian roles
  • Assaults on police officers
  • Things stolen from the police
  • Coppers on desk duty through illness
  • Luxury cars for top brass
  • Police jobs that are vacant
  • Officers disciplined for using police computers to run searches on neighbours, family and the local area
  • Forces forcing officers to retire after 30 years
  • Overseas travel bills for police officers
  • Police spending on hotels – number of nights per stay, number of people staying (lets you work out likely room rate).
  • Emergency service personnel taking time off because of stress
  • Historic data on police numbers
  • Cost to police of defending claims (racism, sexism, harassment) from employees

3. Ministry of Justice


  • Assaults by prisoners on other prisoners
  • Classes prisoners can take
  • Stuff confiscated from cells, especially mobile phones
  • Compensation paid to prisoners and why
  • People returned to prison for breaking licences, what offences they committed and how quickly they reoffended
  • Items smuggled into prisons
  • DVDs prisoners are allowed to watch
  • Complaints made by prisoners
  • Menus served in prison

Courts and sentencing

  • How many people have been spared jail by being given a curfew order instead, and how many go on to break the order
  • Money recovered using Proceeds of Crime Act powers
  • Number of parents prosecuted for not sending their children to school
  • Items seized at courts

4. Fire

  • Call outs due to false automatic fire alarms and the cost of attending
  • Disciplinary action against fire officers
  • Fire checks and enforcement notices issued to schools
  • Chip pan/chimney fires – the number and the cost of tackling them
  • Animal rescues
  • Fire call outs where no smoke alarm has been fitted
  • Hoax 999 calls
  • Emergency service personnel taking time off because of stress
  • Fire crew response times
  • Compensation claims by fire service staff
  • Number of times firemen have been called out to help paramedics move fat people.

5. Universities

  • What have students been disciplined for
  • How much has been paid in hardship payments and what for
  • Reports on the state of buildings
  • How accurate are university slogans – e.g. Plymouth University’s claim of being “the enterprising university”, only two ex-students set up businesses with university help
  • The ways in which students have been caught cheating
  • How much the university wine/art/plant collection is worth
  • How many students demanded a refund for poor teaching
  • How much universities make from hiring out facilities
  • Where students come from – particularly those at top unis
  • Donations from dodgy sources – dictators and the like
  • How much are universities spending on hospitality
  • Expenses for university bosses
  • Library fines
  • Social network policies and how many people have been disciplined for breaking them
  • Student appeals against their exam and coursework grades
  • Bomb hoaxes at universities
  • Complaints about student behaviour

6. Health Boards


  • Number and seniority of doctors on duty on weekdays and at weekends – are hospitals understaffed at weekends
  • Levels of bed blocking (delayed transfers of care), how many and how many bed days lost, and the longest case
  • How many safety alerts are issued by the NHS National Patient Safety Agency have not been implemented
  • Serious untoward incidents
  • Injuries sustained in accidents in hospitals, patients and staff
  • Number of hospital beds being axed, and where
  • Deaths due to superbugs
  • The number of children treated for eating disorders, and their ages
  • ‘Red lists’ – lists of drugs which are approved for use but which are restricted due to their cost
  • Interventions Not Normally Undertaken list – the things the PCT won’t normally pay for patients to have
  • Number of cases of ‘surgical infection’ in patients who underwent major surgery and the number of emergency readmissions of patients following stays in hospitals
  • Number of available hospital beds and instances of overcrowding
  • Cancelled hospital appointments, as a percentage of all appointments
  • Incentive schemes to encourage doctors not to prescribe antibiotics
  • Number of people held at secure hospitals who have escaped and the numbers who have just left mental health units

Running Hospitals

  • The number of hospital meals binned every day and the cost of doing so
  • How much is being spent on temps and the hourly rates for agency staff – very expensive as a salary
  • Haunted hospital wards
  • Items handed into lost property at hospitals
  • Responses to annual staff surveys – particularly questions along the line of : “Am I able to deliver the patient care I aspire to”
  • Sick days taken by hospital staff
  • Reasons why hospitals call pest control
  • Spending on art in hospitals
  • Hospitals with poor food hygiene ratings, why those who got 0 or 1 got a poor report
  • Stuff reported stolen in hospitals
  • Hospital inspection reports
  • Civil law suits against hospitals – particular departments with high number of cases
  • Hospitals letting injury-claim lawyers advertise in the hospital
  • Amount raised by health boards in hospital parking


  • Extra payments for doctors to do small operations in GP surgeries
  • GP surgeries where the premises are below expected standards
  • Amount of money paid to GPs for signing documents which allow funerals to be arranged
  • Premium rate numbers to ring the doctors


  • The number of surgical instruments returned to NHS hospitals dirty or broken after being sent to be sterilised and whether operations had to be cancelled as a result
  • How many operations go wrong, which ones and how
  • Operations cancelled due to bed shortages
  • Money spent on plastic surgery and which operations
  • Number of tonsil removal operations
  • Foreign objects removed from patients – how many and what
  • Objects left in patients

Contraception and Maternity

  • Babies born to drug dependent mothers, who were transferred to a drug treatment programme to wean them off their addiction
  • Number of times morning after pill has been given out to teenagers
  • Number of times maternity units have had to be closed because they are full.
  • High-dependency cots for newborns being kept out of action due to staff shortages
  • Number of complaints about maternity units
  • Number of mums who give birth when they have addictions – what they’re addicted to.

Accident and Emergency

  • A&E use by postcode
  • How many times people have been diverted from A&E because it is too busy
  • People arriving in ambulances who have to wait to be seen
  • Number of people attending with dog attack injuries
  • Number of gun injuries – information on types of guns

Health Boards/PCTs

  • Money spent on public relations and marketing
  • Top earning bosses
  • How successful have healthy living projects been in getting people to live healthily
  • Luxury cars for top brass
  • Spending on licences to play music
  • Public health funerals (like pauper’s funerals but paid for by hospitals)
  • Why staff have been suspended
  • Costs of private finance arrangements for building and maintaining hospitals
  • Cost of translation services
  • Attacks on staff – details of incidents
  • Cost of priests and chaplains
  • Money spent on taxis for patients
  • Spending by health boards on getting people treated privately
  • Amount owed to health boards by overseas patients not entitled to free treatment – breakdown of where they are from and how much they owe, how much written off
  • Number and cost of exit packages at health boards


  • Number of children being treated for weight-linked issues, and their ages
  • Number of gastric band and similar procedures carried out
  • Spending on beds and equipment for oversized patients

Drink and drugs

  • Hospital admissions due to drink, age breakdown
  • People admitted to hospital for drug overdoses – numbers for different types of drugs, youngest
  • Drugs found in overdose cases – which is most common

7. Ambulance

  • How many 999 calls were attended by emergency care assistants only rather than with a fully-qualified paramedic – the split is supposed to be one of each per ambulance
  • Ambulance crashes
  • Police taking injured and sick people to hospital because of ambulance shortages
  • Details of investigations carried out under the Ambulance Service’s ‘Adverse Incident Reporting System.
  • How many alcohol-related incidents involving under-aged drinkers
  • Number of incidents where ambulances waited for more than 20 minutes and how long for – average for each hospital and longest turnarounds
  • Thefts from ambulances
  • Out of date drugs in ambulances
  • Emergency service personnel taking time off because of stress
  • Number of times ambulances are delayed due to technical reasons – such as having faulty sat navs – also do ambulance services liaise with council etc. about road closures
  • Hoax 999 calls

8. Ministry of Defence

  • The number of staff disciplined for ‘leaking’ things from the Ministry of Defence on Twitter and Facebook
  • Compensation claims from the MoD – other than suffered in combat, compared to compensation for combat injuries
  • Breaches of security at armed forces bases

9. Health and Safety Executive

  • Incidents logged to the Health and Safety Executive that happened at schools and details of whether the pupil or staff victim taken to hospital for treatment
  • The number of health and safety improvement notices issued by the Health and Safety Executive to a local company

10. Bus Companies/Network Rail/Department of Transport

  • Numbers of passengers using staff buses and cost of providing services
  • Number of train services that ran with fewer carriages than stipulated in the franchise – Department of Transport
  • Number of trains that run on time
  • Correspondence between train operators and Government about finances
  • Bus crashes
  • Passenger numbers on trains at peak times – most overcrowded trains

11. UK Border Agency

  • How many checks done for illegal workers and the size of fines issued as a result

12. Royal Mail

  • Attacks on postmen by animals
  • Number of letters destroyed each year
  • Postcode area breakdown of complaints and the types of complaints, plus details of the numbers of posties investigated – in context of the number of rounds in each postcode
  • Attacks on postmen

13. Environment Agency

  • The number of rivers which fail to meet European standards for pollution and biodiversity

14. DVLA

  • How many driving tests had to be cancelled due to bad weather, and the knock on effect it has on people trying to book in for a new tests
  • The number of points the person with the highest number of points on their licence has in any given area and the number of drivers in any given area with more than 12 points on their licence
  • The number of people taking their test who required an interpreter to get through their driving tests.
  • Pass rates for men and women, different test centres and the person with the most failed attempts.

15. DWP

  • Employment Support Allowance acceptances by area
  • Appeal success rate for ESA by area

16. Charities Commission

  • Number of registered charities in the red

17. HMRC

  • The number of winding up orders issued by Her Majesty’s Revenue and Customs

18. Criminal Records Bureau

  • Crimes flagged on CRB checks on people applying for teaching positions

19. Government

  • Government/council spending on Facebook ads

January 6, 2013
by Claire

Starting with blogging again in 2013

My poor blog has been looking rather neglected. I haven’t really posted much for most of 2012, I’m hoping to change that a bit in 2013.

The second half of the year rather got away from me and I didn’t get as much done as I wanted to. I’m hoping to change that this year.

First thing that needs to be finished is the book I’m working about how  to get started with data journalism. It’s nearly finished, just a couple of stubborn chapters left to write (there is also the proofreading to do but it seems best not to think about that).

Other than that, I feel like I need to kick-start my learning when it comes to data journalism, there’s some new techniques I want to master, and I wanted to get Open Wales up and running again…and I have a start-up to launch at some point. So plenty to do.

Even though I failed to do much in the way of blogging, it appears people kept visiting. These were the most visited posts in the past year:

Putting it on a map – Google, BatchGeo and Fusion Tables

Huge list of data journalism resources

Putting it on a map – Longitudes and Latitudes

Combining Google Fusion Maps

How to make a heat map

November 6, 2012
by Claire
1 Comment

Google Fusion Heat Maps

Have completely failed to add a tutorial for making heat maps using Google Fusion Tables. To make this work, you have to first collect up some shapefiles – you can download them from open data sites, often as a KML, which can be directly uploaded or as shapefiles, which can be converted using Shape to FT.

The Middle Super Output Areas used below are available for download from the ONS. Alternatively download this Google Fusion Table and then upload the downloaded file to your own Google account.

To make a Fusion Table heat map, you will need a table containing your shapefiles and one containing your data.

For this example, I’m using the claimant count data, which gives an indication of the number of people claiming jobseeker’s allowance at low levels.

This can be downloaded from the  NOMIS site – clicking on advanced query brings up a tool that allows you to select various Department for Work and Pensions Statistics for different geographic areas.

The options are on the left-hand side of the screen, the easiest thing to do is work down the list selecting the variables you want.

For this example, I’ve selected claimants aged 24 and under in the Middle Super Output Areas in London.

Once you are happy with your selections, you can download the data as an Excel spreadsheet.

The spreadsheet needs some cleaning before it can be used. Firstly in order for it to be uploaded to Google Fusion Tables, the headers need to be in the first row.

Also, so the data can be combined with other data sets, the codes need to be in their own separate column.

This can be done by creating a new column to the left and adding the formula: =LEFT(B2,9), copying the first nine characters (the code part) into the new column.

Copy and paste the column, using Paste Special > paste values only, in the same place to get rid of the formula.

You can also tidy up the column with the name of the area in, to remove the part with the code, by adding a new column to the left, using =REPLACE(C2,1,12,“”).

Again copying and pasting values only will allow you to get rid of the formula, so you can delete the column with the codes and areas in.

While this data gives me the number of claimants looking for work, I want the proportion of young people are claiming jobseeker’s allowance for each area.

When making heat maps it is best to compare areas on a proportional scale, such as percentage or the number per 10,000 head of population, as different areas have different populations.

The population numbers for output areas can be downloaded from the ONS.

You can copy the figures for people aged under 24 into your original spreadsheet with the figures for claimants.

In order to make sure the claimants and the populations correspond to the same areas, sort the code columns for both. You can then check that they are lined up by comparing the code columns using =A2=E2

Add the columns for the number of people aged 15 to 19 and 20 to 24 to get a total population to compare claimants.

You can then work out the percentage of young people claiming jobseeker’s allowance in each area.


Save the file as a CSV and then upload to Google Fusion Tables.

Once the file has being uploaded, you can then merge it with the table containing the shapefile data using the codes to match up areas.

The new table will have a column containing the geometric information for each area – it will show up as a column with KML in italics for each matched area.

If you click on map, you will see the areas mapped out in red, which is the default.

In order to change the colours on the map, you need to select Tools > Change map styles.

You then need to select Polygons > Fill Colour and from there you have the option of buckets, which colour different areas different colours based on the value in the column selected, or gradient, which does the same, but colours areas based on a scale rather than distinct values.

It also helps to know how big a range of values you have – so from looking at the spreadsheet and sorting the percentage column from smallest to largest, I know the values go from 0% to 16.7%.

For this one, I have picked buckets and split it into four buckets

I have picked blue and orange for the colour scheme as people who are colour blind can still read it. Colorbrewer is a useful site for picking colour schemes for your heat map.

When you click save, the map should show up in the new colour scheme pretty instantly.

As you can see it is fairly dominated by the dark blue because a lot of the values are under 5%. One way to avoid this is to split the data into quartiles (or quintiles or deciles etc) so that there is a roughly equal number of areas of each colour – or you could pick out the bottom 10% of values and the top 10% and then split the remaining ones equally.

Doing this gives a better indicator of what areas are at the extremes and which are in the middle, rather than by splitting the values equally and possibly ending up with almost all the values grouped together and a few outliers.

This version of the map has the bottom 10% in dark blue, next 40% in light blue, next 40% in light orange and top 10% in dark orange

When you click on an area in the map, an info bubble comes up giving you information about that area.

To change what the info bubble says, click Tools > Change info window layout.

There are two options, a simple one where you can tick which columns you want to include data from.

Or one where you can combine data from the table with custom HTML. With this you can click on the information on the left to add in the data from a column for that particular area.

The benefit to this is you can create info bubbles which display the information about the data in sentences, making it easier to read and understand.


For more tips and ideas on working with data, buy Get Started with Data Journalism

April 16, 2012
by Claire

Data journalism as the answer to the story drought

I’m slowly catching up with reading and blogging again – seriously, stop paying attention for five minutes and you end up buried in interesting things.

Among the things I’ve finally caught up on is the MEN’s excellent Crime down your street series looking at the crime map figures for the whole of 2011 in Manchester (I’d been looking out for it since Paul Gallagher mentioned he was working on the figures).

What the series is great for illustrating is the way in which data journalism can be used to generate a group of connected stories off the back of a single data source, which can be planned for the paper over a number of days (potentially a good way to get people to read the website/buy the paper over several days, plus offering some guaranteed copy

The articles themselves are really well put together, with maps, tables and case studies (I hate finding case studies, but yes, they do help bring a story to life), all of which is easier to produce if you’re not trying to do it right on deadline (this is totally the reason why I hate finding case studies).

It requires a commitment to forward planning and putting time aside to work on such stories – not always the easiest, given both the amount of time it takes to put together some pieces of data journalism (‘the longest I ever spent cleaning data’ is a future game of one-upmanship waiting to happen) and the in-the-moment demands of daily newspapers.

But the pay off is good – I’m not sure how the stories appeared in the paper but they look like the provided several spreads, but the online engagement is very high, the number of comments suggests they were amongst the most commented of the day (I can’t spot any numbers for Twitter/Facebook shares)

I tend to think one of the best arguments for data journalism in the newsroom is it generates stories – as this series demonstrates one piece of work can generate several good strong off-diary stories, the kind of thing people would much rather read over a rehashed press release, making it well worth the effort.

March 13, 2012
by Claire
1 Comment

FOI is better than tea and biscuits!

The Cardiffian, Cardiff School of Journalism’s hyperlocal website, published a story earlier about Freedom of Information requests to Cardiff Council – nearly 4,000 in the past three years (most of those aren’t mine, honest).

There was whinging about the value of the act, I got irritated, there were tweets.

The negative comments are like a short re-hash of most of the complaints made about FOI – see the Save FOI campaign for more examples  – they are also rubbish arguments.

Cllr Robert Smith’s view is FOI is eating into budget, so we should ignore “trivial” requests – who gets to decide what a trivial request is?

From Cardiff Council’s budget in 2010/11, the FOI department cost £518,000 to run.

This to be fair doesn’t involve costs to other departments for searching for information for the FOI department, as the Cardiffian article points out this hasn’t been quantified.

Meanwhile from the same budget, here are some spending highlights (as mentioned in my last FOI post):

  • £113,000 on refreshments for meetings
  • £124,000 on trees and plants
  • £6.1m paying events promoters
  • £390,000 for hiring musicians
  • £500,000 on bank charges
  • £2.4m on consultants

Those are just the ones I could find in my notebook in a couple of minutes (there’s probably more).

I realise if you’re a councillor, tea and biscuits sounds much more appealing than transparency and being held accountable and actually having to answer to voters, but those things are what you signed up to when you stood for election.

Given all this, is it really any surprise that people might want to use the FOI act to see where their money is going?

Oddly enough I think the £518,000 on FOI, a whopping 0.6% of the total budget, is an excellent use of Cardiff council’s budget.

Personally as a Cardiff council taxpayer, I’m completely supportive of any move to spend the bickie fund on something a bit more useful…like transparency.

On Cllr Rob McKerlich’s point about FOI replies not being perfect, this is usually the point I stop worrying about FOI requests and start worrying about the ability of the council to do it’s job properly given it’s clearly terrible data management systems.

As a councillor, Cllr McKerlich may want to look into how that can be sorted out.

Apparently, one of the 2012 requests is about parking tickets – wouldn’t it be so much easier just to publish all the data once a month and let people go through it for whatever they wanted.

I tend to think ‘if someone keeps asking for the same thing, try proactively releasing it’, might be good words to live by.

Sarah Hartley makes a good point that open data is a good thing but not at the expense of FOI, which lets people ask for the information they want and need.

If people are making more requests, doesn’t it suggest more engaged citizens, media that’s doing it’s job, councillors who can’t get the information some other way…or generally democracy in action, something I’m sure all councillors would be happy to raise a mug of tea to.