clairemiller.net

Data Journalism for Beginners: Some maths and Excel basics.

| 2 Comments

Part of a series of posts looking at tips and ideas for getting started with data journalism

May 1 – why data journalism and getting started
May 2 – finding your data
May 3 – cleaning it up
May 4 – mapping and visualising

Want more? Buy Getting Started with Data Journalism, a complete beginners guide to finding, cleaning analysing and visualising data in any size newsroom

It's all about the spreadsheets

It’s all about the spreadsheets

Some basic maths

The most useful thing for analysing data is a few simple sums.

Usually when we’re looking for a story, we only use a few basic sums to analyse things – mostly to even out variations in population size, to make better comparisons, or to see what’s changed from year to year.

Percentage of

=(value/total)*100

Useful if you want to calculate how much a figure is in comparison to a total amount.

For example, you might want to calculate how much money is spent on supply teachers as a percentage of the total school budget or, how many prisoners have been identified as self-harming from the prison’s total population.

Percentage change

=((New – Old)/Old)*100

If you have a dataset where you have data for two different time periods, one of the things you can look at is whether there has been a change between those two periods.

Per 100/1,000/10,000 population

=(number/total population)*100 (or 1,000 or 10,000)

This is the best way to compare figures for areas with vastly different size populations – otherwise it is fairly likely that areas with lots of people will have more of what ever you are trying to measure.

Equivalent to how many – days lost, per minute

Another way to show how often something happens is to calculate it in units that people can grasp – days, weeks, minutes etc.

For example, 65,000 parking tickets issued in a year is a lot, divided by 365, it is 178.1 a day.

You can then divide it by the 24 hours in a day, or more realistically, the number of hours in the working day for traffic wardens, e.g. 7am to 11pm = 16 hours, so 178.1 divided by 16 equals 11.1 tickets an hour.

From there you could work out how often a ticket is issued, on average in minutes – 60mins divided by 11.1 equals one every 5.4mins – round to five minutes and you potentially have a figure to start your story with.

Mean and median

The mean is another way of describing what is generally known as the average.

To get the mean, you need to add up all of the values in the group you want to find the average for and then divide it by the total number of items in the group.

Sometimes, averages can be misleading – for example, when looking at average wages, a small number of people who earn very high wages can skew the average so it is higher than a number that reflects what most people earn.

In these instances, using the median, or the mid point of the numbers, may be more accurate. To get the median, you would line up all the numbers in ascending order and then pick the number that falls in the middle of the sequence.

Quartiles, quintiles and deciles are calculated in a similar way to the median. Quartiles are found by dividing the group into four equal groups, quintiles, five and deciles, 10.

Spreadsheet Basics

Working with your data in a spreadsheet can be a good way to speed up your analysis by helping you to perform lots of calculations quickly.

=SUM

If you want to know the total of a column or row of numbers, this is the formula you need.

To calculate the total, type =SUM into an empty cell. You can then drag a box around the cells you want to add together. If you want to add several unconnected cells or groups of cells you can add them to the same sum by separating them with commas.

In Microsoft Excel, clicking on the Sigma symbol in the tool bar will highlight nearby cells so you can total them. If you want to change the cells highlighted drag the box to the area you want.

=AVERAGE

Just as knowing the total amount spent can be helpful, so can knowing the average amount spent.

The formula works in much the same way as =SUM.

In some cases, using the median, or the mid-point in the data where 50% of values are higher and 50% are lower, may be a better indicator.

You can work out the median in the same way as the average by starting your formula =MEDIAN.

=MIN, =MAX

One way to find out the minimum and maximum amount in a column or row of figures is to use the above formulas.

However, if you want to know which item in the list is the minimum or the maximum, and any additional connected data, you are going to have to scroll back through the whole list to find it. This may not be practical with a very long list, so this may not be the best technique for finding out the biggest and smallest numbers.

=COUNT

If you want to know how many items are in a list or part of a list, the count formula can be really helpful.

There are a number of different options for using count:

=COUNT will count all of the cells in the given range that have numbers in.

=COUNTA counts all the cells that have something (text or numbers) in

=COUNTBLANK counts all the cells that are blank.

You can also add conditions to the counts so only certain cells are included in the total. For example, =COUNTIF can be used with conditions such as >, more than, <, less than, or =, equal to.

Want more? Buy Getting Started with Data Journalism, a complete beginners guide to finding, cleaning analysing and visualising data in any size newsroom

2 Comments

  1. Pingback: Data Viz News [53] | Visual Loop

  2. Pingback: Data Viz News [53] - sqoops

Leave a Reply

Required fields are marked *.