Saturday, 26 April 2014

Problem solving with spreadsheets: how to turn a list into a running total (and have a laugh at David Moyes' expense)

As a follow up to my post about scraping websites using Import.io where I scraped data about the most successful premier league football teams, I decided to make the data tell a better story than a simple bar chart of total league titles, which looked like this:




In order to make this data tell a story, I wanted to see which teams had the most at any given time, so I could answer questions like 'who had the most titles in 1962?' 

To do this, I had to fiddle with my data in the Google spreadsheet a little. This is what my data looked like: 




First of all I deleted all columns except for the 'year' and 'Winner' column.

Then I sorted the sheet by year, so that it started in 1889. The data looks like this:


Now I want to turn the league wins of each team into '1's, so they are easier to count. So I need a new column for these ones and zeroes. This is the formula that goes into that column in the cell C2: =COUNTIF(B2:B2,"*Arsenal FC*")

All this formula does is tell the spreadsheet that for each year, if Arsenal win, they get a '1' and if they don't, they get a '0'.

To make this apply to the whole column, just drag the little dot in the bottom right corner of cell all the way to the bottom of the spreadsheet. The sheet should now look like this:



Now we want to add these up as they come, so that for each year, we have a running total.

This is the formula I used in my next column to achieve this: =sum(D2+C3) and it looks like this:

As you can see from the picture, this formula tells the spreadsheet that for this column, I want the total (the SUM) of the cell above, (in this case D2, which is the carried over total from the year before) added to cell C3, which represents the current year's results. To make this apply to all the cells, just double click the dot in the corner of the cell, and it will continue to the bottom of the sheet.


To check if this is correct, I went to the bottom of the table, to check the last figure was in fact the correct total wins for that team.

To apply this to the other teams, you just need to copy and paste what you did, replacing 'Arsenal' in the first formula with whichever team you are counting next.

So I did this with the four most successful teams. Then I copied the whole thing, and selected 'paste special' to put it into a new sheet. This just gets rid of the formulas and replaces them with the numbers you have. Delete all excess columns, and it will look like this:
Now you could use Google spreadsheets or Excel to make this into a line graph, but I want a bit of mouseover interactivity, so I used Datawrapper. Just copy and paste the whole spreadsheet into data wrapper and it does the rest for you:


Within data wrapper, you can change the width and height of the chart when you are copying the embed code so that if you are publishing your chart onto a blog like I am doing, you can make it nice and big.

This way of displaying data enables us to tells a lot more stories than just who has the most.

For example, just by looking at the graph we can say that for most of the lives of current football fans, Liverpool dominated the league as the most successful team all throughout the 70s and 80s.

We can also see that the only time Everton were leading the pack of these four teams was way back in the 19th century when the league was young.

Arsenal were dominant for a period in 40s and 50s, and the only time Manchester were on top was with their recent success, which has made them the most successful English team of all time. (That period may have come to an end however if recent results are anything to go by.)

So, farewell David Moyes, football fans everywhere (apart from parts of Manchester) are sad to see you go :(




Sorry, I couldn't resist!

If you know a faster way of building a running total from a list, let me know in the comments!


No comments:

Post a Comment