Import.io is a data scraping tool designed to make collecting data from websites easy. So let's give it a go...
First you will need to download Import.io. It's free and you can get it here.
Once it is downloaded, open it up. It will look like this:
For now, we will be working with the extractor tool, so click on that. This tool will allow us to scrape data from a table on a website. I have drawn a big red circle around it in the picture so you can find it. You're welcome.
Next, paste the URL of the site you want to scrape into the bar at the top of the Import.io window, and it will load the page. I have chosen a page with a list of every Premier League winner on it. It looks like this:
As you can see, the website loads up, and you get a few extra options at the bottom. Choose extractor again.
Import.io will ask you whether you can see the data on the page. Go through the options, and choose the table option.
It will then allow you to select the table with one click. It's as simple as that. It will put the data in a table for you, which will look like this:
Import.io can do much more sophisticated things than extract a single table, so next it will ask whether there are other pages in the same format that you want to scrape. For now, we're just going to work with this one page, so click 'I've got what I need'.
In this example, you will need to tell the software that the team column is text and not a link. To do that, just click on the column and a menu will appear. You can change it from there.
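If you are curious what a table extractor does behind the scenes, here is a minimal Python sketch using only the standard library. It is not how Import.io itself works, just an illustration of the idea: walk through the HTML, collect the text of each table cell, and keep the visible text of a link rather than the link itself. The `TableExtractor` class, the `extract_table` function and the sample HTML (with placeholder seasons and clubs) are all made up for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        # <a> tags are deliberately ignored: only their text is kept,
        # which is the "text, not a link" choice described above.

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def extract_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.rows


# Placeholder data, not the real league table.
SAMPLE_HTML = """
<table>
  <tr><th>Season</th><th>Winner</th></tr>
  <tr><td>Season A</td><td><a href="/clubs/x">Club X</a></td></tr>
  <tr><td>Season B</td><td><a href="/clubs/y">Club Y</a></td></tr>
</table>
"""

rows = extract_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

Each row comes back as a plain list of strings, with the header row first, which is roughly the shape of the table Import.io shows you.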
Next click 'show me the data' and it will present you with your new data table. It will look like this:
Click on the download button in the top left, and select the Excel file option.
Now you can play around with your data. For example, I made a chart showing the Premier League's top ten clubs in terms of number of times they've won the league:
http://cf.datawrapper.de/fk6x4/1/
This was created using Datawrapper, a free charting tool.
Interestingly, neither Manchester City nor Tottenham is in the top 10, despite both being successful teams today.
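The tally behind a chart like this one is easy to reproduce once you have the list of winners. Here is a rough Python sketch, assuming the winners column has been read into a plain list with one club name per season. The `top_clubs` function and the club names are placeholders invented for the example, not the real data.

```python
from collections import Counter


def top_clubs(winners, n=10):
    """Return the n clubs with the most titles as (club, titles) pairs."""
    return Counter(winners).most_common(n)


# Placeholder list, one entry per season.
winners = ["Club X", "Club Y", "Club X", "Club Z", "Club X", "Club Y"]

top = top_clubs(winners, n=2)
for club, titles in top:
    print(club, titles)
```

With the real column in place of the placeholder list and `n` left at 10, this gives the same kind of top ten that the chart shows.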