Python Project: UK Map of Football Teams (Pt.1)
Back in 2015 my local football club, AFC Bournemouth, were promoted to the English Premier League, the highest tier of English football. This year, after 5 years in the league, we were relegated back to the 2nd tier of English football, called the EFL Championship.
I was upset at the prospect of no longer watching my team play the most elite clubs in England, but I thought to myself: how can I turn this into a positive?
“Ooo, maybe I could do something as a Python project?”
After a few evenings of tinkering around, I had built the interactive map you see below.
I realise I may have lost all non-football fans by now, but in this series of blog posts I want to share my approach to this project. From my personal experience in learning Python, I’ve found hundreds and hundreds of courses that teach you Python fundamentals, concepts, libraries and so on, but what is lacking is examples of projects. I find you can draw a lot of inspiration from how people write their own code and how they use Python to build projects that interest them.
So consider this my attempt to fill that educational gap! This project covers A LOT of different skills that may be useful for people learning about Python, Data Science or Data Analysis! So stick with me, it’s not all football!
Topics Covered
UK Map of Football Teams (Part 1)
Project Overview
Data Collection
Web Scraping from Wikipedia with BeautifulSoup
Google Maps API (Directions)
Getting a map of the UK with GeoJSON data and Geopandas
UK Map of Football Teams (Part 2) (TBD)
Data Manipulation
Geopandas -> A Python library for working with geospatial data.
UK Map of Football Teams (Part 3) (TBD)
Visualisation
Interactive matplotlib plots
Project Overview
When working on a new Python project, I have learnt from experience that it always helps to have at least some sort of "goal" or "aims" to achieve. When I haven't done this in the past, I can find myself aimlessly tinkering with various bits of data without ever producing anything of value. That's fine in itself, but I think it helps to have a goal in mind, especially if it's something you want to put on your GitHub or CV to demonstrate to others.
So what were my aims? Ultimately, I wanted to be able to see visually which football clubs my team would be playing in the 2020/21 season and, importantly, where they are on the map in relation to my location. That way, I can plan which matches I could reasonably travel to on a free weekend to watch them live!
Translating this brief into some rough bullet aims…
A map of the UK
Geographical data collected on each EFL Championship football team
Each EFL Championship football team plotted on the map
Each team location is clickable and presents useful information (e.g. stadium, distance away from origin)
Part 1: Data Collection
Web Scraping from Wikipedia with BeautifulSoup
First things first, I needed to get a list of all the EFL Championship teams and their stadium names. This is sufficient because later on I can use the Google Maps API to retrieve each stadium's geographical location data (latitude and longitude).
A quick Google search later I found this handy table on Wikipedia.
To retrieve this information with Python, I made a function that scrapes the link (the Wikipedia article) and returns a pandas DataFrame. pandas is the de facto Python library for data manipulation and analysis.
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup


def get_efl_team_data() -> pd.DataFrame:
    """
    Pull the latest EFL team data from the Wikipedia table.

    :return: pandas DataFrame of the EFL team data
    """
    wiki_html = requests.get(
        "https://en.wikipedia.org/wiki/EFL_Championship"
    ).text
    soup = BeautifulSoup(wiki_html, "lxml")
    table_html = soup.find("table", {"class": "wikitable sortable"})
    df = pd.read_html(str(table_html))[0]
    df = df[["Club", "Stadium"]]
    return df
```
Breaking down what’s happening here:
Use the requests library to "get" the raw HTML of the Wikipedia article
Use the BeautifulSoup library to "parse" the HTML data. Essentially, it structures the HTML code so you can easily search for tags and items
Search the BeautifulSoup object for a table in the article
Utilise pandas' read_html function to read the HTML table code and produce a pandas.DataFrame
Filter our DataFrame to the information we care about: the Club, and the Stadium
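If you want to see the read_html step in isolation, here is a small self-contained sketch that parses a table from an HTML string. The rows are toy data I have made up to mimic the shape of the Wikipedia table, not the real thing:

```python
from io import StringIO

import pandas as pd

# A toy HTML table mimicking the structure of the Wikipedia one
html = """
<table class="wikitable sortable">
  <tr><th>Club</th><th>Stadium</th><th>Capacity</th></tr>
  <tr><td>AFC Bournemouth</td><td>Dean Court</td><td>11364</td></tr>
  <tr><td>Norwich City</td><td>Carrow Road</td><td>27244</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per table it finds
df = pd.read_html(StringIO(html))[0]

# Keep only the columns we care about
df = df[["Club", "Stadium"]]
print(df)
```

The same pattern (parse, take the first table, filter columns) is exactly what the scraping function does against the live article.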
The result is that we can easily produce the list of EFL Championship football teams. In addition, assuming that this Wikipedia article is periodically updated, next season we will automatically get the latest set of teams in the league.
Google Maps API (Directions)
If you don't know what an API is, I'd suggest reading FreeCodeCamp's Article. In my own words, it is a way to programmatically speak to a website or an application and get a structured response.
For this project, I utilised the Google Maps API, specifically the Directions API, which allows me to send a query with an origin, and a destination, and I will receive a TONNE of useful information related to the directions between them.
To get working with the Google Maps Directions API, you need to do a few quick things:
Navigate to the Google Maps API site and sign up for an account (Note: they now require a credit/debit card attached to your account, but with the way pricing is structured, you'd have to make A LOT of requests before the APIs actually start costing you money)
Follow the steps to create a project and generate an API key for the Google Maps API… it will look a bit like this:
Now we have an API key, we can run queries against the API in our Python function:
```python
import json
import urllib.request


def get_directions(origin: str, destination: str) -> dict:
    """
    Take in an origin and a destination, query the Google Maps
    Directions API and return the JSON response in dictionary form.

    :param origin: The start location from where you will be travelling.
    :param destination: The location you want to travel to.
    :return: The JSON response as a dictionary.
    """
    API_KEY = "YOUR_API_KEY"
    endpoint = "https://maps.googleapis.com/maps/api/directions/json?"
    nav_request = f"origin={origin}&destination={destination}&key={API_KEY}"
    request = endpoint + nav_request
    response = urllib.request.urlopen(request).read()
    directions = json.loads(response)
    return directions
```
Once again, breaking down what’s happening in this function:
Build the "endpoint" string, which is the standard base of the API query
Use an f-string to build the "nav_request" portion of the API query. You'll see it's literally just a specification of some variables (origin, destination and API key)
Use urllib to open this request URL and read the response
We then capture the JSON response and return it under the variable "directions"
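One caveat worth flagging: if an origin or destination contains spaces (e.g. "Milton Keynes"), it should be URL-encoded before being dropped into the query string. The standard library's urllib.parse can build the query safely; the place names here are just illustrative:

```python
from urllib.parse import urlencode

# Build the query string with proper URL-encoding of spaces etc.
params = {
    "origin": "Milton Keynes",
    "destination": "Bournemouth",
    "key": "YOUR_API_KEY",
}
nav_request = urlencode(params)
print(nav_request)  # origin=Milton+Keynes&destination=Bournemouth&key=YOUR_API_KEY
```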
So we now have a function that takes in an origin (i.e. where you currently are) and a destination (this could be one of the football clubs). Let's run it now, and pretend I'm going from Nottingham to Bournemouth.
Okay, WOAH! That is a lot of information. The data that is returned is in JSON format, which in Python is basically just a dictionary: a set of key-value pairs, but often with a lot of nesting!
The best way to figure out what data you’re seeing is a combination of reading the API’s documentation, but also just playing around with it.
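To give you a feel for that nesting, here is a heavily trimmed, hand-written sketch of the response shape. The field names follow the Directions API response structure, but the values are placeholders I have filled in by hand:

```python
# A trimmed, hand-written sketch of a Directions API response
directions = {
    "routes": [
        {
            "legs": [
                {
                    "distance": {"text": "195 mi", "value": 313988},
                    "duration": {"text": "3 hours 40 mins", "value": 13200},
                    "end_location": {"lat": 50.72, "lng": -1.88},
                }
            ]
        }
    ]
}

# Drilling down: first route, first leg
leg = directions["routes"][0]["legs"][0]
print(leg["distance"]["value"])  # distance in metres
print(leg["end_location"])       # destination coordinates
```

A real response carries many more fields (step-by-step directions, polylines, traffic data), but everything we need lives in that first leg.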
After a while of playing, I learnt that I could get the 2 useful bits of data I needed:
```python
def get_distance(directions: dict) -> int:
    # Distance of the first leg of the first route, in metres
    return directions['routes'][0]['legs'][0]['distance']['value']
```
```python
def get_lat_long(directions: dict) -> tuple:
    # Coordinates of the journey's end point (i.e. the destination)
    end_location = directions['routes'][0]['legs'][0]['end_location']
    latitude = end_location['lat']
    longitude = end_location['lng']
    return (latitude, longitude)
```
Continuing our example of Nottingham -> Bournemouth, let’s see how far away Bournemouth is, and also get its latitude and longitude…
Okay, one final thing you'll notice: that distance of "313988". One can intuit, or read the documentation, and realise that this is the distance in metres. It probably makes more sense to use kilometres since we're working with locations across the UK, so I wrote a function to convert this distance to km, rounding upward.
```python
from math import ceil


def convert_distance_m_to_km(distance: int) -> int:
    # Convert metres to kilometres, rounding up to the nearest whole km
    distance_km = ceil(distance / 1000)
    return distance_km
```
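As a quick sanity check, here is the conversion applied to our Nottingham-to-Bournemouth distance (the helper is redefined so the snippet runs on its own):

```python
from math import ceil


def convert_distance_m_to_km(distance):
    # Metres to kilometres, rounded up
    return ceil(distance / 1000)


print(convert_distance_m_to_km(313988))  # 314 km, rounded up from 313.988
```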
Getting a map of the UK with GeoJSON data
Okay, now that we have the team data and the mapping data, we need a map of the UK! To do this we utilise a Python library called Geopandas. It is built on top of the pandas library, with added functionality so that you can work with and plot geographical data.
To use Geopandas at the most basic level, all you need is a set of coordinates, or "latitudes and longitudes". Luckily for us, there are public datasets that can give us this data in a format called "GeoJSON".
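For a feel of the format: GeoJSON is plain JSON with a conventional structure, a collection of "features" that each pair a geometry (the coordinates) with some properties. Here is a hand-made toy example with invented coordinates, nothing like the real boundary file:

```python
import json

# A minimal, hand-written GeoJSON FeatureCollection (toy data)
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Toy region"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [[-1.9, 50.7], [-1.8, 50.7], [-1.8, 50.8],
                     [-1.9, 50.8], [-1.9, 50.7]]
                ],
            },
        }
    ],
}

# A GeoJSON file is just this structure serialised to disk
print(json.dumps(toy_geojson)[:40])
```

The real UK dataset has the same skeleton, just with thousands of coordinate pairs tracing the actual boundaries.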
A quick Google search for "UK Map GeoJSON" reveals that our government provides datasets for this. How kind! Here's a link to the page. (Note: it is a fairly sizeable file, ~60MB)
After you've downloaded this, it is super easy to read into a Geopandas GeoDataFrame and then plot:
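A minimal sketch of that read-and-plot step, assuming you saved the download as uk_boundaries.geojson (that filename is my placeholder; use whatever you saved the file as):

```python
import geopandas as gpd
import matplotlib.pyplot as plt

# Read the downloaded GeoJSON straight into a GeoDataFrame
uk_map = gpd.read_file("uk_boundaries.geojson")

# Plot the boundary polygons to get our base map of the UK
fig, ax = plt.subplots(figsize=(8, 10))
uk_map.plot(ax=ax, color="lightgrey", edgecolor="white")
ax.set_axis_off()
plt.show()
```

gpd.read_file handles the GeoJSON parsing for you, and the resulting GeoDataFrame's plot method draws every polygon in the file.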
We now have all the data we need to pull together and make our interactive map of EFL Championship teams! I'll cover this in parts 2 and 3.
Stay happy, keep learning