Python Project: UK Map of Football Teams (Pt.1)

Back in 2015 my local football club, AFC Bournemouth, were promoted to the English Premier League, the highest tier of English football. This year, after 5 years in the league, we were relegated back to the 2nd tier of English football, called the EFL Championship.

I was upset at the prospect of no longer watching my team play the most elite clubs in England, but I thought to myself: how can I turn this into a positive?

“Ooo, maybe I could do something as a Python project?”

After a few evenings of tinkering around, I had built the interactive map you see below.

efl_mapping.gif

I realise I may have lost all non-football fans by now, but in this series of blog posts I want to share my approach to this project. From my personal experience in learning Python, I’ve found hundreds and hundreds of courses that teach you Python fundamentals, concepts, libraries and so on, but what is lacking is examples of projects. I find you can draw a lot of inspiration from how people write their own code and how they use Python to build projects that interest them.

So consider this my attempt to fill that educational gap! This project covers A LOT of different skills that may be useful for people learning about Python, Data Science or Data Analysis! So stick with me, it’s not all football!


Topics Covered

UK Map of Football Teams (Part 1)

  • Project Overview

  • Data Collection

    • Web Scraping from Wikipedia with BeautifulSoup

    • Google Maps API (Directions)

    • Getting a map of the UK with GeoJSON data and Geopandas

UK Map of Football Teams (Part 2) (TBD)

  • Data Manipulation

    • Geopandas -> A Python library for working with geospatial data.

UK Map of Football Teams (Part 3) (TBD)

  • Visualisation

    • Interactive matplotlib plots


Project Overview


When working on a new Python project, I have learnt from experience that it always helps to have at least some sort of “goal” or “aims” to achieve. When I haven’t done this in the past, I’ve found myself aimlessly tinkering with various bits of data without ever producing anything of value. That’s fine, but I think it helps to have a goal in mind, especially if it’s something you want to put on your GitHub or CV to demonstrate to others.

So what were my aims? Ultimately, I wanted to be able to visually see which football clubs my team would be playing in the 2020/21 season and, importantly, where they are on the map in relation to my location. That way, I can plan which matches I could reasonably travel to on a free weekend to watch them live!

Translating this brief into some rough bullet aims…

  • A map of the UK

  • Geographical data collected on each EFL Championship football team

  • Each EFL Championship football team plotted on the map

  • Each team location is clickable and presents useful information (e.g. stadium, distance away from origin)

Part 1: Data Collection


Web Scraping from Wikipedia with BeautifulSoup

If you’d rather watch this as a video, check it out here.

First things first, I needed to get a list of all the EFL Championship teams and their stadium names. This is sufficient because later on I can use the Google Maps API to retrieve each stadium’s geographical location (latitude and longitude).

A quick Google search later I found this handy table on Wikipedia.

efl_football_teams_wikipedia

To retrieve this information with Python, I made a function that scrapes the link (the Wikipedia article) and returns a pandas DataFrame. pandas is the de facto Python library for data manipulation and analysis.

import requests
import pandas as pd
from bs4 import BeautifulSoup


def get_efl_team_data() -> pd.DataFrame:
    """
    Pull the latest EFL team data from the Wikipedia table.

    :return: pandas DataFrame of the EFL team data
    """
    wiki_html = requests.get(
        "https://en.wikipedia.org/wiki/EFL_Championship"
    ).text
    soup = BeautifulSoup(wiki_html, "lxml")
    table_html = soup.find("table", {"class": "wikitable sortable"})
    df = pd.read_html(str(table_html))[0]
    df = df[["Club", "Stadium"]]

    return df

Breaking down what’s happening here:

  • Use the requests library to “get” the raw html of the wikipedia article

  • We then use the BeautifulSoup library to “parse” the HTML. Essentially, it structures the HTML so you can easily search for tags and elements.

  • Search the BeautifulSoup object for a table in the article

  • Utilise pandas’ read_html function to read the HTML table code and produce a pandas.DataFrame

  • Filter our DataFrame to the information we care about: the Club, and the Stadium

The result is we can easily produce the list of EFL Championship football teams. In addition, assuming this Wikipedia article is periodically updated, next season we will automatically get the latest set of teams in the league.
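To see the same steps without hitting Wikipedia, here’s a self-contained sketch that runs the parse-and-read pipeline on a miniature inline table (the club and stadium values here are made up for illustration):

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# A miniature stand-in for the Wikipedia page.
html = """
<html><body>
<table class="wikitable sortable">
  <tr><th>Club</th><th>Stadium</th><th>Capacity</th></tr>
  <tr><td>Example FC</td><td>Example Park</td><td>10000</td></tr>
  <tr><td>Sample United</td><td>Sample Lane</td><td>20000</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")        # parse the raw HTML
table_html = soup.find("table", {"class": "wikitable sortable"})
df = pd.read_html(StringIO(str(table_html)))[0]  # HTML table -> DataFrame
df = df[["Club", "Stadium"]]                     # keep only the columns we need
print(df)
```

The only difference from the real function is the HTML source: a string literal here instead of `requests.get(...)`.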

efl_team_data_function.png

Google Maps API (Directions)

If you don’t know what an API is, I’d suggest reading FreeCodeCamp’s Article. In my own words, it is a way to programmatically speak to a website or an application and get a structured response back.

For this project, I utilised the Google Maps API, specifically the Directions API, which allows me to send a query with an origin, and a destination, and I will receive a TONNE of useful information related to the directions between them.

To get working with the Google Maps Directions API, you need to do a few quick things:

  1. Navigate to the Google Maps API site and sign up for an account (Note: they now require a credit/debit card attached to your account, but with the way the pricing is structured, you’d have to make A LOT of requests before the APIs actually start costing you money)

  2. Follow the steps to create a project and generate an API key for the Google Maps API… it will look something like this:

apikey.png

Now we have an API key, we can run queries against the API in our Python function:

import json
import urllib.parse
import urllib.request


def get_directions(origin: str, destination: str) -> dict:
    """
    Take an origin and a destination, query the Google Maps Directions
    API and return the JSON response in dictionary form.

    :param origin: The start location you will be travelling from.
    :param destination: The location you want to travel to.
    :return: dictionary of the JSON directions response
    """
    API_KEY = "YOUR_API_KEY"
    endpoint = "https://maps.googleapis.com/maps/api/directions/json?"
    # quote_plus URL-encodes the locations, so place names with
    # spaces (e.g. "Milton Keynes") don't break the request URL
    nav_request = (
        f"origin={urllib.parse.quote_plus(origin)}"
        f"&destination={urllib.parse.quote_plus(destination)}"
        f"&key={API_KEY}"
    )
    request = endpoint + nav_request
    response = urllib.request.urlopen(request).read()
    directions = json.loads(response)

    return directions

Once again, breaking down what’s happening in this function:

  • Build the “endpoint” string which is essentially the standard structure of the API query

  • Use an f-string to build the “nav_request” portion of the API query. You’ll see it’s literally just a specification of some variables (origin, destination and API key).

  • Use urllib to run this query and read the response

  • We then capture the json response and return it under the variable “directions”
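As a quick aside, the query string the function builds can also be assembled with the standard library’s urllib.parse.urlencode, which handles the URL-encoding for you. A sketch (the API key is a placeholder, and the locations are just the example used next):

```python
import urllib.parse

endpoint = "https://maps.googleapis.com/maps/api/directions/json?"
params = {
    "origin": "Nottingham",
    "destination": "Bournemouth",
    "key": "YOUR_API_KEY",  # placeholder -- use your real key
}
# urlencode joins the key=value pairs with "&" and percent-encodes
# anything unsafe (spaces, commas, etc.)
url = endpoint + urllib.parse.urlencode(params)
print(url)
# https://maps.googleapis.com/maps/api/directions/json?origin=Nottingham&destination=Bournemouth&key=YOUR_API_KEY
```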

So we now have a function that takes in an origin (i.e. where you currently are) and a destination (this could be one of the football clubs). Let’s run it now, and pretend I’m going from Nottingham to Bournemouth.

directions.png

Okay WOAH! That is a lot of information. The data that is returned is in JSON format, which in Python is basically just a dictionary: a set of key-value pairs, but often with a lot of nesting!

The best way to figure out what data you’re seeing is a combination of reading the API’s documentation and just playing around with it.

After a while of playing, I learnt that I could get the 2 useful bits of data I needed:

def get_distance(directions):
    return directions['routes'][0]['legs'][0]['distance']['value']


def get_lat_long(directions):
    end_location = directions['routes'][0]['legs'][0]['end_location']
    latitude = end_location['lat']
    longitude = end_location['lng']
    return (latitude, longitude)
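To see these accessors in action without calling the API, here’s a sketch with a hand-made response dictionary. The nesting mirrors the routes -> legs structure of a Directions response, but the numbers themselves are made up for illustration:

```python
def get_distance(directions):
    return directions['routes'][0]['legs'][0]['distance']['value']


def get_lat_long(directions):
    end_location = directions['routes'][0]['legs'][0]['end_location']
    return (end_location['lat'], end_location['lng'])


# Hand-made stand-in for a Directions API response (illustrative values)
directions = {
    "routes": [{
        "legs": [{
            "distance": {"text": "314 km", "value": 313988},
            "end_location": {"lat": 50.72, "lng": -1.88},
        }],
    }],
}

print(get_distance(directions))   # 313988
print(get_lat_long(directions))   # (50.72, -1.88)
```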

Continuing our example of Nottingham -> Bournemouth, let’s see how far away Bournemouth is, and also get its latitude and longitude…

Okay, one final thing you’ll notice: that distance of “313988”. One can intuit (or read the documentation) and realise that this is the distance in metres. It probably makes more sense to use kilometres since we’re working with locations across the UK, so I wrote a function to convert the distance to km and round it up.

from math import ceil


def convert_distance_m_to_km(distance):
    distance_km = ceil(distance / 1000)
    return distance_km
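Applied to the Nottingham -> Bournemouth distance we saw above:

```python
from math import ceil


def convert_distance_m_to_km(distance):
    distance_km = ceil(distance / 1000)
    return distance_km


print(convert_distance_m_to_km(313988))  # 314 (313.988 km, rounded up)
```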

Getting a map of the UK with GeoJSON data

Okay, now that we have the team data and the mapping data, we need a map of the UK! To do this we utilise a Python library called Geopandas. It is built on top of the pandas library, with added functionality for plotting geographical data.

To use Geopandas at the most basic level, all you need is a set of coordinates, or “latitudes and longitudes”. Luckily for us, there are public datasets that provide this data in a format called “GeoJSON”.

A quick Google search for “UK Map GeoJSON” and it turns out our government provides datasets for this. How kind! Here’s a link to the page. (Note: it is a fairly sizable file, ~60MB)

After you’ve downloaded this it is super easy to read into a geopandas DataFrame, and then plot:

How cool is that!?

We now have all the data we need to pull together and make our interactive map of EFL Championship teams! I’ll cover this in parts 2 and 3.


Stay happy, keep learning
