A Spanish version of this article has been published on pybonacci.org.


I had wanted to learn Pandas for quite some time. Finally I grabbed a recent opportunity: Brexit = data.

As usual I started with a practical exercise / objective. I used Pandas to analyze the CSV data published by electoralcommission.org.uk.

Although I wanted to answer more questions, this CSV was enough to get my feet wet with Pandas (it’s huge!). I also learned to use Jupyter Notebook to document my progress. You can view or download the notebook on GitHub.
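To give an idea of what this first step looks like, here is a minimal sketch of loading a results CSV and computing the Leave share per region. The column names (`Region`, `Area`, `Remain`, `Leave`) and the rows are assumptions made up for illustration, not the actual file contents, and this is not necessarily how the notebook does it:

```python
import io

import pandas as pd

# Made-up rows standing in for the real CSV; the column names are assumptions.
SAMPLE_CSV = """Region,Area,Remain,Leave
London,Area A,60000,40000
London,Area B,70000,30000
North East,Area C,30000,50000
North East,Area D,20000,40000
"""


def leave_share_by_region(csv_source):
    """Sum the votes per region and add the percentage that voted Leave."""
    df = pd.read_csv(csv_source)
    totals = df.groupby("Region")[["Remain", "Leave"]].sum()
    totals["Pct_Leave"] = 100 * totals["Leave"] / (totals["Remain"] + totals["Leave"])
    return totals.sort_values("Pct_Leave", ascending=False)


result = leave_share_by_region(io.StringIO(SAMPLE_CSV))
print(result)
```

With the real file you would pass its path to `pd.read_csv` instead of the in-memory sample.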

I achieved my goal of representing the data shown here. Here are some screenshots of the notebook:


notebook_screenshot1


notebook_screenshot2


notebook_screenshot3


notebook_screenshot4


Adding demographics

I matched the data with publicly available census data (as suggested by Pybonacci, thanks). I found some cool correlations (and learned quite a bit of matplotlib along the way); see the full notebook here:

How does age influence the leave / remain vote?

median_age.png

How does unemployment influence the vote?

perc_unemployed.png

How does higher education influence the vote?

perc_high_education.png

And how does being born outside the UK influence the vote?

perc_born_outside_uk.png

Clearly, areas with an older population and areas with a higher unemployment rate tend to vote for “Leave”. On the other hand, areas with a higher percentage of highly educated people, and regions with more people born outside the UK, (generally) prefer that the UK remains in the EU.

Again, to see how I got to these results with Pandas you can see the full notebook here.
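As a rough sketch of the matching step, joining vote results with census figures and checking how each demographic column correlates with the Leave percentage could look like the following. The frames, the `Area` join key, and all values are hypothetical placeholders; the real notebook may structure this differently:

```python
import pandas as pd

# Hypothetical vote results per area (placeholder values, not real data).
votes = pd.DataFrame({
    "Area": ["A", "B", "C", "D"],
    "Pct_Leave": [40.0, 50.0, 60.0, 70.0],
})

# Hypothetical census figures for the same areas (placeholder values).
census = pd.DataFrame({
    "Area": ["A", "B", "C", "D"],
    "median_age": [34, 38, 42, 46],
    "perc_high_education": [45.0, 35.0, 30.0, 20.0],
})

# Keep only areas present in both data sets.
merged = votes.merge(census, on="Area", how="inner")

# Pearson correlation of every census column with the Leave percentage.
correlations = (
    merged.drop(columns="Area")
    .corr()["Pct_Leave"]
    .drop("Pct_Leave")
)
print(correlations)
```

A positive value means the column rises together with the Leave share, a negative value means it moves in the opposite direction.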

And last but not least: income data by region

Income data was harder to get from the standard census data, so I used this link to check the relation between median income and voting. I found an interesting pattern:

median_income.png

(the parsing of the data is documented in the same notebook)

We can clearly see that regions with a relatively lower median income are more in favor of leaving the EU, although the pattern is not 100% consistent: Northern Ireland has a lower median income but voted Remain, while the South East has a higher median income but voted Leave. It is interesting, though, how general trends become visible when you merge different data sets.
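One way to spot such exceptions programmatically is to flag regions that do not follow the "lower income, more Leave" trend. This is only a sketch: the region names, incomes, and percentages below are invented placeholders, and splitting at the median income is my own simplification, not something taken from the notebook:

```python
import pandas as pd

# Invented placeholder figures, not real regional data.
regions = pd.DataFrame({
    "Region": ["Region A", "Region B", "Region C", "Region D"],
    "median_income": [20000, 21000, 26000, 27000],
    "Pct_Leave": [58.0, 44.0, 52.0, 40.0],
})

# Split the regions into a lower and a higher income half.
income_cutoff = regions["median_income"].median()
low_income = regions["median_income"] < income_cutoff
voted_leave = regions["Pct_Leave"] > 50

# A region follows the trend if lower income pairs with Leave
# and higher income pairs with Remain.
regions["follows_trend"] = low_income == voted_leave
exceptions = regions.loc[~regions["follows_trend"], "Region"].tolist()
print(exceptions)
```

In this made-up sample, Region B plays the role of a lower-income Remain region and Region C that of a higher-income Leave region.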

Reference links to learn Pandas


Bob Belderbos

I build useful apps and share my learning