A Spanish version of this article has been published on pybonacci.org.


I had wanted to learn Pandas for quite some time. Finally I grabbed a recent opportunity: Brexit = data.

As usual I started with a practical exercise / objective. I used Pandas to analyze the CSV data published by electoralcommission.org.uk.
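To give an idea of what such an analysis looks like, here is a minimal sketch (not the code from the notebook). The column names `Region`, `Area`, `Remain` and `Leave` are assumptions about the CSV layout, and the rows are made-up placeholder numbers, not real results:

```python
import io

import pandas as pd

# Made-up rows standing in for the real CSV; column names are assumptions.
SAMPLE_CSV = """Region,Area,Remain,Leave
North,Area A,400,600
North,Area B,300,700
South,Area C,800,200
"""


def leave_share_by_region(csv_source):
    """Sum the votes per region and compute the Leave percentage."""
    df = pd.read_csv(csv_source)
    totals = df.groupby("Region")[["Remain", "Leave"]].sum()
    totals["pct_leave"] = 100 * totals["Leave"] / (totals["Remain"] + totals["Leave"])
    # Regions with the strongest Leave vote first
    return totals.sort_values("pct_leave", ascending=False)


result = leave_share_by_region(io.StringIO(SAMPLE_CSV))
print(result)
```

With a real file you would pass its path to `leave_share_by_region` instead of the `io.StringIO` wrapper, since `pd.read_csv` accepts both.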

Although I wanted to answer more questions, this CSV was enough to get my feet wet with Pandas (it’s huge!). I also learned to use Jupyter Notebook to document my progress. You can view or download the notebook on GitHub.

I achieved my goal of representing the data shown here. Here are some screenshots of the notebook:


notebook_screenshot1


notebook_screenshot2


notebook_screenshot3


notebook_screenshot4


Adding demographics

I matched the data with publicly available census data (as suggested by Pybonacci, thanks). I found some cool correlations (and learned quite a bit of matplotlib along the way); see the full notebook here:

How does age influence the leave / remain vote?

median_age.png

How does unemployment influence the vote?

perc_unemployed.png

How does higher education influence the vote?

perc_high_education.png

And how does being born outside the UK influence the vote?

perc_born_outside_uk.png

Clearly, areas with an older population and areas with a higher unemployment rate tend to vote for “Leave”. On the other hand, areas with a higher percentage of highly educated people, and areas with more people born outside the UK, (generally) prefer that the UK remains in the EU.
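Matching two data sets like this comes down to a merge on a shared key, after which the correlations are one call away. The sketch below is only an illustration: the `Area` key and `pct_leave` column are assumptions, the demographic column names are borrowed from the figure file names above, and all values are made-up placeholders:

```python
import pandas as pd

# Placeholder voting results per area (made-up numbers).
votes = pd.DataFrame({
    "Area": ["A", "B", "C", "D"],
    "pct_leave": [40.0, 50.0, 60.0, 70.0],
})

# Placeholder census figures for the same areas (made-up numbers).
census = pd.DataFrame({
    "Area": ["A", "B", "C", "D"],
    "median_age": [30.0, 35.0, 40.0, 45.0],
    "perc_high_education": [50.0, 40.0, 30.0, 20.0],
})

# Keep only areas present in both data sets.
merged = votes.merge(census, on="Area", how="inner")

# Pearson correlation of each demographic column with the Leave percentage.
correlations = merged.drop(columns="Area").corr()["pct_leave"].drop("pct_leave")
print(correlations)
```

A positive value means the Leave share rises with that variable, a negative value means it falls. If matplotlib is installed, `merged.plot.scatter(x="median_age", y="pct_leave")` draws the matching scatter plot.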

Again, you can see how I got to these results with Pandas in the full notebook here.

And last but not least: income data by region

Income data was harder to get from the standard census data, so I used this link to check the relation between median income and voting. I found an interesting pattern:

median_income.png

(the parsing of the data is documented in the same notebook)

We can clearly see that regions with a relatively lower median income are more in favor of leaving the EU, although it is not 100% consistent: Northern Ireland has a lower median income but voted Remain, while the South East has a higher median income but voted Leave. It is interesting, though, how general trends become visible when merging different data sets.
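One way to spot such exceptions programmatically is to flag the regions where income level and vote outcome point in opposite directions. This is a rough sketch under my own assumptions: the region names, column names and numbers are hypothetical placeholders, and splitting at the median income is just one possible cutoff:

```python
import pandas as pd

# Hypothetical regions with made-up income and voting figures.
regions = pd.DataFrame({
    "region": ["R1", "R2", "R3", "R4"],
    "median_income": [20000, 21000, 27000, 28000],
    "pct_leave": [58.0, 44.0, 52.0, 40.0],
})

# Split the regions into a lower and a higher income half.
income_cutoff = regions["median_income"].median()
low_income = regions["median_income"] < income_cutoff
voted_leave = regions["pct_leave"] > 50

# The general trend: lower income goes with Leave, higher income with Remain.
regions["fits_trend"] = low_income == voted_leave

exceptions = regions.loc[~regions["fits_trend"], "region"].tolist()
print(exceptions)
```

Here `R2` (lower income, voted Remain) and `R3` (higher income, voted Leave) play the roles that Northern Ireland and the South East play in the real data.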

Reference links to learn Pandas


Bob Belderbos

Software Developer, Pythonista, Data Geek, Student of Life.