Data Management and Visualization Week 3

This is the third in a series of posts chronicling my project for the Data Management and Visualization course.  This week we learned about making data management decisions.

I put together a Jupyter notebook, which is hosted in my GitHub project repository. It is an example of literate programming, so it mixes narrative content with machine-readable code. If you want to view the Python script without the narration, it is available too.

Project Description

In this analysis I would like to examine the relationship between the economic well-being of a society and its level of democratization. The data for this analysis comes from a subset of the GapMinder project data.

Data Management Decisions

Democracy Score

The level of democratization is measured using the Polity IV democracy score. It is a summary measure of a country's democratic and free nature or lack thereof. It ranges from -10 (an autocracy) to 10 (a full democracy). This gives 21 possible values, which is a little overwhelming. I decided to collapse them into five categories (Full Democracy, Democracy, Open Anocracy, Closed Anocracy and Autocracy). These categories come from the Polity IV project authors.
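As a rough sketch of how the 21 scores could be collapsed into the five categories, here is one way to do it with pandas. The cutoffs are the bands commonly cited for Polity IV, not values taken from this post, and the column name `polityscore` is an assumption about how the data is labeled.

```python
import pandas as pd

# Assumed cutoffs (commonly cited Polity IV bands, not stated in the post):
#   -10 to -6 Autocracy, -5 to 0 Closed Anocracy, 1 to 5 Open Anocracy,
#   6 to 9 Democracy, 10 Full Democracy
POLITY_BINS = [-11, -6, 0, 5, 9, 10]  # right-inclusive bin edges
POLITY_LABELS = ["Autocracy", "Closed Anocracy", "Open Anocracy",
                 "Democracy", "Full Democracy"]


def categorize_polity(scores: pd.Series) -> pd.Series:
    """Map Polity scores (-10..10) to five ordered categories; NaN stays NaN."""
    return pd.cut(scores, bins=POLITY_BINS, labels=POLITY_LABELS, right=True)


# Hypothetical sample scores covering each band edge, plus one missing value
sample = pd.Series([-10, -6, -5, 0, 1, 5, 6, 9, 10, None], name="polityscore")
print(categorize_polity(sample).value_counts(sort=False))
```

Because `pd.cut` returns an ordered categorical, the categories sort from Autocracy up to Full Democracy, which keeps later tables in a sensible order.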

Economic Well-Being

The other variable that needed data management is per capita GDP. It is a continuous variable that I am considering converting into a discrete one. To try this out, I broke the data into per capita GDP quartiles and quintiles.
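There are two common ways to bin a continuous variable in pandas, and they behave very differently on skewed income data. The sketch below uses synthetic incomes and an assumed column name (`incomeperperson`). The tables later in this post put most countries in the lowest bin, which is the pattern equal-width bins produce, but which function the notebook actually used is not stated here, so treat that as an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed incomes standing in for per capita GDP (not real data)
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=8, sigma=1.2, size=200),
                   name="incomeperperson")

quartile_labels = ["0% to 25%", "25% to 50%", "50% to 75%", "75% to 100%"]

# Equal-count bins: every quartile holds the same number of countries
by_rank = pd.qcut(income, q=4, labels=quartile_labels)

# Equal-width bins: the income range is split into four equal intervals,
# so skewed data piles up in the lowest bin
by_range = pd.cut(income, bins=4, labels=quartile_labels)

print(by_rank.value_counts(sort=False))
print(by_range.value_counts(sort=False))
```

Quintiles work the same way with `q=5` or `bins=5` and five labels.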

Subset of Full Data Set

There are 213 countries in the GapMinder data set. Of these, 161 have democracy scores (52 missing) and 190 have per capita GDP (23 missing). I selected the countries that have both per capita GDP and democracy scores, which left me with 155 countries.
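A minimal sketch of this kind of subsetting, using a tiny made-up data frame. The column names are assumptions and the values are for illustration only.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame; column names are assumed, values are illustrative only
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "incomeperperson": [1200.0, np.nan, 30500.0, 8700.0],
    "polityscore": [7.0, 10.0, np.nan, -3.0],
})

# Count missing values per variable before subsetting
print(df[["incomeperperson", "polityscore"]].isna().sum())

# Keep only the rows where both variables are present
complete = df.dropna(subset=["incomeperperson", "polityscore"])
print(len(complete), "of", len(df), "countries have both values")
```

With the counts in this post, 161 + 190 - 155 = 196 countries have at least one of the two values, so 17 of the 213 have neither.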

Exploratory Analysis

This section summarizes some of the findings of my exploratory analysis.

Countries by Continent

I was interested in seeing whether I should include a geographic variable in the analysis, so I created a data set that maps each GapMinder country to its continent.

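| Continent | Full Democracy | Democracy | Open Anocracy | Closed Anocracy | Autocracy |
| --- | --- | --- | --- | --- | --- |
| Africa | 1 | 16 | 8 | 20 | 4 |
| Asia | 3 | 10 | 6 | 5 | 13 |
| Australia and Oceania | 2 | 1 | 1 | 1 | 0 |
| Europe | 20 | 15 | 2 | 0 | 2 |
| North America | 4 | 8 | 1 | 1 | 1 |
| South America | 2 | 7 | 1 | 1 | 0 |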

Europe has the highest concentration of democracies (Full Democracy plus Democracy, about 90% of its countries), followed by South America (about 82%) and North America (80%). Asia and Africa have the lowest concentrations (both roughly 35%). Asia has 13 autocracies, the most of any continent. South America and Australia and Oceania have none.
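The percentages can be checked directly from the table. The sketch below takes "concentration" to mean the share of a continent's countries that are a Full Democracy or a Democracy, which is an assumption about the intended measure. The counts are copied from the table above.

```python
# Counts copied from the continent table above, in column order:
# (Full Democracy, Democracy, Open Anocracy, Closed Anocracy, Autocracy)
counts = {
    "Africa": (1, 16, 8, 20, 4),
    "Asia": (3, 10, 6, 5, 13),
    "Australia and Oceania": (2, 1, 1, 1, 0),
    "Europe": (20, 15, 2, 0, 2),
    "North America": (4, 8, 1, 1, 1),
    "South America": (2, 7, 1, 1, 0),
}


def democracy_share(row):
    """Fraction of a continent's countries that are Full Democracy or Democracy."""
    full, democracy = row[0], row[1]
    return (full + democracy) / sum(row)


shares = {continent: democracy_share(row) for continent, row in counts.items()}

# Print continents from the highest share of democracies to the lowest
for continent, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{continent:<22} {share:.0%}")
```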

The Number of Countries by Income Quartile and Level of Democracy

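| Income quartile | Full Democracy | Democracy | Open Anocracy | Closed Anocracy | Autocracy |
| --- | --- | --- | --- | --- | --- |
| 0% to 25% | 9 | 53 | 19 | 26 | 16 |
| 25% to 50% | 8 | 2 | 0 | 0 | 2 |
| 50% to 75% | 9 | 2 | 0 | 0 | 1 |
| 75% to 100% | 6 | 0 | 0 | 1 | 1 |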

Most countries (123 of 155) are in the lowest per capita GDP quartile. Six of the eight countries in the top income quartile are full democracies.
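A two-way table like this one can be built with `pd.crosstab`. The sketch below uses a handful of made-up rows and hypothetical column names, so it shows the technique rather than the notebook's actual code.

```python
import pandas as pd

# Made-up rows for illustration; the column names are hypothetical
df = pd.DataFrame({
    "income_quartile": ["0% to 25%", "0% to 25%", "0% to 25%",
                        "25% to 50%", "75% to 100%", "75% to 100%"],
    "democracy_level": ["Democracy", "Autocracy", "Democracy",
                        "Full Democracy", "Full Democracy", "Autocracy"],
})

# Rows are income bins, columns are democracy categories, cells are country counts
table = pd.crosstab(df["income_quartile"], df["democracy_level"])
print(table)

# Share of each income bin that falls in each democracy category (rows sum to 1)
row_shares = pd.crosstab(df["income_quartile"], df["democracy_level"],
                         normalize="index")
print(row_shares)
```

Passing `normalize="index"` turns counts into row proportions, which makes statements such as "six of the eight countries in the top quartile" easy to read off as a share.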

The Number of Countries by Income Quintile and Level of Democracy

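| Income quintile | Full Democracy | Democracy | Open Anocracy | Closed Anocracy | Autocracy |
| --- | --- | --- | --- | --- | --- |
| Lowest (0% to 20%) | 7 | 53 | 19 | 25 | 15 |
| Second (20% to 40%) | 9 | 1 | 0 | 1 | 3 |
| Middle (40% to 60%) | 2 | 2 | 0 | 0 | 1 |
| Fourth (60% to 80%) | 9 | 1 | 0 | 0 | 0 |
| Highest (80% to 100%) | 5 | 0 | 0 | 1 | 1 |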

Most countries (119 of 155) are in the lowest per capita GDP quintile. 34% of the countries in the data set (53 of 155) are both democracies and in the lowest per capita GDP quintile. Most of the countries in the highest quintile (5 of 7) are full democracies.

