Regression Modeling in Practice Week 2 Assignment


For week two of Regression Modeling in Practice I performed a simple regression. My study question in this course is the relationship between democratic openness and economic well-being. The GapMinder data set was used. Economic well-being was measured using GDP per capita. Democratic openness was measured using the score from the Polity IV project.

Analysis Source Code

The 155 countries with data for both variables were split into two categories: full democracy (n=32) and not full democracy (n=133). An OLS regression was then used to measure how economic well-being depends on the presence of full democratization. The analysis was performed using the following Python code:

# Import libraries needed
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

# Read in the Data
df = pd.read_csv('gapminder.csv', low_memory=False)

# Print some basic statistics
n = str(len(df))
cols = str(len(df.columns))
print('Number of observations: '+ n +' (rows)')
print('Number of variables: '+ cols +' (columns)')

# Change the data type for variables of interest
# Response Variable
df['incomeperperson'] = pd.to_numeric(df['incomeperperson'], errors='coerce')
# Explanatory Variable
df['polityscore'] = pd.to_numeric(df['polityscore'], errors='coerce')

# Print the number of records with data
print('Countries with a GDP Per Capita: ' + str(df['incomeperperson'].count()) + ' out of ' + str(len(df)) + ' (' + str(len(df) - df['incomeperperson'].count()) + ' missing)')
print('Countries with a Democracy Score: ' + str(df['polityscore'].count()) + ' out of ' + str(len(df)) + ' (' + str(len(df) - df['polityscore'].count()) + ' missing)')

# Get the rows not missing a value
subset = df[np.isfinite(df['polityscore'])]
subset = subset[np.isfinite(subset['incomeperperson'])]
print('Number of observations: '+ str(len(subset)) +' (rows)')

# This function converts the polity score to a binary category flag
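def is_full_democracy(score):
    # A polity score of 10 is treated as a full democracy
    # (1 = full democracy, 0 = not; the 1/0 coding is assumed)
    if score == 10:
        return 1
    else:
        return 0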

# Now we can use the function to create the new variable
subset['is_full_democracy'] = subset['polityscore'].apply(is_full_democracy)

# Create frequency table
full_democracy_counts = subset.groupby('is_full_democracy').size()
print(full_democracy_counts)

# Create simple regression model and print the results
model = smf.ols('incomeperperson ~ is_full_democracy', data=subset).fit()
print(model.summary())

This code is also available in my GitHub repository for this class.

Regression Model Results

The results of the linear regression model indicated that the presence of a full democracy (Beta=15,990, p<0.001) was significantly and positively associated with economic well-being. The adjusted R-squared for this model is 0.439.
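
One way to read a coefficient like this: when the only explanatory variable is a 0/1 flag, the OLS intercept equals the mean of the response in the 0 group, and the slope equals the difference between the two group means. The sketch below illustrates that property with NumPy on synthetic, made-up numbers. The group sizes, income levels, and noise are assumptions chosen for illustration. They are not the GapMinder values and do not reproduce the results above.

```python
import numpy as np

# Synthetic data for illustration only (NOT the GapMinder data)
rng = np.random.default_rng(0)
is_full_democracy = np.array([1] * 30 + [0] * 120)
income = np.where(is_full_democracy == 1, 20000.0, 5000.0)
income = income + rng.normal(0, 3000, size=is_full_democracy.size)

# Design matrix: a column of ones (intercept) plus the binary flag
X = np.column_stack([np.ones(is_full_democracy.size), is_full_democracy])
intercept, slope = np.linalg.lstsq(X, income, rcond=None)[0]

# Group means computed directly from the data
mean_not_full = income[is_full_democracy == 0].mean()
mean_full = income[is_full_democracy == 1].mean()

print('Intercept:', round(intercept, 2), '| mean of 0 group:', round(mean_not_full, 2))
print('Slope:', round(slope, 2), '| difference in means:', round(mean_full - mean_not_full, 2))
```

Read this way, the Beta reported above is the estimated gap in average GDP per capita between countries coded as full democracies and all other countries in the subset.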

