For week two of Regression Modeling in Practice, I performed a simple linear regression. My study question in this course is the relationship between democratic openness and economic well-being, and I use the GapMinder data set. Economic well-being is measured by GDP per capita. Democratic openness is measured by the polity score from the Polity IV project, which ranges from -10 (most autocratic) to +10 (most democratic).
Analysis Source Code
A total of 155 countries were split into two groups: full democracy (a polity score of 10, n=32) and not full democracy (n=133). An ordinary least squares (OLS) regression was then used to test whether economic well-being depends on full-democracy status. The analysis was performed using the following Python code:
```python
# Import libraries needed
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

# Read in the data
df = pd.read_csv('gapminder.csv', low_memory=False)

# Print some basic statistics
n = str(len(df))
cols = str(len(df.columns))
print('Number of observations: ' + n + ' (rows)')
print('Number of variables: ' + cols + ' (columns)')
print('\n')

# Change the data type for variables of interest (blanks become NaN)
# Response variable
df['incomeperperson'] = pd.to_numeric(df['incomeperperson'], errors='coerce')
# Explanatory variable
df['polityscore'] = pd.to_numeric(df['polityscore'], errors='coerce')

# Print the number of records with data
print('Countries with a GDP Per Capita: ' + str(df['incomeperperson'].count()) +
      ' out of ' + str(len(df)) +
      ' (' + str(len(df) - df['incomeperperson'].count()) + ' missing)')
print('Countries with a Democracy Score: ' + str(df['polityscore'].count()) +
      ' out of ' + str(len(df)) +
      ' (' + str(len(df) - df['polityscore'].count()) + ' missing)')
print('\n')

# Get the rows not missing a value
subset = df[np.isfinite(df['polityscore'])]
# .copy() makes subset an independent frame, so adding a column below
# does not raise a SettingWithCopyWarning
subset = subset[np.isfinite(subset['incomeperperson'])].copy()
print('Number of observations: ' + str(len(subset)) + ' (rows)')
print('\n')

# This function converts the polity score to a binary category flag
def is_full_democracy(score):
    if score == 10:
        return 1
    else:
        return 0

# Now we can use the function to create the new variable
subset['is_full_democracy'] = subset['polityscore'].apply(is_full_democracy)

# Create frequency table
full_democracy_counts = subset.groupby('is_full_democracy').size()
print(full_democracy_counts)
print('\n')

# Create simple regression model
model = smf.ols('incomeperperson ~ is_full_democracy', data=subset).fit()
print(model.summary())
```
This code is also available in my GitHub repository for this class.
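The same cleaning and flagging steps can be written more compactly. The sketch below uses a small made-up data frame with the same column names as GapMinder (the values are hypothetical). `dropna` removes rows where either variable is missing after `pd.to_numeric(..., errors='coerce')`, and a vectorized comparison replaces the `apply` call. This is an alternative formulation, not the code that produced the results below.

```python
import pandas as pd

# Hypothetical toy frame standing in for the two GapMinder columns used above.
df = pd.DataFrame({
    'incomeperperson': ['1000.5', '', '25000', '8000', '40000'],
    'polityscore': ['10', '5', '', '-7', '10'],
})

# Coerce text to numbers; values that cannot be parsed become NaN.
for col in ['incomeperperson', 'polityscore']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Keep only rows that have both values; .copy() avoids SettingWithCopyWarning
# when the new column is added.
subset = df.dropna(subset=['incomeperperson', 'polityscore']).copy()

# Vectorized equivalent of is_full_democracy(): 1 when the score is exactly 10.
subset['is_full_democracy'] = (subset['polityscore'] == 10).astype(int)
print(subset)
```

The comparison `subset['polityscore'] == 10` produces a boolean Series, and `astype(int)` turns it into the 0/1 flag that the regression formula expects.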
Regression Model Results
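The results of the linear regression model indicate that full democracy (Beta = 15,990, p < 0.001) is significantly and positively associated with economic well-being. Because the explanatory variable is binary, this coefficient is the estimated difference in mean GDP per capita between full democracies and all other countries. The adjusted R² for this model is 0.439.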
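With a single 0/1 explanatory variable, OLS has a direct reading: the intercept is the mean response of the 0 group and the slope is the gap between the two group means. Adjusted R² then discounts R² for the number of predictors k, as 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below checks both on a tiny made-up data set (the numbers are hypothetical, not GapMinder values). It uses NumPy's least-squares solver so it runs without the CSV; on the real model, statsmodels exposes the same quantities as `model.params`, `model.rsquared` and `model.rsquared_adj`.

```python
import numpy as np

# Made-up illustration data (NOT the GapMinder values):
# flag = 1 for a hypothetical "full democracy", 0 otherwise; y plays GDP per capita.
flag = np.array([0, 0, 0, 1, 1, 1])
y = np.array([2.0, 4.0, 3.0, 9.0, 11.0, 10.0])

# OLS with an intercept column and the binary regressor.
X = np.column_stack([np.ones(flag.size), flag])
(intercept, beta), *_ = np.linalg.lstsq(X, y, rcond=None)

# With one binary regressor, OLS reproduces the group means.
mean_0 = y[flag == 0].mean()
mean_1 = y[flag == 1].mean()
print(intercept, mean_0)        # intercept matches the mean of the 0 group
print(beta, mean_1 - mean_0)    # slope matches the difference in group means

# R^2 and adjusted R^2 with k = 1 predictor (the intercept is not counted in k).
y_hat = X @ np.array([intercept, beta])
n, k = y.size, 1
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 4), round(r2_adj, 4))  # 0.9484 0.9355
```

Adjusted R² is always at or below R², and the gap shrinks as n grows relative to k, so with one predictor and 155 countries the two values in the real model should be close.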