The Relationship Between a Country’s Urbanization Rate and Economic Well-Being

As part of Week 3 of the Coursera Data Analysis and Tools course, I examined the relationship between a country’s urbanization rate and the economic well-being of its citizens. I have saved my Python script and Jupyter notebook to the GitHub repository for the course.

Source Code

For this analysis the following Python code was used:

# Import libraries needed
import pandas as pd
import scipy.stats  # import the stats submodule explicitly so scipy.stats.pearsonr is available
import seaborn as sns
import matplotlib.pyplot as plt

# Read in the GapMinder Data
df = pd.read_csv('gapminder.csv', low_memory=False)

# Change the data type for variables of interest
df['urbanrate'] = pd.to_numeric(df['urbanrate'], errors='coerce')
df['incomeperperson'] = pd.to_numeric(df['incomeperperson'], errors='coerce')

# Get the subset of complete data cases
subset = df[['urbanrate','incomeperperson']].dropna()

# Pearson's Correlation Coefficient
print('Association Between Urbanization Rate and Economic Well-Being')
r = scipy.stats.pearsonr(subset['urbanrate'], subset['incomeperperson'])
print(r)  # (correlation coefficient, p-value)
r_squared = r[0] * r[0]
print('R Squared = '+str(r_squared))

# Visualize the data
plt.figure(figsize=(14, 7))
sns.regplot(x="urbanrate", y="incomeperperson", data=subset)
plt.ylabel('Economic Well-Being (GDP Per Person)')
plt.xlabel('Urbanization Rate')
plt.show()  # needed to display the figure when run as a script

The above Python code produced a moderate, positive Pearson’s correlation coefficient of 0.49. This relationship is statistically significant (p-value of 0.000000000000082). The r-squared is about 0.24 (0.240192366296), so urbanization rate accounts for roughly a quarter of the variability in income per person.
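To make these three numbers less of a black box, the sketch below recomputes r, its p-value, and r-squared from the textbook formulas and compares them with scipy.stats.pearsonr. It uses hypothetical synthetic data, not the GapMinder values, and the function name pearson_by_hand is made up for illustration. The p-value comes from the standard t-test for a correlation, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom, which is mathematically equivalent to the test pearsonr reports. The last step shows that r² equals the R² of the simple least-squares line, the kind of line sns.regplot draws by default.

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic data standing in for urbanrate / incomeperperson.
# These are NOT the GapMinder values; they only illustrate the arithmetic.
rng = np.random.default_rng(0)
urbanrate = rng.uniform(10, 100, size=150)
income = 150 * urbanrate + rng.normal(0, 6000, size=150)


def pearson_by_hand(x, y):
    """Return (r, p) computed from the textbook formulas (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = x - x.mean()
    dy = y - y.mean()
    # r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
    r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    # Under H0 (no correlation), t follows a t distribution with n - 2 df
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided p-value
    return r, p


r, p = pearson_by_hand(urbanrate, income)
r_scipy, p_scipy = stats.pearsonr(urbanrate, income)

# R^2 of the ordinary least-squares line y = slope * x + intercept
slope, intercept = np.polyfit(urbanrate, income, 1)
predicted = slope * urbanrate + intercept
ss_res = np.sum((income - predicted) ** 2)
ss_tot = np.sum((income - income.mean()) ** 2)
r2_fit = 1 - ss_res / ss_tot

print(f"r   = {r:.4f}   (scipy: {r_scipy:.4f})")
print(f"p   = {p:.3g}   (scipy: {p_scipy:.3g})")
print(f"r^2 = {r ** 2:.4f}   (least-squares R^2: {r2_fit:.4f})")
```

Squaring r, as the original script does with r[0] * r[0], gives the same value as fitting the regression line and computing 1 − SS_res/SS_tot; the two routes agree for any simple linear regression with an intercept.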

