I have recently been exploring data on the public libraries of New York State for a side project (more on that in a later post, hopefully). I have also started a Data Visualization course on Coursera and have decided to feature some visualizations of this data set.
About the Data
The data used in this analysis comes from the Annual Report for Public and Association Libraries, produced for the New York State Education Department (NYSED). You can access the data at http://collectconnect.baker-taylor.com/ using “new york” as the username and “pals” as the password. Load the saved list named “All Libraries as of 15 March 2016” and select the “Total Circulation” data element.
For this visualization I used all data from 2000 to 2014 (the latest data available). I aggregated the library-level circulation data to compute total circulation for New York State public libraries in each year. I used colorblind-safe colors from the ColorBrewer palette and rescaled the y-axis to millions. I used R to generate the following visualization:
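The two hex codes in the plotting code, `#9ecae1` and `#3182bd`, match the 3-class sequential "Blues" palette from ColorBrewer. Assuming the RColorBrewer package is installed (the original script does not load it), a minimal sketch for looking up those shades and inspecting the palette's metadata might look like this:

```r
library(RColorBrewer)

# brewer.pal.info lists each ColorBrewer palette with its maximum
# number of classes, its category, and (in recent versions) a
# colorblind-friendliness flag.
print(brewer.pal.info["Blues", ])

# The 3-class sequential "Blues" palette: light, medium, dark.
blues <- brewer.pal(3, "Blues")
print(blues)

# The medium shade serves as the bar fill and the dark shade as the
# bar outline in the circulation plot.
bar_fill    <- tolower(blues[2])
bar_outline <- tolower(blues[3])
```

These two values can be passed straight to `geom_bar()` as `fill` and `colour`, which is what the plotting code does with the hard-coded hex strings.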
What It Tells Us
Book circulation generally increased until 2010, when the decade-long upward trend reversed. The drop from 2013 to 2014 is especially steep.
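To put a number on how steep a given drop is, one option is to compute the percentage change from the previous year. The sketch below uses made-up totals purely for illustration (they are not the NYSED figures); the same `mutate()` step could be applied to the `book_circulation` data frame built in the script at the end of this post.

```r
library(dplyr)

# Hypothetical annual totals (in millions), for illustration only.
circ <- data.frame(
  Year = 2010:2014,
  Circulation = c(100, 98, 97, 96, 84)
)

circ_change <- circ %>%
  arrange(Year) %>%
  # Percentage change relative to the previous year; the first year
  # has no predecessor, so its value is NA.
  mutate(pct_change = 100 * (Circulation - dplyr::lag(Circulation)) /
           dplyr::lag(Circulation))

print(circ_change)
```

Calling `dplyr::lag()` explicitly avoids confusion with `stats::lag()`, which does something different for plain vectors.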
This raises the question: why is circulation changing? Is it because of a change in the population? Is it due to a change in the number of libraries reporting (which might explain the 2013-2014 drop)? Is it due to the rise of digital media as a substitute for books? Is it due to a lack of public support for, or investment in, libraries? I plan to look at that last question in a future post.
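One way to probe the reporting question is to count how many libraries reported a value in each year and to look at circulation per reporting library alongside the total. The data frame below is hypothetical, with one row per library per year and `NA` where a library did not report; in the real data, the closest equivalent is the long-format table produced by the `gather()` step, before the yearly sum.

```r
library(dplyr)

# Hypothetical long-format data (illustrative values only):
# one row per library per year, NA where a library did not report.
long <- data.frame(
  Library = rep(c("A", "B", "C"), times = 2),
  Year = rep(c(2013, 2014), each = 3),
  measurement = c(10, 20, 30, 11, NA, 29)
)

reporting <- long %>%
  filter(!is.na(measurement)) %>%
  group_by(Year) %>%
  summarise(
    n_libraries = n(),                        # libraries that reported
    Circulation = sum(measurement),           # statewide total
    per_library = Circulation / n_libraries   # average per reporter
  )

print(reporting)
```

In this toy example the total falls from 60 to 40 while circulation per reporting library stays at 20, which is the pattern one would expect if a drop came from fewer libraries reporting rather than from lower use.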
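library(dplyr)
library(tidyr)
library(ggplot2)
library(ggthemes)

# Read the exported NYSED data; 'N/A' marks missing values
book_circulation <- read.csv('https://goo.gl/fyybwi', na.strings = 'N/A',
                             stringsAsFactors = FALSE) %>%
  # Reshape the wide year columns (X1991 to X2014) into long format
  gather(Year, measurement, X1991:X2014) %>%
  # Strip the "X" prefix that read.csv adds to numeric column names,
  # and remove thousands separators before converting to numbers
  mutate(Year = as.numeric(substr(Year, 2, 5)),
         measurement = as.numeric(gsub(',', '', measurement))) %>%
  # Keep 2000-2014 and drop libraries with no reported value
  filter(Year > 1999, !is.na(measurement)) %>%
  # Sum library-level circulation into a statewide total per year
  group_by(Year) %>%
  summarise(Circulation = sum(measurement)) %>%
  # Express totals in millions for the y-axis
  mutate(Circulation = Circulation / 1e6)

ggplot(book_circulation, aes(Year, Circulation)) +
  geom_bar(stat = 'identity', fill = "#9ecae1", colour = "#3182bd") +
  ylab('Book Circulation (in millions)') +
  ggtitle('Book Circulation in NYS Public Libraries, 2000-2014') +
  theme_hc()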