I have just released an update to my blsAPI R package. Some users noticed the return.data.frame parameter was returning some strange results. I resolved the bugs and made the output cleaner. I appreciate hearing back from users on ways to improve the package. For more information on the blsAPI package, please see my GitHub repository.
I previously posted that I had developed an R package to facilitate pulling data from the BLS API. David Hiles asked that I add support for QCEW (Quarterly Census of Employment and Wages) data, which is not available through the standard API. It was a great idea, so I did it. The update is now available on CRAN and in the GitHub repository.
If you install or update the package, you will have a blsQCEW() function. Its first argument is the type of data you are looking for; valid options are 'Area', 'Industry', and 'Size'. The remaining required parameters depend on the type of request you are making.
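To make the dispatch on request type concrete, here is a minimal sketch of how a function like blsQCEW() might turn the type and its parameters into a download URL. It is not the package's actual code: qcewURL() is a hypothetical helper, and the URL pattern https://data.bls.gov/cew/data/api/YEAR/QTR/TYPE/CODE.csv is my assumption based on the BLS QCEW open data documentation.

```r
# Hypothetical helper (not part of blsAPI): build an assumed QCEW open data URL
qcewURL <- function(type = c("Area", "Industry", "Size"), year,
                    quarter = NULL, area = NULL, industry = NULL, size = NULL) {
  type <- match.arg(type)
  # Pick the code parameter that matches the request type
  code <- switch(type, Area = area, Industry = industry, Size = size)
  if (is.null(code)) {
    stop(sprintf("a %s request needs the '%s' parameter", type, tolower(type)))
  }
  if (type == "Size") {
    quarter <- "1"  # size data is published for the first quarter only
  } else if (is.null(quarter)) {
    stop(sprintf("a %s request needs the 'quarter' parameter", type))
  }
  # Industry codes are expected in underscore form here, e.g. "31_33"
  sprintf("https://data.bls.gov/cew/data/api/%s/%s/%s/%s.csv",
          year, quarter, tolower(type), code)
}

qcewURL("Area", year = "2013", quarter = "1", area = "26000")
```

Under that assumption, the resulting string could be passed straight to read.csv(), which accepts URLs, to get a data frame.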
Area Data Request
Area requests require year, quarter, and area parameters. The area codes are defined by the BLS and are available here: http://www.bls.gov/cew/doc/titles/area/area_titles.htm. Here’s a code example for an area request:
library(blsAPI)
# Request QCEW data for the first quarter of 2013 for the state of Michigan
MichiganData <- blsQCEW('Area', year='2013', quarter='1', area='26000')
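Statewide area codes in that list are the two-digit state FIPS code followed by "000" (Michigan is 26, hence '26000'), which is also why the code is passed as a string: a number would drop the leading zero of a code like Alabama's '01000'. Here is a small sketch; stateAreaCode() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: statewide QCEW area code from a numeric state FIPS code
stateAreaCode <- function(fips) {
  # Zero-pad to two digits, then append "000" for the statewide total
  sprintf("%02d000", as.integer(fips))
}

stateAreaCode(26)  # "26000" (Michigan)
stateAreaCode(1)   # "01000" (Alabama)
```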
Industry Data Request
Industry requests require year, quarter, and industry parameters. Some industry (NAICS) codes contain hyphens, but the open data access uses underscores instead, so 31-33 becomes 31_33. For all industry codes and titles, see: http://www.bls.gov/cew/doc/titles/industry/industry_titles.htm. Here’s a code example for a construction industry request:
# Request Construction data for the first quarter of 2013
Construction <- blsQCEW('Industry', year='2013', quarter='1', industry='1012')
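If you copy codes straight from the industry titles list, a one-line conversion handles the hyphen rule. This is a sketch with a hypothetical helper name, naicsToQCEW(); codes without hyphens pass through unchanged.

```r
# Hypothetical helper: convert a NAICS code from the titles list into the
# underscore form used by the QCEW open data access
naicsToQCEW <- function(code) {
  gsub("-", "_", code, fixed = TRUE)
}

naicsToQCEW("31-33")  # "31_33"
naicsToQCEW("1012")   # "1012" (unchanged)
# e.g. blsQCEW('Industry', year='2013', quarter='1', industry=naicsToQCEW('31-33'))
```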
Size Data Request
Data by size is only available for the first quarter of each year. To make this type of request, you only need to provide the size and year parameters. The size codes are available here: http://www.bls.gov/cew/doc/titles/size/size_titles.htm. Here’s a code example:
# Request data for the first quarter of 2013 for establishments with 100 to 249 employees
SizeData <- blsQCEW('Size', year='2013', size='6')
I also want to mention that the blsAPI() function has been changed to return data either as a JSON string or as a data frame. I hope others will find these improvements helpful.
Now that I have finally gotten blsAPI onto CRAN, I want to capture the lessons I learned from the experience. Honestly, some of this is still a black box to me, but I can promise that with persistence things will work out.
Let’s begin with some back story. My package came from a script that I wrote. I had built and installed the package locally and could use it without any problem. I had posted it on GitHub, and when I got a request to publish it on CRAN, I thought, why not? I created a tar.gz file and submitted it, but it was rejected. Here’s what I had to do to get things resolved:
Roxygen2 is Your Friend
Early on I edited the man page by hand, even though the top of the file said “% Generated by roxygen2 (4.1.0): do not edit by hand”. After reading Hadley Wickham’s guide, I learned some neat tricks. Don’t fight roxygen2. Use it.
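Here is what “using it” looks like: the documentation lives in #' comments directly above the function, and running roxygen2::roxygenise() (or devtools::document()) regenerates both the man page and the NAMESPACE entries. The function below is a hypothetical example, not part of blsAPI; it builds the kind of payload list used in the blsAPI() examples.

```r
#' Build a payload for blsAPI()
#'
#' @param seriesid Character vector of one or more BLS series IDs.
#' @param startyear Optional first year, e.g. "2010".
#' @param endyear Optional last year, e.g. "2012".
#' @return A named list that can be passed to blsAPI().
#' @export
blsSeriesPayload <- function(seriesid, startyear = NULL, endyear = NULL) {
  payload <- list(seriesid = seriesid)
  # Only include the year bounds when they are supplied
  if (!is.null(startyear)) payload$startyear <- as.character(startyear)
  if (!is.null(endyear)) payload$endyear <- as.character(endyear)
  payload
}
```

Because the Rd file is generated from these comments, any fix goes in the R file, never in man/.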
No Hidden Files
I work primarily in a Windows environment, so my first submission included hidden git files and directories. Delete these before submitting. I had to copy the directory over to a Linux machine, clean it up, and then build the tar.gz file for submission.
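Hidden files are easy to miss in Windows Explorer, so it can help to list them from R before building. A minimal sketch, assuming “hidden” means any path component starting with a dot (as with .git and .gitignore); findHiddenFiles() is a hypothetical helper.

```r
# Hypothetical helper: list dot-files and dot-directories inside a package folder
findHiddenFiles <- function(pkgDir) {
  files <- list.files(pkgDir, all.files = TRUE, recursive = TRUE,
                      include.dirs = TRUE, no.. = TRUE)
  # Keep paths where any component (start of path or after "/") begins with "."
  files[grepl("(^|/)\\.", files)]
}

# e.g. findHiddenFiles("path/to/blsAPI")
```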
Check it Using RStudio Over and Over
RStudio is a wonderful environment! I learned how to check my package thanks to this article, which introduced me to the concept of a NOTE. As I understand it, a NOTE is like a very mild WARNING. I worked hard to resolve all NOTEs because, according to Karl Broman,
… even a “Note” will likely disqualify you.
No Library or Require Commands – Use Import
The script uses the rjson and RCurl packages. I loaded them in my function using a *cringe* require() call. I quickly changed it to a library() call (read Yihui Xie’s excellent explanation of why you should do the same!), but when I checked it in RStudio, the feedback was to remove the lines. Then, when I commented out the lines, the feedback was that the functions could not be found! After a lot of frustration I found the secret. In my R file you will see the following:
#' @import rjson RCurl
This roxygen2 line tells roxygen2 to import these two packages in the generated NAMESPACE file, so their functions are available inside my package. I also added the following to the mysterious DESCRIPTION file:
Imports: rjson, RCurl
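Together, these two changes ensure the packages are installed along with blsAPI and that their functions are available to it without any library() or require() call.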
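As an illustration, here is a hypothetical package function (not blsAPI’s actual code) that calls RCurl’s getURL() and rjson’s fromJSON() with no library() call. The @import tag becomes import(rjson) and import(RCurl) in NAMESPACE once you regenerate it; the tags do nothing in a standalone script. The v1 single-series endpoint URL is my assumption based on the BLS API documentation.

```r
#' Fetch one BLS series as parsed JSON (hypothetical example)
#'
#' @param seriesid A single BLS series ID.
#' @return The parsed JSON response as a nested list.
#' @import rjson RCurl
#' @export
fetchSeries <- function(seriesid) {
  # Assumed v1 endpoint for a single-series GET request
  url <- paste0("https://api.bls.gov/publicAPI/v1/timeseries/data/", seriesid)
  # Inside a package with @import and Imports: rjson, RCurl, getURL() and
  # fromJSON() resolve through the NAMESPACE, not through library()
  fromJSON(getURL(url))
}
```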
Don’t Forget to Export
While I’m thinking of it, you have to remember to export your functions. I did it with the following roxygen2 line in the R file:
#' @export blsAPI
This allows people to use your functions. I was getting frustrated because I could check, build, and install my package but couldn’t use it: R kept saying it couldn’t find the function. I added this line, the problem was solved, and I felt like an idiot.
Get the Title Case Right!
This was particularly frustrating for me. Buried in Writing R Extensions, in the middle of a sentence, is the following clause:
[The title] should use title case (that is, use capitals for the principal words)
There seems to be little agreement on what counts as a principal word. Some posts I read explained that “of” should be “Of” to meet the title case requirement. Yes, it’s bad English, but that’s what you have to do.
Never Give Up, Never Surrender
I had to submit my package 8 times before it was accepted to CRAN. Some of the changes were substantive and some were superficial. Karl Broman gave some sage advice when he said:
Finally, put on your armor. One of the people that handles CRAN submissions can be unnecessarily offensive and pedantic. Try to put his little barbs out of your mind and focus on his actual advice on how to revise your package to make it suitable for CRAN.
With my economics training, I viewed some of the requests as a “barrier to entry” set up by the CRAN gatekeepers. I could easily have become offended, but I kept working and submitting and was, in the end, successful. And I know you can be too.
To use the blsAPI() function, you specify the series ID(s) and, optionally, the start and end years. The following are some examples of how you could use this package (these examples are taken from http://www.bls.gov/developers/api_signature.htm):
Single Series
library(blsAPI)
library(rjson)  # provides fromJSON()
response <- blsAPI('LAUCN040010000000005')
json <- fromJSON(response)
Multiple Series
payload <- list('seriesid'=c('LAUCN040010000000005','LAUCN040010000000006'))
response <- blsAPI(payload)
json <- fromJSON(response)
One or More Series, Specifying Years
payload <- list('seriesid'=c('LAUCN040010000000005','LAUCN040010000000006'),
                'startyear'='2010', 'endyear'='2012')
response <- blsAPI(payload)
json <- fromJSON(response)
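The parsed json object is a nested list. Assuming the BLS response layout where json$Results$series holds one entry per series, each with a seriesID and a data list of year/period/value records, a sketch like the hypothetical seriesToDataFrame() below flattens it into one data frame. blsAPI’s own data frame output is likely the simpler route; this just shows the structure.

```r
# Hypothetical helper: flatten a parsed BLS JSON response into a data frame
seriesToDataFrame <- function(json) {
  rows <- lapply(json$Results$series, function(s) {
    # One row per observation, tagged with its series ID
    do.call(rbind, lapply(s$data, function(d) {
      data.frame(seriesID = s$seriesID,
                 year = as.integer(d$year),
                 period = d$period,
                 value = as.numeric(d$value),
                 stringsAsFactors = FALSE)
    }))
  })
  do.call(rbind, rows)
}

# e.g. df <- seriesToDataFrame(json)
```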