The BLS API: Synchronous and Asynchronous Test Results

I am a big fan of using APIs to pull data.  I am the creator of the R package blsAPI found on CRAN and GitHub.  As I mentioned in my previous post, I have started playing around with Python’s asyncio library.  I decided to try it out on the BLS’s API.

I set up a little test in which I requested unemployment data (i.e. the unemployment rate, the number unemployed and the number employed) for all 50 states and D.C., both synchronously and asynchronously. The test was a bit of a hello world, as it didn't do much more than request the data; I just wanted to get a sense of the performance gains. My first run through all 153 requests (51 areas × 3 measures) took a little more than 2 seconds asynchronously and roughly 20 seconds synchronously. That is a tenfold difference! That’s huge!

But before getting too excited, I decided to run a few more tests to see whether these results were typical. Not wanting to adversely affect the BLS’s servers, I limited the number of runs to 100 each. The test code is at the end of this post for those interested in trying it out for themselves.

Here’s a screenshot of the test’s output to the console:

As you can see, the results are pretty close to what I initially observed. On average, the synchronous code took roughly 21.8 seconds to complete, while the asynchronous code took about 2.5 seconds, a speedup of roughly 8.7 times. Here’s a visualization of the full data set:
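If you want to reproduce averages like these from your own runs, a small summary helper is enough. This is a minimal sketch that assumes `results` has the shape built by the test loop (run number mapped to `{'Synchronous': seconds, 'Asynchronous': seconds}`); the name `summarize_timings` is hypothetical. With the averages above, the speedup is 21.8 / 2.5 ≈ 8.7.

```python
import pandas as pd


def summarize_timings(results):
    """Summarize per-run timings and compute the mean speedup.

    Hypothetical helper. `results` maps run number ->
    {'Synchronous': seconds, 'Asynchronous': seconds}.
    Returns (summary DataFrame, mean synchronous / mean asynchronous).
    """
    df = pd.DataFrame.from_dict(results, orient='index')
    # One row per statistic, one column per method
    summary = df.agg(['mean', 'median', 'min', 'max'])
    speedup = summary.loc['mean', 'Synchronous'] / summary.loc['mean', 'Asynchronous']
    return summary, speedup
```

Calling `summary, speedup = summarize_timings(results)` after the test loop gives the table and the ratio in one step.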

Here is the test code:

import json
import time
import asyncio

import requests
import pandas as pd

test_runs = 100

# BLS API Parameters
BLS_API_key = 'TYPE YOUR OWN KEY HERE'
headers = {'Content-type': 'application/json'}

BLS_LAUS_state_area_codes = ['ST0100000000000', 'ST0200000000000', 'ST0400000000000', 'ST0500000000000', 'ST0600000000000', 'ST0800000000000', 'ST0900000000000', 'ST1000000000000', 'ST1100000000000', 'ST1200000000000', 'ST1300000000000', 'ST1500000000000', 'ST1600000000000', 'ST1700000000000', 'ST1800000000000', 'ST1900000000000', 'ST2000000000000', 'ST2100000000000', 'ST2200000000000', 'ST2300000000000', 'ST2400000000000', 'ST2500000000000', 'ST2600000000000', 'ST2700000000000', 'ST2800000000000', 'ST2900000000000', 'ST3000000000000', 'ST3100000000000', 'ST3200000000000', 'ST3300000000000', 'ST3400000000000', 'ST3500000000000', 'ST3600000000000', 'ST3700000000000', 'ST3800000000000', 'ST3900000000000', 'ST4000000000000', 'ST4100000000000', 'ST4200000000000', 'ST4400000000000', 'ST4500000000000', 'ST4600000000000', 'ST4700000000000', 'ST4800000000000', 'ST4900000000000', 'ST5000000000000', 'ST5100000000000', 'ST5300000000000', 'ST5400000000000', 'ST5500000000000', 'ST5600000000000']
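# LAUS measure codes: 03 = unemployment rate, 04 = unemployment, 05 = employment
measures = ['03', '04', '05']

# Translate the area codes to series ids, e.g. 'LASST010000000000003'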
seriesids = list()
for BLS_LAUS_state_area_code in BLS_LAUS_state_area_codes:
    for measure in measures:
        seriesids.append('LAS' + BLS_LAUS_state_area_code + measure)

number_of_requests = len(seriesids)  # 51 areas x 3 measures = 153


def fetch(seriesid):
    """Request one series from the BLS API; return (seriesid, parsed JSON)."""
    data = json.dumps({'seriesid': [seriesid], 'registrationkey': BLS_API_key})
    response = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/',
                             data=data, headers=headers, timeout=30)
    return seriesid, response.json()


async def fetch_all(seriesids):
    """Run the blocking fetch() calls concurrently in asyncio's default thread pool."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(None, fetch, seriesid) for seriesid in seriesids]
    return dict(await asyncio.gather(*futures))


results = dict()

for test_run in range(1, test_runs + 1):
    # Test A - synchronous requests, one after another
    start_time = time.perf_counter()
    synchronous_data = dict(fetch(seriesid) for seriesid in seriesids)
    synchronous_time = time.perf_counter() - start_time
    print("Test %d-A %d Synchronous Requests: %s seconds" % (test_run, number_of_requests, synchronous_time))

    # Test B - asynchronous requests, run concurrently in a thread pool
    start_time = time.perf_counter()
    asynchronous_data = asyncio.run(fetch_all(seriesids))
    asynchronous_time = time.perf_counter() - start_time
    print("Test %d-B %d Asynchronous Requests: %s seconds" % (test_run, number_of_requests, asynchronous_time))

    results[test_run] = {'Synchronous': synchronous_time, 'Asynchronous': asynchronous_time}

# Save the timings to Excel (needs openpyxl or XlsxWriter installed)
df = pd.DataFrame.from_dict(results, orient='index')
df.to_excel('BLS Test.xlsx', sheet_name='Results')

blsAPI Updated to Deliver QCEW Data

I previously posted that I had developed an R package to facilitate pulling data from the BLS API. David Hiles asked me to add support for pulling QCEW data, which is not available through the standard API. It was a great idea, so I did it. The update is now available on CRAN and in the GitHub repository.

If you install or update the package, you will have a new blsQCEW() function. Its first argument is the type of data you are looking for; valid options are Area, Industry and Size. The other parameters you need depend on the type of request you are making.

Area Data Request

Area requests require year, quarter and area parameters. The area codes are defined by the BLS and available here: http://www.bls.gov/cew/doc/titles/area/area_titles.htm. Here’s a code example for an area request:

# Request QCEW data for the first quarter of 2013 for the state of Michigan
MichiganData <- blsQCEW('Area', year='2013', quarter='1', area='26000')

Industry Data Request

Industry requests require year, quarter and industry parameters. Some industry (NAICS) codes contain hyphens, but the open data access uses underscores instead, so 31-33 becomes 31_33. For all industry codes and titles see: http://www.bls.gov/cew/doc/titles/industry/industry_titles.htm. Here’s a code example for a construction industry request:

# Request Construction data for the first quarter of 2013
Construction <- blsQCEW('Industry', year='2013', quarter='1', industry='1012')

Size Data Request

Data by size is only available for the first quarter of each year. To make this type of request, you only need to provide the size and the year parameters. The size codes are available here: http://www.bls.gov/cew/doc/titles/size/size_titles.htm.  Here’s a code example:

# Request data for the first quarter of 2013 for establishments with 100 to 249 employees
SizeData <- blsQCEW('Size', year='2013', size='6')
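For Python users, the three request types above can be sketched as URLs for the BLS QCEW open data CSV files. This makes no claim about how blsQCEW() works internally. The base URL `https://data.bls.gov/cew/data/api` and the `year/quarter/type/code.csv` layout are assumptions based on BLS’s QCEW Open Data Access pages and should be checked there; `qcew_slice_url` is a hypothetical name.

```python
# Assumed root of the QCEW open data CSV slices; verify against BLS documentation
BASE_URL = 'https://data.bls.gov/cew/data/api'


def qcew_slice_url(kind, year, code, quarter='1'):
    """Build an (assumed) QCEW open data CSV URL for an Area, Industry or Size slice.

    Hypothetical helper mirroring the three blsQCEW() request types.
    """
    kind = kind.lower()
    if kind not in ('area', 'industry', 'size'):
        raise ValueError("kind must be 'Area', 'Industry' or 'Size'")
    if kind == 'industry':
        # NAICS ranges such as 31-33 use underscores in open data URLs: 31_33
        code = code.replace('-', '_')
    if kind == 'size':
        # Size data exists only for the first quarter of each year
        quarter = '1'
    return f'{BASE_URL}/{year}/{quarter}/{kind}/{code}.csv'
```

If the layout holds, `pandas.read_csv(qcew_slice_url('Area', '2013', '26000'))` would load the Michigan file for the first quarter of 2013, matching the R example above.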

I also want to mention that the blsAPI() function has been changed to return data either as a JSON string or as a data frame. I hope others will find these improvements helpful.
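The same JSON-versus-data-frame choice comes up in Python, for example with the `response.json()` results collected in the asyncio test. Below is a minimal sketch that flattens a BLS API v2 JSON response into a pandas DataFrame. It assumes the usual v2 layout (`status`, then `Results` → `series` → `data`, with `year`, `period` and `value` fields); `bls_json_to_frame` is a hypothetical name and not part of the blsAPI R package.

```python
import pandas as pd


def bls_json_to_frame(payload):
    """Flatten a BLS API v2 JSON response into one row per observation.

    Hypothetical helper; assumes payload['Results']['series'][i]['data'][j]
    entries carry 'year', 'period' and 'value' keys.
    """
    if payload.get('status') != 'REQUEST_SUCCEEDED':
        raise RuntimeError('BLS request failed: %s' % payload.get('message'))
    rows = []
    for series in payload['Results']['series']:
        for obs in series['data']:
            rows.append({
                'seriesID': series['seriesID'],
                'year': int(obs['year']),
                'period': obs['period'],
                'value': obs['value'],
            })
    df = pd.DataFrame(rows, columns=['seriesID', 'year', 'period', 'value'])
    # Values arrive as strings; non-numeric placeholders become NaN
    df['value'] = pd.to_numeric(df['value'], errors='coerce')
    return df
```

Keeping the raw JSON around and converting only when needed mirrors the option blsAPI() now offers.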


BLS Featuring My R API Wrapper

I was in the process of cleaning up my package for submission to CRAN when I learned that the BLS had released v2 of its API service. This version requires a registration key but allows more requests, plus annual average calculations, which is cool.
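As a rough illustration of those v2 features, here is a Python sketch of a request body. `seriesid`, `startyear`, `endyear`, `annualaverage` and `registrationkey` are BLS API v2 request fields, and annual averages come back as an extra period coded `M13`. The 50-series-per-request limit is an assumption to check against BLS’s current documentation, and `build_v2_payload` is a hypothetical name.

```python
import json

MAX_SERIES_PER_REQUEST = 50  # assumed v2 limit; check BLS's current documentation


def build_v2_payload(seriesids, startyear, endyear, key, annual_average=True):
    """Return a JSON body for https://api.bls.gov/publicAPI/v2/timeseries/data/.

    Hypothetical helper; raises ValueError if too many series are requested.
    """
    if len(seriesids) > MAX_SERIES_PER_REQUEST:
        raise ValueError('too many series for one request; split them into batches')
    return json.dumps({
        'seriesid': list(seriesids),
        'startyear': str(startyear),
        'endyear': str(endyear),
        'annualaverage': annual_average,  # adds 'M13' rows holding annual averages
        'registrationkey': key,           # v2 requires a registration key
    })
```

The body can then be sent with `requests.post(url, data=payload, headers={'Content-type': 'application/json'})`.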

I was shocked and gratified to see that under the Sample Code: R page they were featuring my work with this acknowledgement:

[Screenshot: acknowledgement on the BLS Sample Code: R page]

My submission to CRAN has not been accepted yet, but I’m still working on it. In the meantime, it is available through GitHub.