Quandl and Pandas

Sat 27 April 2013

The folks at Quandl have built a nice little python interface for accessing their data and getting it into Pandas. My goal was to grab some data for all 50 states for a given date and get it into a DataFrame with the states as the index and different types of data as the columns. Right now the data is split up on a state-by-state basis (rather than all 50 states as a single table of data), so I wrote a small snippet to grab each state for a given date, then concat them all into a complete Pandas DataFrame:

import pandas as pd
import Quandl
import vincent

abbvs = ['AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID',
         'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN',
         'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND',
         'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT',
         'VA', 'WA', 'WV', 'WI', 'WY']

#Set up dict with Dataset names, Quandl Codes, and empty lists for our data
data = {'Unemployment': {'Code': 'UR', 'data': []},
        'House Price Index': {'Code': 'STHPI', 'data': []}}
date = '2012-10-01'

#Will populate the list with 1-line DataFrames, then concat the pieces in the end
df_list = []
for key, value in data.iteritems():
    for state in abbvs:
        #Grab our data with the Quandl API
        quandl_code = 'FRED/{0}{1}'.format(state, value['Code'])
        df = Quandl.get(quandl_code,
                        authtoken='*******',
                        startdate=date,
                        enddate=date)

        #The data defaults to date- since I'm only grabbing one month, I would
        #rather have the data category as the (after I transpose) column name
        df = df.rename(columns={'Value': state},
                       index={pd.to_datetime(date): key})
        value['data'].append(df.T)

    #Take the data for each state, stack them into one dataframe column
    stacked = pd.concat(data[key]['data'])
    data[key]['data'] = stacked
    df_list.append(stacked)

#Turn two single-column 'Unemployment' and 'House Price Index' df's into one
all_data = pd.concat(df_list, axis=1)

>>>all_data
        House Price Index  Unemployment
    AK             289.32           6.8
    AZ             258.33           8.1
    AR             245.78           7.2
    CA             405.78          10.1
    CO             345.98           7.7
    CT             392.27           8.4
    DE             407.31           7.1
    FL             280.15           8.2
    GA             258.18           8.8
    HI             456.91           5.4
    ID             266.10           6.6
    IL             303.92           8.8
    IN             244.66           8.4
    IA             251.59           5.1
    KS             234.53           5.6
    KY                NaN           8.1
    LA             245.82           5.9
    ME             452.29           7.2
    MD             407.38           6.8
    MA                NaN           6.7
    MI                NaN           9.1
    MN                NaN           5.6
    MS             239.13           9.1
    MO                NaN           6.7
    MT                NaN           5.8
    NE                NaN           3.9
    NV             191.11          10.3
    NH             392.91           5.7
    NJ             458.39           9.6
    NM                NaN           6.8
    NY                NaN           8.4
    NC                NaN           9.4
    ND                NaN           3.2
    OH             239.16           6.9
    OK                NaN           5.2
    OR             357.07           8.5
    PA             367.16           8.0
    RI                NaN          10.1
    SC             305.20           8.8
    SD             301.00           4.4
    TN                NaN           7.8
    TX                NaN           6.4
    UT             316.56           5.5
    VT             442.03           5.0
    VA             397.53           5.8
    WA             392.27           7.8
    WV             215.77           7.5
    WI             296.69           6.8
    WY             277.86           5.1

Lots of NaN in that housing data. Let's plot the unemployment data with Vincent really quickly:

vis = vincent.Bar(width=1200, height=400)
vis.tabular_data(all_data, columns=['Unemployment'])
vis.scales.append({'name': 'color',
                   'domain': {'data': 'table', 'field': 'data.y'},
                   'range': ['#0b0d11', '#6a7a9d']})
vis + ({'scale': 'color', 'field': 'data.y'}, 'marks', 0, 'properties',
       'update', 'fill')
vis.axis_label(x_label='US States', y_label='Unemployment %')
vis.to_json(path)

Unemployment

Sorry about the small axis labels- I think I need to find a wider blog template (or stop trying to make ordinal plots with 50 items...)

This entry was tagged as python pandas vincent data quandl