The European central bank provides a lot of public data in their Statistical Data Warehouse. In this notebook we will show how to access data and metadata from Python.

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import quandl
from pandasdmx import Request
In [2]:
plt.rcParams['figure.figsize'] = 14, 6
sns.set()

Get resource metadata

ECB provides multiple resources in the Statistical Data Warehouse. For any resource acronym we can find the associated full name.

In [3]:
ecb = Request('ECB')
In [4]:
def get_resource_name(this_resource_id):
    
    exr_flow = ecb.dataflow()
    resource_types = exr_flow.write().dataflow
    this_resource_name = exr_flow.write().dataflow.loc[this_resource_id, 'name']
    
    return this_resource_name
In [5]:
this_resource_id = 'IRS'
get_resource_name(this_resource_id)
Out[5]:
'Interest rate statistics'

In order to translate these acronyms into names we can query the ECB warehouse to get a list of all related metadata dimensions:

In [6]:
def get_resource_metadata_decodes(this_resource_id):
    
    this_resource_decodes = ecb.dataflow(this_resource_id).write().codelist
    
    return this_resource_decodes
In [7]:
decodes_df = get_resource_metadata_decodes(this_resource_id)
decodes_df.head(2)
Out[7]:
dim_or_attr name
FREQ FREQ D Frequency
M D Monthly
In [8]:
decodes_df.shape
Out[8]:
(3662, 2)

Download data

Download data with metadata as pandas:

In [9]:
def get_ecb_long_term_yields(resource_id, ts_key, params=None):
    """
    Download data from ECB
    """
    
    ecb = Request('ECB')
    
    if params is None:
        data_response = ecb.data(resource_id = resource_id,
                                 key={'REF_AREA': ts_key,
                                     'IR_TYPE': 'L'})
    else:
        data_response = ecb.data(resource_id = resource_id,
                                 key={'REF_AREA': ts_key,
                                     'IR_TYPE': 'L'},
                                     params=params
                                )
        
    raw_data = data_response.data
    series_tuple = (s for s in raw_data.series)
    data = data_response.write(series_tuple)
    
    return data
In [10]:
# define list of countries
list_of_areas_no_euro = ['BG', 'CZ', 'DK', 'GB', 'HR', 'HU', 'PL', 'RO', 'SE']

# download data
data_df = get_ecb_long_term_yields('IRS', list_of_areas_no_euro, params={'startPeriod': '2016'})
data_df.head(3)
Out[10]:
FREQ M
REF_AREA BG CZ DK GB HR HU PL RO SE
IR_TYPE L L L L L L L L L
TR_TYPE L40 L40 L40 L40 L40 L40 L40 L40 L40
MATURITY_CAT CI CI CI CI CI CI CI CI CI
BS_COUNT_SECTOR 0000 0000 0000 0000 0000 0000 0000 0000 0000
CURRENCY_TRANS BGN CZK DKK GBP HRK HUF PLN RON SEK
IR_BUS_COV N N N N N N N N N
IR_FV_TYPE Z Z Z Z Z Z Z Z Z
TIME_PERIOD
2016-01 2.44 0.62 0.786 1.7253 3.84 3.36 3.00 3.60 1.02
2016-02 2.67 0.46 0.545 1.4362 3.68 3.36 3.03 3.33 0.77
2016-03 2.66 0.35 0.508 1.4641 3.66 3.13 2.89 3.34 0.82

Interpret metadata

For all data there exist multiple dimensions of metadata, called concepts. We can extract the acronyms of all dimensions from the data columns:

In [11]:
def get_concept_list_from_data(data_df):
    
    return list(data_df.columns.names)
In [12]:
list_of_data_concepts = get_concept_list_from_data(data_df)
list_of_data_concepts
Out[12]:
['FREQ',
 'REF_AREA',
 'IR_TYPE',
 'TR_TYPE',
 'MATURITY_CAT',
 'BS_COUNT_SECTOR',
 'CURRENCY_TRANS',
 'IR_BUS_COV',
 'IR_FV_TYPE']

So we can come up with a function that maps the dimension acronyms to meaningful names:

In [13]:
def get_concept_decodes(decodes_df, concepts):
    
    concepts_labels = decodes_df.loc[(concepts, concepts), ['name']]
    concepts_labels.index = concepts_labels.index.get_level_values(0)
    
    return concepts_labels
In [14]:
concept_decode_mapping = get_concept_decodes(decodes_df, list_of_data_concepts)
concept_decode_mapping
Out[14]:
name
FREQ Frequency
REF_AREA Reference area
IR_TYPE Interest rate type
TR_TYPE Transaction type
MATURITY_CAT Maturity category
BS_COUNT_SECTOR BS counterpart sector
CURRENCY_TRANS Currency of transaction
IR_BUS_COV IR business coverage
IR_FV_TYPE Interest rate type (fix/var)

The same way we can also translate all occurring dimension values into meaningful labels:

In [15]:
all_maps = None

for this_concept in list_of_data_concepts:
    
    these_occurring_vals = data_df.columns.get_level_values(this_concept).unique().values
    
    # get occurring values
    this_map = decodes_df.loc[this_concept, :].loc[these_occurring_vals, 'name'].to_frame().reset_index()
    this_map.rename({'index': 'code'}, axis=1, inplace=True)
    this_map['concept'] = this_concept
    
    if all_maps is None:
        all_maps = this_map
    else:
        all_maps = pd.concat([all_maps, this_map], axis=0)


df_merge = concept_decode_mapping.reset_index()
df_merge.columns = ['concept', 'concept_name']

all_maps.merge(df_merge).loc[:, ['concept', 'concept_name', 'code', 'name']]
Out[15]:
concept concept_name code name
0 FREQ Frequency M Monthly
1 REF_AREA Reference area BG Bulgaria
2 REF_AREA Reference area CZ Czech Republic
3 REF_AREA Reference area DK Denmark
4 REF_AREA Reference area GB United Kingdom
5 REF_AREA Reference area HR Croatia
6 REF_AREA Reference area HU Hungary
7 REF_AREA Reference area PL Poland
8 REF_AREA Reference area RO Romania
9 REF_AREA Reference area SE Sweden
10 IR_TYPE Interest rate type L Long-term interest rate for convergence purposes
11 TR_TYPE Transaction type L40 Debt security issued
12 MATURITY_CAT Maturity category CI 10 years
13 BS_COUNT_SECTOR BS counterpart sector 0000 Unspecified counterpart sector
14 CURRENCY_TRANS Currency of transaction BGN Bulgarian lev
15 CURRENCY_TRANS Currency of transaction CZK Czech koruna
16 CURRENCY_TRANS Currency of transaction DKK Danish krone
17 CURRENCY_TRANS Currency of transaction GBP UK pound sterling
18 CURRENCY_TRANS Currency of transaction HRK Croatian kuna
19 CURRENCY_TRANS Currency of transaction HUF Hungarian forint
20 CURRENCY_TRANS Currency of transaction PLN Polish zloty
21 CURRENCY_TRANS Currency of transaction RON Romanian leu
22 CURRENCY_TRANS Currency of transaction SEK Swedish krona
23 IR_BUS_COV IR business coverage N New business
24 IR_FV_TYPE Interest rate type (fix/var) Z Unspecified

Or, if we want to look up some particular values for some given dimension (where the values will be translated and the original order will be preserved):

In [16]:
def decode_concept_values(this_concept, these_values):
    
    if type(these_values) == 'str':
        map_ss = all_maps.query('concept == @this_concept & code == @these_values')
    else:
        map_ss = all_maps.query('concept == @this_concept & code in @these_values')
        
    return map_ss.set_index('code').loc[these_values, 'name']
In [17]:
this_concept = 'REF_AREA'
this_concept_values = data_df.columns.get_level_values(this_concept).values

decoded_vals = decode_concept_values('REF_AREA', this_concept_values)
decoded_vals
Out[17]:
code
BG          Bulgaria
CZ    Czech Republic
DK           Denmark
GB    United Kingdom
HR           Croatia
HU           Hungary
PL            Poland
RO           Romania
SE            Sweden
Name: name, dtype: object
In [18]:
this_concept = 'MATURITY_CAT'
this_concept_values = data_df.columns.get_level_values(this_concept).values

decode_concept_values(this_concept, this_concept_values)
Out[18]:
code
CI    10 years
CI    10 years
CI    10 years
CI    10 years
CI    10 years
CI    10 years
CI    10 years
CI    10 years
CI    10 years
Name: name, dtype: object

That way, we can easily relabel the data with regards to the most relevant dimension only:

In [19]:
data_labeled = data_df.copy()
data_labeled.columns = decoded_vals.values
data_labeled.head(3)
Out[19]:
Bulgaria Czech Republic Denmark United Kingdom Croatia Hungary Poland Romania Sweden
TIME_PERIOD
2016-01 2.44 0.62 0.786 1.7253 3.84 3.36 3.00 3.60 1.02
2016-02 2.67 0.46 0.545 1.4362 3.68 3.36 3.03 3.33 0.77
2016-03 2.66 0.35 0.508 1.4641 3.66 3.13 2.89 3.34 0.82
In [21]:
data_labeled.plot()
plt.show()