A brief introduction to Dynamic Factor Models in Stan
David Stephan and Jim Savage
31 August 2016
Policymakers are forced to rely on many different partial indicators (such as retail trade figures or vacancy data) to try and form a picture of the current state of the economy. The difficulty faced when analysing partial data is what importance to place on a certain series, especially large movements. Are they a sign of a coming downturn, or are they merely inherent noise in the measurement of the series?
The process of combining all the series into a picture of the business cycle is quite difficult and often messy. Given that the business cycle is not entirely captured by any one variable (whether it be GDP, employment or share prices), but rather reflects movements in many variables, it would be beneficial to have an indicator that somehow provided a consistent and coherent method for combining the useful information provided by the partial indicators.
One technique for achieving this goal is to what are known as factor models. We can think of macroeconomic time series as incorporating a common component and an idiosyncratic component. Think of any macroeconomic indicator, such as employment growth:
where ft is the common factor with factor loading (or weight) γ, and ϵt is a an idiosyncratic error term, about which we need to give a distributional assumption. This same specification will hold for all the series we are interested in (and/or have data for):
There are a number of ways in which we could extract the common factor. In many studies, principal components is used. In our setting, especially if we wish to use a mixture of data frequencies (monthly, quarterly, etc.) it may be a simpler to instead use a state-space representation of our model. We retain the above equation as the signal or measurement equation of our system and allow a simple assumption for the evolution of the underlying state variable (the common factor):
where the common factor evolves as a simple autoregressive model with coefficient θ, and ω is a an idiosyncratic error term, which we assume is distributed normally with standard deviation σ. It is easy to extend the structure of the model to have more than one common factor and for the factor itself to evolve as a higher order or a random walk.
Factor models such as these have been shown to provide competitive forecasts of major macroeconomic aggregates when compared with standard time-series models. There are several versions of such models in use at the NY FED and the ECB.
Putting together some Australian data
In the example we give below, we use standardized annual growth in Australian employment and GDP as our signals of economic activity. These can be easily pulled from Quandl from within R:
library(Quandl); library(dplyr); library(rstan); library(ggplot2); library(reshape2)options(mc.cores=parallel::detectCores())#real gdp millions chained dollars - seasonally adjGDP<-Quandl("AUSBS/5206002_EXPENDITURE_VOLUME_MEASURES_A2304402X")%>%mutate(Date=as.Date(Date))%>%arrange(Date)%>%mutate(GDP_growth=log(Value/lag(Value, 4)))%>%dplyr::select(Date, GDP_growth)#employed persons - monthly seasonally adjEMP<-Quandl("AUSBS/6202001_A84423043C")%>%mutate(Date=as.Date(Date))%>%arrange(Date)%>%mutate(Emp_growth=log(Value/lag(Value, 12)))%>%dplyr::select(Date, Emp_growth)# Join series together, and set missing values to -999 (Stan doesn't like NAs)full_data<-left_join(EMP, GDP, by="Date")%>%mutate(GDP_growth=ifelse(is.na(GDP_growth), -999, GDP_growth))%>%filter(!is.na(Emp_growth))
Now we have our dataset, let’s write the model out in Stan. As you can see, this is very similar to writing out the underlying probability model on paper.