Wednesday, February 28, 2018

Regime-switching models in Stan

Many time-series exhibit “regimes”—periods for which a set of rules seem to apply. When we jump to a different regime, a different set of rules apply. Yet we never really know which regime we’re in—they aren’t directly observable. This post illustrates the intuition behind building models that have these latent regimes in Stan. They are close in concept to finite mixture models as described here, with the main difference being that we model the changes in state probabilities as being a discrete (hidden) Markov process, rather than a continuous autoregressive process. The following explanation uses James Hamilton’s fantastic notes.

The generative process

As in the linked post above, we’ll use 100 times the daily difference in logged prices of a stock as the example time-series to model. Call these percentage returns . Just as before, we have two states: a random walk state, and a momentum state, (in which returns persist). I’ll assume normal distributions for these daily returns—you shouldn’t do this in real life. Just want to keep things simple. The generative models under the two states are:
State 1:  (prices take a random walk)
State 2:  (some momentum in returns)
We’ll call the true state in period  . Of course, we don’t actually know what state we’re in on a given day, but we can infer probabilistically if we have a model for it. The particular way we’ll model this discrete state is as a Markov chain. That is, 

  
—all the information relevant to making a forecast for what the state is in the next period is contained in what the state was the previous period.
It might help to illustrate the Markov matrix explicitly:

Strategy for fitting the model

The problem with fitting this model is that we don’t know what state we were in in the previous period for sure, which necessitates an iterative algorithm.
First, let’s define the likelihood contribution of an observation  under the two states

In Stan notation, this is just
// inside a for loop over t
eta1[t] = exp(normal_lpdf(y[t] | alpha_1, sigma_1));

eta2[t] = exp(normal_lpdf(y[t] | alpha_2 + rho * y[t-1], sigma_2));
If we knew the state in the previous period for sure, we could evaluate the log likelihood contribution of  as

That is, the likelihood contribution is simply the weighted average of likelihood contributions under the two proposed models, where the weights are the probability of being in each state.
Of course, we don’t know the probability of being in each state. Let  be the probability of being in state  in period , so  is the probability of being in that state the previous period. This gives us the likelihood contribution

Again, this is just a weighted average of likelihood contributions, weighted now by the probability of being in each state, which itself is uncertain and therefore we have to average across the probability of being in each of those states last period.
But where does s come from? We can back out the probability of being in each state in t by noticing that the probability of being in each state is

Once we have the likelihood contributions, we can calculate the log likelihood

Priors and identification

The big problem with modeling any mixture-type model is that if the two generative models are fairly similar, the relative likelihoods will be fairly similar and you can wind up with each MCMC chain mis-labeling the different models. Mike Betancourt has a wonderful primer on this here. In order to identify the model, we need prior information that makes each of the potential data generating processes distinguishable from each other. In the case of the model above, the case where  makes the data generating processes the same, so we need a strong prior on this parameter in order to identify the model. You’ll know that your regime-switching model is poorly identified when your Rhats are larger than 1 but each chain is mixing well when considered individually.
We want the autoregressive model to be distinguishable from the white-noise model. So let’s go with a fairly tight prior on . If we relax this prior too much, we expect that  will get dragged towards 0 and the the model will get less identifiable.
 is the daily expected percentage point return, which should be unbounded. With 252 trading days a year, a  would imply one-standard-deviation annual returns of 25%, which seems reasonable.
The innovation scales must be positive—we’ll use half Cauchy .
 must be in (0, 1), and we expect considerable stickiness of states so we’ll use a .
Finally the initial values of  we’ll use .

R and Stan code for implementation

First, let’s get the same data we used for the last example (note that the Quandl code has changed now that they want to make money).

library(tidyverse); library(rstan); library(Quandl)
# Now with real data! 
googl <- Quandl("WIKI/GOOGL", collapse = "weekly")

googl <- googl %>%
  mutate(Date = as.Date(Date)) %>%
  arrange(Date) %>% 
  mutate(l_ac = log(`Adj. Close`),
         dl_ac = c(NA, diff(l_ac))) %>% 
  filter(Date > "2010-01-01")

plot.ts(googl$dl_ac, main = "Weekly changes in Google's adjusted close price")