Saturday, October 22, 2016

Finite mixture model with time-varying probabilities - part 1

In my last post, I described a simple model in which each observation of our data could have one of two densities. We estimated the parameters of both densities, and the probability of the data coming from either. While finite mixture models as in the last post are a useful learning aid, we might want richer models for applied work. In particular, we might want the probability of our data having each density to vary across observations. This is the first of two posts dedicated to this topic. I gave a talk covering some of this a few days ago (best viewed in Safari).
For sake of an example, consider this: the daily returns series of a stock has two states. In the first, the stock is ‘priced to perfection’, and so the price is an I(1) random walk (daily returns are mean stationary). In the second, there is momentum—here, daily returns have AR(1) structure. Explicitly, for daily log returns :
State 1: 
State 2: 
When we observe a value of , we don’t know for sure whether it came from the first or second model–that is precisely what we want to infer. For this, we need a model for the probability that an observation came from each state . One such model could be:


Here,  is a function of the information available at the beginning of day t. If we had interesting information about sentiment, or news etc., it could go in here. For simplicity, let’s say .
Under this specification (and for a vector containing all parameters, θ), we can specify the likelihood contribution of an observation. It is simply the weighted average of likelihoods under each candidate data generating process, where the weights are the probabilities that the data comes from each density.

As discussed in the last post, we work in log likelihoods, not likelihoods. This means we should use the log_sum_exp() function in Stan. This means that we express the log likelihood contribution of a single point as:
log_sum_exp(log(inv_logit(mu[t])) + normal_lpdf(r[t] | alpha[1], sigma[1]),
            log((1 - inv_logit(mu[t]))) + normal_lpdf(r[t] | alpha[2] + rho[1], sigma[2]))
Stan has recently added another function which performs the same calculation, but makes writing it out a bit easier. For two log densities lp1,lp2 and a mixing probability theta, we have
log_mix(theta, lp1, lp2) = log_sum_exp(log(theta) + lp1,
                                       log(1-theta) + lp2)

Writing out the model

The Stan code for the model is:
// saved as time_varying_finite_mixtures.stan
data {
  int T;
  vector[T] r;
parameters {
  vector[T] mu;
  vector[2] rho;
  real beta;
  vector<lower = 0>[3] sigma;
  vector[3] alpha; 
model {
  // priors 
  mu[1] ~ normal(0, .1);
  sigma ~ cauchy(0, 0.5);
  rho ~ normal(1, .1);
  beta~ normal(.5, .25);
  alpha[1:2] ~ normal(0, 0.1);
  alpha[3] ~ normal(0, 1);

  // likelihood
  for(t in 2:T) {
    mu[t] ~ normal(alpha[3] + rho[1]*mu[t-1] + beta* r[t-1], sigma[3]);

    target += log_mix(inv_logit(mu[t]), 
                      normal_lpdf(r[t] | alpha[1], sigma[1]), 
                      normal_lpdf(r[t] | alpha[2] + rho[2] * r[t-1], sigma[2]));

Recapturing ‘known unknowns’

As should be clear by now, I believe strongly that we should simulate from the model and make sure that we can recapture “known unknowns” before taking the model to real data. Below we simulate some fake data.
# Set some fake parameters
alpha1 <- -0.01
alpha2 <- 0.015
rho1 <- 0.95
rho2 <- 0.8
beta <- 0.5

sigma1 <- 0.05
sigma2 <- 0.03
sigma3 <- 0.3
T <- 500
r <- rep(NA, T)
r[1] <- 0

mu <- rep(NA, T)
z <- rep(NA, T)
mu[1] <- 0
z[1] <- 1

# Simulate the data series
for(t in 2:T) {
  mu[t]  <- rho1 * mu[t-1] + beta*(r[t-1]) + rnorm(1, 0, sigma3)
  prob <- arm::invlogit(mu[t])
  z[t] <- sample(1:2, 1, prob = c(prob, 1-prob))
  if(z[t]==1) {
    # random walk state
    r[t] <- rnorm(1, alpha1, sigma1)
  } else {
    # momentum state
    r[t] <- rnorm(1, alpha2 + rho2*r[t-1], sigma2)
You should plot your data before doing anything. Let’s take a look.
# Plot the returns