That model appeared to estimate very cleanly. But if we print the parameter estimates, we see that all the Rhats are large and the n_effs are low. That's bad! So we might check out the traceplots to see what's going on.

Oh dear! We have a case of the dreaded label switching. With these mixture models, the parameters aren't identified up to an index: we could permute the indices of the parameters and the likelihood would be unchanged. Think about it this way: we said that state 1 (μ = 3, σ = 2) had probability 0.5. But we could equally call it "state 2", and it wouldn't affect the model at all so long as we slot the state 2 parameters into index 1. This "label switching" happens when the labels all change while the model is running.

So what do we do? There are a few tricks. One is to declare one of the parameters as an ordered vector, so long as there is theoretical support for doing so. We can do this by replacing the line

```stan
vector[n_groups] mu;
```

with

```stan
ordered[n_groups] mu;
```

This estimates far more cleanly, as we can see from the parameter estimates, which now cover the generative process, and a much nicer-looking traceplot.

```
## Inference for Stan model: finite_mixture_linear_regression_2.
## 4 chains, each with iter=600; warmup=300; thin=1;
## post-warmup draws per chain=300, total post-warmup draws=1200.
##
##              mean se_mean   sd     2.5%      25%      50%      75%
## mu[1]        2.51    0.09 0.59     0.66     2.34     2.63     2.87
## mu[2]        4.87    0.13 1.42     2.78     3.53     4.87     6.04
## mu[3]        7.92    0.09 1.23     6.33     7.06     7.63     8.42
## sigma[1]     1.60    0.06 0.39     0.64     1.42     1.60     1.79
## sigma[2]     2.56    0.08 0.92     0.94     1.85     2.56     3.25
## sigma[3]     3.36    0.04 0.55     2.09     3.12     3.45     3.70
## Theta[1]     0.33    0.02 0.15     0.04     0.22     0.35     0.44
## Theta[2]     0.30    0.01 0.15     0.05     0.18     0.29     0.40
## Theta[3]     0.37    0.02 0.16     0.05     0.24     0.38     0.50
## lp__     -1060.70    0.15 2.19 -1066.08 -1061.92 -1060.33 -1059.09
##             97.5% n_eff Rhat
## mu[1]        3.27    46 1.07
## mu[2]        7.53   115 1.03
## mu[3]       11.25   181 1.03
## sigma[1]     2.37    50 1.08
## sigma[2]     4.23   140 1.02
## sigma[3]     4.28   185 1.04
## Theta[1]     0.59    86 1.05
## Theta[2]     0.60   213 1.01
## Theta[3]     0.65   111 1.03
## lp__     -1057.62   216 1.00
##
## Samples were drawn using NUTS(diag_e) at Mon Oct 17 22:56:15 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
```

In the next post, I'll build on these concepts and discuss Markov switching.
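To see the non-identifiability behind label switching concretely, here is a small Python sketch (my own illustration, not code from the post): permuting the component labels of a mixture leaves the log-likelihood unchanged, so the sampler has no way to prefer one labeling over another.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(42)
y = rng.normal(3.0, 2.0, size=100)  # any data will do for this demonstration

def mixture_loglik(y, theta, mu, sigma):
    # log p(y) = sum_i log( sum_k theta_k * Normal(y_i | mu_k, sigma_k) )
    log_comp = np.log(theta) + norm.logpdf(y[:, None], mu, sigma)
    return logsumexp(log_comp, axis=1).sum()

theta = np.array([0.5, 0.5])
mu = np.array([3.0, 8.0])
sigma = np.array([2.0, 3.0])

perm = [1, 0]  # swap the labels of the two states
ll_original = mixture_loglik(y, theta, mu, sigma)
ll_permuted = mixture_loglik(y, theta[perm], mu[perm], sigma[perm])
print(np.isclose(ll_original, ll_permuted))  # True: the labelings are indistinguishable
```

This is why the `ordered` constraint helps: it rules out all but one of the equivalent labelings.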

#### 1 comment:

1. Great post. I just wanted to comment and explicitly link the generative model with the likelihood you describe, for future readers. The generative model, i.e. the model that we believe gave rise to our data, involves a class indicator z_i. However, you'll notice that this indicator is absent from the likelihood. The likelihood is the joint distribution of the data we actually observe, and presumably we do not observe class membership indicators; in fact, predicting these memberships is often itself of interest. So the likelihood can be obtained from the generative model by marginalizing over the indicators z, and then taking a product over the iid observations.
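The commenter's point can be sketched in a few lines of Python (function names are my own, for illustration): marginalizing out z_i gives the per-observation likelihood p(y_i) = Σ_k θ_k · Normal(y_i | μ_k, σ_k), and Bayes' rule then recovers the membership probabilities P(z_i = k | y_i) when prediction of the classes is of interest.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def marginal_loglik(y, theta, mu, sigma):
    """log p(y) with the indicators z summed out:
    p(y_i) = sum_k theta_k * Normal(y_i | mu_k, sigma_k),
    then a sum of logs over the iid observations."""
    log_comp = np.log(theta) + norm.logpdf(np.asarray(y)[:, None], mu, sigma)
    return logsumexp(log_comp, axis=1).sum()

def posterior_z(y_i, theta, mu, sigma):
    """Posterior membership probabilities P(z_i = k | y_i) by Bayes' rule,
    computed on the log scale for numerical stability."""
    log_w = np.log(theta) + norm.logpdf(y_i, mu, sigma)
    return np.exp(log_w - logsumexp(log_w))

theta = np.array([0.5, 0.3, 0.2])
mu = np.array([3.0, 8.0, 15.0])
sigma = np.array([2.0, 3.0, 4.0])

ll = marginal_loglik([1.0, 5.0, 14.0], theta, mu, sigma)
probs = posterior_z(4.0, theta, mu, sigma)  # probabilities over the 3 classes, sums to 1
```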