Jim Savage
7/16/2017
It’s often convenient to make use of reparameterizations in our models. Often we want to express parameters in a way that makes coming up with a prior more intuitive. Other times we want to use reparameterizations to improve sampling performance during estimation. In any case, below are some very simple reparameterizations that we often use in applied modeling.
Reparameterizing a univariate normal random variable
Sometimes we have a normally distributed unknown $\theta$ with mean (location) $\mu$ and standard deviation (scale) $\sigma$. We write this as

$$\theta \sim \text{Normal}(\mu, \sigma)$$

If we have information about the expected mean and standard deviation for data at the level of the observation, $i$, we could happily incorporate this information like so:

$$\theta_{i} \sim \text{Normal}(f(X_{i}),\, g(X_{i}))$$

For instance, a normal linear model has $f(X_{i}) = X_{i}\beta$ and $g(X_{i}) = \sigma$, but we could use a variety of functional forms for both.

In such a case, we can always reparameterize as

$$\theta_{i} = f(X_{i}) + g(X_{i})\, z_{i}, \quad \text{where } z_{i} \sim \text{Normal}(0, 1)$$
// ... Your data declaration here
parameters {
  vector[N] z;
  // parameters of f() and g()
}
transformed parameters {
  vector[N] theta;
  theta = f(X) + g(X) .* z; // f() and g() are vector-valued functions that probably have parameters
}
model {
  // priors
  z ~ normal(0, 1);
  // likelihood
  // ...
}
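For concreteness, here is a minimal sketch of the same structure with f() and g() filled in for the normal linear model case above, so that f(X) = X*beta and g(X) = sigma. The data names (X, P) and the particular priors are just placeholders, and the likelihood is still left to you:

data {
  int N;
  int P;
  matrix[N, P] X;
  // ... outcome data here
}
parameters {
  vector[P] beta;         // parameters of f()
  real<lower = 0> sigma;  // parameter of g()
  vector[N] z;
}
transformed parameters {
  vector[N] theta;
  theta = X * beta + sigma * z; // implies theta ~ normal(X*beta, sigma), non-centered
}
model {
  // priors
  beta ~ normal(0, 1);
  sigma ~ student_t(3, 0, 2);
  z ~ normal(0, 1);
  // likelihood involving theta
}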
Reparameterizing a covariance matrix
I always found covariance matrices tricky to think about until I realized how easily they can be reparameterized. We all know how to think about the standard deviation of a random variable. If the growth rate of GDP has a standard deviation of 1%, then that’s very intuitive. How about a vector of random variables? If the vector is (GDP, Unemployment, Rainfall) then each of those random variables has its own standard deviation. Let’s call the vector of standard deviations $\tau$. Easy!

Now those random variables might move together. We typically measure (linear) mutual information of a vector of random variables using a correlation matrix, $\Omega$. The diagonal of a correlation matrix is 1 (all variables are perfectly correlated with themselves), and the off-diagonals are between -1 and 1, reflecting the correlation coefficient between each pair of random variables. It is symmetric. For instance, the element $\Omega_{1,3}$ is the correlation between GDP and rainfall.

We have everything we need now. The covariance matrix is simply:

$$\Sigma = \text{diag}(\tau)\, \Omega\, \text{diag}(\tau)$$

We’d implement this in Stan by declaring $\tau$ and $\Omega$ as parameters, then $\Sigma$ as a transformed parameter. We then provide priors for $\tau$ and $\Omega$, but use $\Sigma$ in the likelihood. We often use the LKJ distribution as a prior for correlation matrices.
parameters {
  vector<lower = 0>[3] tau;
  corr_matrix[3] Omega;
}
transformed parameters {
  matrix[3, 3] Sigma;
  Sigma = diag_matrix(tau) * Omega * diag_matrix(tau);
}
model {
  // priors
  tau ~ student_t(3, 0, 2);
  Omega ~ lkj_corr(4);
  // likelihood
  // expression involving Sigma
}
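As an aside, Stan has a built-in function, quad_form_diag(), that computes the same quadratic form and avoids building the diagonal matrices explicitly. A sketch of the transformed parameters block using it (same Sigma, just a different way of computing it):

transformed parameters {
  matrix[3, 3] Sigma;
  Sigma = quad_form_diag(Omega, tau); // diag_matrix(tau) * Omega * diag_matrix(tau)
}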
Reparameterizing multivariate normals
You’ll notice in the reparameterization for some $\theta \sim \text{Normal}(\mu, \sigma)$ that $\sigma$ is the square root of the variance of $\theta$. Multivariate normal distributions are typically parameterized in terms of their variance-covariance matrix $\Sigma$, which is the analog to the variance of a univariate normal. But if we want to apply our intuition from the above reparameterization, we need the “square root” of this covariance matrix.

There are many such “square roots” of positive definite matrices; one is the Cholesky factorization

$$\Sigma = L_{\Sigma} L_{\Sigma}'$$

If $z \sim \text{Normal}(0, 1)$ is a vector of independent standard normals, then we can also say that

$$\theta = \mu + L_{\Sigma}\, z \sim \text{Normal}(\mu, \Sigma)$$
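To make that concrete, here is a minimal sketch of this parameterization in Stan for a single K-vector theta. The names (K, mu, L_Sigma) are just placeholders, and the priors and likelihood are left out:

data {
  int<lower = 1> K;
}
parameters {
  vector[K] mu;
  cholesky_factor_cov[K] L_Sigma; // Cholesky factor of the covariance matrix
  vector[K] z;
}
transformed parameters {
  vector[K] theta;
  theta = mu + L_Sigma * z; // implies theta ~ multi_normal(mu, L_Sigma * L_Sigma')
}
model {
  z ~ normal(0, 1);
  // priors for mu and L_Sigma, and the likelihood involving theta
}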
Another convenient take on this is to use the fact that if

$$\Sigma = \text{diag}(\tau)\, \Omega\, \text{diag}(\tau) \quad \text{and} \quad \Omega = L_{\Omega} L_{\Omega}'$$

where $L_{\Omega}$ is the Cholesky factor of the correlation matrix, then we can use the parameterization

$$\theta = \mu + \text{diag}(\tau)\, L_{\Omega}\, z, \quad \text{where } z \sim \text{Normal}(0, 1)$$

This parameterization requires less fiddling than the one above. Stan also gives us an LKJ prior distribution for the Cholesky factor of a correlation matrix, lkj_corr_cholesky.

In Stan, we implement this by declaring $\mu$, $\tau$, $z$ and $L_{\Omega}$ as parameters, and Theta as a transformed parameter.
parameters {
  vector[K] mu;
  vector<lower = 0>[K] tau;
  vector[K] z[N];
  cholesky_factor_corr[K] L_Omega;
}
transformed parameters {
  matrix[K, K] L;
  vector[K] Theta[N];
  L = diag_pre_multiply(tau, L_Omega);
  for(n in 1:N) {
    Theta[n] = mu + L * z[n];
  }
}
model {
  mu ~ normal(0, 1);
  tau ~ student_t(3, 0, 2);
  for(n in 1:N) {
    z[n] ~ normal(0, 1);
  }
  L_Omega ~ lkj_corr_cholesky(4);
  // likelihood below, depending on Theta
}
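If you want the implied correlation and covariance matrices back for summaries or posterior checks, a generated quantities block can be tacked onto the program above. This is a sketch, assuming the same tau and L_Omega declared there:

generated quantities {
  corr_matrix[K] Omega;
  cov_matrix[K] Sigma;
  Omega = multiply_lower_tri_self_transpose(L_Omega); // L_Omega * L_Omega'
  Sigma = quad_form_diag(Omega, tau);                 // diag_matrix(tau) * Omega * diag_matrix(tau)
}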
There are of course many other reparameterizations that we use in building models, but I tend to use these three daily.
Not: Theta[n] = mu + diag_matrix(tau)*L_Omega * z[n];
Rather: Theta[n] = mu + diag_pre_multiply(tau, L_Omega) * z[n];
But actually diag_pre_multiply(tau, L_Omega) should be executed just once before the loop.
Thanks Ben -- updated.
Isn't lkj_corr(4) too strong? i.e. you might be forcing it to be a diagonal covariance unwittingly. Maybe add a uniform prior over that parameter between 1 and 4?
I don't think it's that strong. One of the things I do to get an idea of how the degrees of freedom parameter affects the distribution is to export the lkj_corr_rng() function to R (using expose_stan_functions()) and then draw some random numbers. lkj(4) isn't very tight. But yes, nothing to stop you from using a looser hyperprior.
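For example, a thin wrapper like the one below (the name lkj_prior_draw_rng is just illustrative) can be exposed to R with rstan::expose_stan_functions() and called repeatedly to eyeball draws from the LKJ prior:

functions {
  // wrapper around the built-in RNG so it can be exposed to R via rstan::expose_stan_functions()
  matrix lkj_prior_draw_rng(int K, real eta) {
    return lkj_corr_rng(K, eta);
  }
}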