## Sunday, July 16, 2017

### A few simple reparameterizations

It’s often convenient to make use of reparameterizations in our models. Often we want to be able to express parameters in a way that makes coming up with a prior more intuitive. Other times we want to use reparameterizations to improve sampling performance during estimation. In any case, the below are some very simple reparameterizations that we often use in applied modeling.
Reparametrizaing a univariate normal random variable
Sometimes we have a normally distributed unknown  with mean (location)  and standard deviation (scale) . We write this as

If we have information about the expected mean and standard deviation for data at the level of the observation, , we could happily incorporate this information like so:

For instance, a normal linear model has , but we could use a variety of functional forms for both.
In such a case, we can always reparameterize  as
In Stan, we’d implement this by declaring  in the parameter block, and  in the transformed parameter block. We then provide distributional information about  in the modeling block but typically use  in the likelihood. For example:

// ... Your data declaration here
parameters {
vector[N] z;
// parameters of f() and g()
}
transformed parameters {
vector[N] theta;
theta = f(X) + g(X)*z; // f() and g() are vector valued functions that probably have parameters
}
model {
// priors
z ~ normal(0, 1);

// likelihood
// ..
}

Reparameterizing a covariance matrix
I always found covariance matrices tricky to think about until I realized how easily they can be reparameterized. We all know how to think about the standard deviation of a random variable. If the growth rate of GDP has a standard deviation of 1%, then that’s very intuitive. How about a vector of random variables? If the vector is (GDP, Unemployment, Rainfall) then each of those random variables has its own standard deviation. Let’s call it . Easy!
Now those random variables might move together. We typically measure (linear) mutual information of a vector of random variables using a correlation matrix. The diagonal of a correlation matrix is 1 (all variables are perfectly correlated with themselves). and the off-diagonals are between -1 and 1, reflecting the correlation coefficient between each random variable. It it symmetric. For instance, the element  is the correlation between GDP and rainfall.
We have everything we need now. The covariance matrix  is simply:
The cool thing about this parameterization is that often our likelihood calls for a covariance matrix (for instance, if we were jointly modeling the three random variables), but we find it easier to provide prior information about the (marginal) scale and correlation between the variables.

We’d implement this in Stan by declaring  and  as parameters, then  as a transformed parameter. We then provide priors for  and , but use  in the likelihood. We often use the LKJ distribution as a prior for correlation matrices.
parameters {
vector<lower = 0> tau;
corr_matrix Omega;
}
transformed parameters {
matrix[3, 3] Sigma;
Sigma = diag_matrix(tau)*Omega*diag_matrix(tau);
}
model {
// priors
tau ~ student_t(3, 0, 2);
Omega ~ lkj_corr(4);

// likelihood
// expression involving Sigma
}
Reparameterizing multivariate normals
You’ll notice in the reparameterization  for some  that  is the square root of the variance of . Multivariate normal distributions are typically parameterized in terms of their variance covariance matrix, which is the analog to the variance of a univariate normal. But if we want to apply our intuition from the above reparameterization, we need the “square root” of this covariance matrix.
There are many such “square roots” of positive definite matrices; one is the Cholesky factorization
where L is a lower triangular matrix with the same dimensions as . If we have such an , we can very easily apply the reparameterization at the top. For some vector

then we can also say that

Another convenient take on this is to use the fact that if

where  is the Cholesky factor of the correlation matrix, then we can use the parameterization

This parameterization requires less fiddling than the one above. Stan also gives us an LKJ prior distribution for the Cholesky factors of correlation matrix
In Stan, we implement this by declaring   and  as parameters, and  as a transformed parameter.
parameters {
vector[K] mu;
vector<lower = 0>[K] tau;
vector[N] z[K];
cholesky_factor_corr[K] L_Omega;
}
transformed parameters {
matrix[K, K] L;
vector[N] Theta[K];
L = diag_pre_multiply(tau, L_Omega);

for(n in 1:N) {
Theta[n] = mu + L * z[n];
}
}
model {
mu ~ normal(0, 1);
tau ~ student_t(3, 0, 2);
for(n in 1:N) {
z[n] ~ normal(0, 1);
}
L_Omega ~ lkj_corr_cholesky(4);

// likelihood below, depending on Theta
}
There are of course many other reparameterizations that we use in buiding models, but I tend to use these three daily.

1. Not: Theta[n] = mu + diag_matrix(tau)*L_Omega * z[n];
Rather: Theta[n] = mu + diag_pre_multiply(tau, L_Omega) * z[n];
But actually diag_pre_multiply(tau, L_Omega) should be executed just once before the loop.

2. Thanks Ben -- updated.

3. Isnt lkj_corr(4) too strong? i.e. you might be forcing it to be a diagonal covariance unwittingly. Maybe add a uniform prior over that parameter between 1 and 4?

1. I don't think it's that strong. One of the things I do to get an idea of how the degrees of freedom parameter affects the distribution is to export the lkn_corr_rng() function to R (using expose_stan_functions()) and then draw some random numbers. lkj(4) isn't very tight. But yes, nothing to stop you from using a looser hyperprior.