It’s often convenient to make use of reparameterizations in our models. Often we want to express parameters in a way that makes coming up with a prior more intuitive. Other times we want to use reparameterizations to improve sampling performance during estimation. In any case, below are some very simple reparameterizations that we often use in applied modeling.

Reparameterizing a univariate normal random variable

Sometimes we have a normally distributed unknown θi with mean (location) μ and standard deviation (scale) σ. We write this as

θi∼Normal(μ,σ)

If we have information about the expected mean and standard deviation for data at the level of the observation, Xi, we could happily incorporate this information like so:

θi∼Normal(f(Xi),g(Xi))

For instance, a normal linear model has f(Xi)=Xiβ and g(Xi)=σ, but we could use a variety of functional forms for both.

In such a case, we can always reparameterize θi as

θi=f(Xi)+g(Xi)zi, where zi∼Normal(0,1)
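As a quick sanity check on this identity, here is a small simulation sketch (in Python rather than Stan, with made-up values μ = 2 and σ = 0.5) showing that θ = μ + σz recovers the stated location and scale:

```python
# Sketch: simulate the non-centered parameterization theta = mu + sigma * z
# with illustrative values mu = 2.0, sigma = 0.5 (not from the post).
import random
import math

random.seed(42)

mu, sigma = 2.0, 0.5  # hypothetical location and scale
draws = 100_000

# Draw z ~ Normal(0, 1) and transform: theta = mu + sigma * z
thetas = [mu + sigma * random.gauss(0.0, 1.0) for _ in range(draws)]

mean = sum(thetas) / draws
sd = math.sqrt(sum((t - mean) ** 2 for t in thetas) / draws)

print(mean, sd)  # close to 2.0 and 0.5
```

The same logic carries over when μ and σ are replaced by the observation-level functions f(Xi) and g(Xi).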

In Stan, we’d implement this by declaring z in the parameters block and θ in the transformed parameters block. We then provide distributional information about z in the model block but typically use θ in the likelihood. For example:

```
// ... your data declaration here
parameters {
  vector[N] z;
  // parameters of f() and g()
}
transformed parameters {
  vector[N] theta;
  // f() and g() are vector-valued functions that probably have parameters;
  // note .* for elementwise multiplication of two vectors
  theta = f(X) + g(X) .* z;
}
model {
  // priors
  z ~ normal(0, 1);
  // likelihood
  // ...
}
```

Reparameterizing a covariance matrix

I always found covariance matrices tricky to think about *until* I realized how easily they can be reparameterized. We all know how to think about the standard deviation of a random variable. If the growth rate of GDP has a standard deviation of 1%, then that’s very intuitive. How about a vector of random variables? If the vector is (GDP, Unemployment, Rainfall) then each of those random variables has its own standard deviation. Let’s call it τ=(σGDP,σUnemp,σRain)′. Easy!

Now those random variables might move together. We typically measure the (linear) dependence among a vector of random variables using a *correlation matrix*, Ω. The diagonal of a correlation matrix is 1 (all variables are perfectly correlated with themselves), and the off-diagonals are between -1 and 1, reflecting the correlation coefficient between each pair of random variables. It is symmetric. For instance, the element Ω3,1=Ω1,3 is the correlation between GDP and rainfall.

We have everything we need now. The covariance matrix Σ is simply:

Σ=diag(τ)Ωdiag(τ)
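Entrywise, this identity says Σi,j = τi Ωi,j τj. A small sketch (in Python, with made-up numbers for the GDP/Unemployment/Rainfall example) makes the construction concrete:

```python
# Sketch: build a covariance matrix from marginal scales tau and a
# correlation matrix Omega. All numbers are illustrative, not from the post.
tau = [1.0, 0.8, 5.0]  # hypothetical standard deviations (GDP, Unemp, Rain)

Omega = [  # hypothetical correlation matrix: unit diagonal, symmetric
    [1.0, -0.5, 0.1],
    [-0.5, 1.0, 0.0],
    [0.1, 0.0, 1.0],
]

# Sigma = diag(tau) * Omega * diag(tau); entrywise:
# Sigma[i][j] = tau[i] * Omega[i][j] * tau[j]
Sigma = [[tau[i] * Omega[i][j] * tau[j] for j in range(3)] for i in range(3)]

print(Sigma[0][0])  # variance of GDP: tau[0]**2 = 1.0
print(Sigma[0][2])  # cov(GDP, Rain): 1.0 * 0.1 * 5.0 = 0.5
```

The diagonal of Σ recovers the marginal variances τi², and the off-diagonals are the covariances τi Ωi,j τj, which is exactly why priors on τ and Ω separately are so much easier to reason about.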

The cool thing about this parameterization is that often our likelihood calls for a covariance matrix (for instance, if we were jointly modeling the three random variables), but we find it easier to provide prior information about the (marginal) scale and correlation between the variables.

We’d implement this in Stan by declaring τ and Ω as parameters, then Σ as a transformed parameter. We then provide priors for τ and Ω, but use Σ in the likelihood. We often use the LKJ distribution as a prior for correlation matrices.

```
parameters {
  vector<lower = 0>[3] tau;
  corr_matrix[3] Omega;
}
transformed parameters {
  matrix[3, 3] Sigma;
  Sigma = diag_matrix(tau) * Omega * diag_matrix(tau);
}
model {
  // priors
  tau ~ student_t(3, 0, 2);
  Omega ~ lkj_corr(4);
  // likelihood
  // expression involving Sigma
}
```

Reparameterizing multivariate normals

You’ll notice in the reparameterization yi∼Normal(μ,σ)⟹yi=μ+σzi for some zi∼Normal(0,1) that σ is the square root of the variance of y. Multivariate normal distributions are typically parameterized in terms of their *variance covariance* matrix, which is the analog to the variance of a univariate normal. But if we want to apply our intuition from the above reparameterization, we need the “square root” of this covariance matrix.

There are many such “square roots” of positive definite matrices; one is the Cholesky factorization

Σ=LL′

where L is a lower triangular matrix with the same dimensions as Σ. If we have such an L, we can very easily apply the reparameterization at the top. For some vector

Θi∼Multi Normal(μ,Σ)

we can also say that

Θi=μ+Lzi, where zi∼Normal(0,1)
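To see why this works, note that a linear transform Lz of standard normals has covariance LL′ = Σ. Here is a sketch (in Python, with a made-up 2×2 covariance matrix) that hand-rolls the standard Cholesky algorithm and checks both the factorization and the empirical covariance of μ + Lz:

```python
# Sketch: hand-rolled Cholesky factorization (the standard algorithm),
# checked on a hypothetical 2x2 covariance matrix, plus an empirical
# check that Theta = mu + L z reproduces the off-diagonal covariance.
import random
import math

def cholesky(A):
    """Lower-triangular L with L L' == A, for A symmetric positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

Sigma = [[1.0, 0.6], [0.6, 2.0]]  # hypothetical covariance matrix
L = cholesky(Sigma)

# Check the factorization: (L L')[i][j] should equal Sigma[i][j]
LLt = [[sum(L[i][k] * L[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]

# Empirical check of the reparameterization Theta = mu + L z
random.seed(1)
mu = [0.0, 0.0]
draws = 100_000
cov01 = 0.0
for _ in range(draws):
    z = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
    theta = [mu[i] + sum(L[i][k] * z[k] for k in range(2)) for i in range(2)]
    cov01 += theta[0] * theta[1]
cov01 /= draws

print(LLt[0][1])  # 0.6, matching Sigma[0][1]
print(cov01)      # empirically close to 0.6
```

In practice you would of course use a library Cholesky routine (or let Stan do it); the point is only that Lz has exactly the covariance structure Σ.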

Another convenient take on this is to use the fact that if

Σ=diag(τ)Ωdiag(τ)=diag(τ)LΩL′Ωdiag(τ)

where LΩ is the Cholesky factor of the correlation matrix, then we can use the parameterization

Θi=μ+diag(τ)LΩzi, where zi∼Normal(0,1)
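This works because diag(τ)LΩ is itself a valid "square root" of Σ: multiplying it by its own transpose gives diag(τ)Ωdiag(τ). A small sketch (in Python, with made-up scales and correlation) checks that identity numerically:

```python
# Sketch: verify that diag(tau) * L_Omega, times its transpose, recovers
# Sigma = diag(tau) * Omega * diag(tau). Numbers are illustrative.
import math

tau = [2.0, 0.5]  # hypothetical scales
rho = 0.3         # hypothetical correlation

# Cholesky factor of the 2x2 correlation matrix [[1, rho], [rho, 1]]
L_Omega = [[1.0, 0.0], [rho, math.sqrt(1 - rho ** 2)]]

# L = diag(tau) * L_Omega, i.e. scale row i of L_Omega by tau[i]
# (this is what Stan's diag_pre_multiply does)
L = [[tau[i] * L_Omega[i][j] for j in range(2)] for i in range(2)]

# L L' should equal diag(tau) * Omega * diag(tau)
LLt = [[sum(L[i][k] * L[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]

print(LLt[0][0])  # tau[0]**2 = 4.0
print(LLt[0][1])  # tau[0] * rho * tau[1] = 0.3
```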

This parameterization requires less fiddling than the one above. Stan also gives us an LKJ prior distribution for the Cholesky factors of correlation matrices.

In Stan, we implement this by declaring μ, τ, z, and LΩ as parameters, and Θ as a transformed parameter.

```
parameters {
  vector[K] mu;
  vector<lower = 0>[K] tau;
  vector[K] z[N];
  cholesky_factor_corr[K] L_Omega;
}
transformed parameters {
  matrix[K, K] L;
  vector[K] Theta[N];
  L = diag_pre_multiply(tau, L_Omega);
  for(n in 1:N) {
    Theta[n] = mu + L * z[n];
  }
}
model {
  mu ~ normal(0, 1);
  tau ~ student_t(3, 0, 2);
  for(n in 1:N) {
    z[n] ~ normal(0, 1);
  }
  L_Omega ~ lkj_corr_cholesky(4);
  // likelihood below, depending on Theta
}
```

There are of course many other reparameterizations that we use in building models, but I tend to use these three daily.