Sunday, July 16, 2017

A few simple reparameterizations

It’s often convenient to make use of reparameterizations in our models. Sometimes we want to express parameters in a way that makes coming up with a prior more intuitive; other times we want to reparameterize to improve sampling performance during estimation. In either case, below are some very simple reparameterizations that we use all the time in applied modeling.
Reparameterizing a univariate normal random variable
Sometimes we have a normally distributed unknown $\theta$ with mean (location) $\mu$ and standard deviation (scale) $\sigma$. We write this as

$$\theta \sim \text{Normal}(\mu, \sigma)$$
If we have information about the expected mean and standard deviation for data at the level of the observation, $n$, we could happily incorporate this information like so:

$$\theta_{n} \sim \text{Normal}(f(X_{n}), g(X_{n}))$$
For instance, a normal linear model has $f(X_{n}) = X_{n}\beta$ and $g(X_{n}) = \sigma$, but we could use a variety of functional forms for both.
In such a case, we can always reparameterize $\theta_{n}$ as

$$\theta_{n} = f(X_{n}) + g(X_{n})\, z_{n}, \quad \text{where } z_{n} \sim \text{Normal}(0, 1)$$
In Stan, we’d implement this by declaring $z$ in the parameters block and $\theta$ in the transformed parameters block. We then provide distributional information about $z$ in the model block but typically use $\theta$ in the likelihood. For example:


data {
  // ... your data declarations here (e.g. N and X)
}
parameters {
  vector[N] z;
  // parameters of f() and g()
}
transformed parameters {
  vector[N] theta;
  theta = f(X) + g(X) .* z; // f() and g() are vector-valued functions that probably have parameters; .* is elementwise
}
model {
  // priors
  z ~ normal(0, 1);

  // likelihood
  // ... (an expression involving theta)
}
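To make this concrete for the normal linear model mentioned above, a minimal sketch might look like the following (the data names N, P, X, and y and the priors are illustrative assumptions, not part of the original):

data {
  int<lower = 1> N;
  int<lower = 1> P;
  matrix[N, P] X;
  vector[N] y;
}
parameters {
  vector[P] beta;
  real<lower = 0> sigma;
  vector[N] z;
}
transformed parameters {
  vector[N] theta;
  theta = X * beta + sigma * z; // f(X) = X*beta, g(X) = sigma
}
model {
  // priors (illustrative)
  beta ~ normal(0, 1);
  sigma ~ student_t(3, 0, 2);
  z ~ normal(0, 1);

  // likelihood involving theta and y goes here
}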
Reparameterizing a covariance matrix
I always found covariance matrices tricky to think about until I realized how easily they can be reparameterized. We all know how to think about the standard deviation of a random variable. If the growth rate of GDP has a standard deviation of 1%, then that’s very intuitive. How about a vector of random variables? If the vector is (GDP, Unemployment, Rainfall) then each of those random variables has its own standard deviation. Let’s collect those standard deviations in a vector $\tau$. Easy!
Now those random variables might move together. We typically measure the (linear) co-movement of a vector of random variables using a correlation matrix $\Omega$. The diagonal of a correlation matrix is 1 (all variables are perfectly correlated with themselves), and the off-diagonals are between -1 and 1, reflecting the correlation between each pair of random variables. It is symmetric. For instance, the element $\Omega_{1,3}$ is the correlation between GDP and Rainfall.
We have everything we need now. The covariance matrix $\Sigma$ is simply:

$$\Sigma = \text{diag}(\tau)\, \Omega\, \text{diag}(\tau)$$
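To spell that out for the three-variable example (a step the original leaves implicit): the $(1,3)$ element of $\Sigma$ is

$$\Sigma_{1,3} = \tau_{1}\, \Omega_{1,3}\, \tau_{3},$$

the covariance between GDP and Rainfall, while the diagonal entries are the variances $\tau_{k}^{2}$.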
The cool thing about this parameterization is that often our likelihood calls for a covariance matrix (for instance, if we were jointly modeling the three random variables), but we find it easier to provide prior information about the (marginal) scale and correlation between the variables.

We’d implement this in Stan by declaring $\tau$ and $\Omega$ as parameters, then $\Sigma$ as a transformed parameter. We then provide priors for $\tau$ and $\Omega$, but use $\Sigma$ in the likelihood. We often use the LKJ distribution as a prior for correlation matrices.
parameters {
  vector<lower = 0>[3] tau;
  corr_matrix[3] Omega;
}
transformed parameters {
  matrix[3, 3] Sigma;
  Sigma = diag_matrix(tau)*Omega*diag_matrix(tau);
}
model {
  // priors
  tau ~ student_t(3, 0, 2);
  Omega ~ lkj_corr(4);
  
  // likelihood
  // expression involving Sigma
}
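As an aside, Stan also provides quad_form_diag(), which computes the same product without explicitly constructing the diagonal matrices; the transformed parameters block above could equivalently be written as:

transformed parameters {
  matrix[3, 3] Sigma;
  Sigma = quad_form_diag(Omega, tau); // diag_matrix(tau)*Omega*diag_matrix(tau)
}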
Reparameterizing multivariate normals
You’ll notice in the reparameterization $\theta_{n} = f(X_{n}) + g(X_{n})\, z_{n}$ for some $z_{n} \sim \text{Normal}(0, 1)$ that $g(X_{n})$ is the square root of the variance of $\theta_{n}$. Multivariate normal distributions are typically parameterized in terms of their variance-covariance matrix, which is the analog to the variance of a univariate normal. But if we want to apply our intuition from the above reparameterization, we need the “square root” of this covariance matrix.
There are many such “square roots” of positive definite matrices; one is the Cholesky factorization

$$\Sigma = L\, L'$$

where $L$ is a lower triangular matrix with the same dimensions as $\Sigma$. If we have such an $L$, we can very easily apply the reparameterization at the top. For some vector

$$z \sim \text{Normal}(0, 1)$$

of independent standard normals, we can then say that

$$\theta = \mu + L\, z \sim \text{Normal}(\mu, \Sigma)$$
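A minimal Stan sketch of this version (assuming a dimension K, and leaving the prior on the covariance matrix and the likelihood unspecified, as in the blocks above) could look like:

parameters {
  vector[K] mu;
  cov_matrix[K] Sigma;
  vector[K] z;
}
transformed parameters {
  vector[K] theta;
  theta = mu + cholesky_decompose(Sigma) * z; // theta ~ Normal(mu, Sigma)
}
model {
  z ~ normal(0, 1);
  // prior on Sigma and likelihood involving theta go here
}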
Another convenient take on this is to use the fact that if

$$\Sigma = \text{diag}(\tau)\, \Omega\, \text{diag}(\tau) = \text{diag}(\tau)\, L_{\Omega} L_{\Omega}'\, \text{diag}(\tau)$$

where $L_{\Omega}$ is the Cholesky factor of the correlation matrix $\Omega$, then we can use the parameterization

$$\theta = \mu + \text{diag}(\tau)\, L_{\Omega}\, z$$
This parameterization requires less fiddling than the one above. Stan also gives us an LKJ prior distribution for the Cholesky factors of correlation matrices.
In Stan, we implement this by declaring $\mu$, $\tau$, $z$, and $L_{\Omega}$ as parameters, and $L = \text{diag}(\tau)\, L_{\Omega}$ and $\Theta$ as transformed parameters.
parameters {
  vector[K] mu;
  vector<lower = 0>[K] tau;
  vector[K] z[N];
  cholesky_factor_corr[K] L_Omega;
}
transformed parameters {
  matrix[K, K] L;
  vector[K] Theta[N];
  L = diag_pre_multiply(tau, L_Omega);

  for(n in 1:N) {
    Theta[n] = mu + L * z[n];
  }
}
model {
  mu ~ normal(0, 1);
  tau ~ student_t(3, 0, 2);
  for(n in 1:N) {
    z[n] ~ normal(0, 1);
  }
  L_Omega ~ lkj_corr_cholesky(4);
  
  // likelihood below, depending on Theta
}
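If you want the implied covariance and correlation matrices back (say, for reporting), one option not in the original post is a generated quantities block along these lines:

generated quantities {
  matrix[K, K] Sigma;
  matrix[K, K] Omega;
  Sigma = multiply_lower_tri_self_transpose(L);       // L*L' = diag(tau)*Omega*diag(tau)
  Omega = multiply_lower_tri_self_transpose(L_Omega); // L_Omega*L_Omega'
}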
There are of course many other reparameterizations that we use in building models, but I tend to use these three daily.