Conditional distributions for the linear Gaussian model

A linear Gaussian model is a model where (i) the input variables are jointly Gaussian, and (ii) conditional on the inputs, the output variables are also jointly Gaussian, with means that are linear combinations of the input variables (plus possibly a bias term). Linear Gaussian models are popular because the key quantities we care about (the marginal distribution of the outputs and the posterior distribution of the inputs) remain Gaussian, with parameters that can be computed in closed form. Here is the main theorem:

Theorem. Assume that

\begin{aligned} p(x) &= \mathcal{N}\left( x \mid \mu, \Lambda^{-1} \right) , \\  p(y \mid x) &= \mathcal{N} \left( y \mid Ax + b, L^{-1} \right). \end{aligned}

Then

\begin{aligned} p(y) &= \mathcal{N}\left( y \mid A\mu + b, L^{-1} + A \Lambda^{-1}A^\top \right) , \\  p(x \mid y) &= \mathcal{N} \left( x \mid \Sigma \left[ A^\top L (y - b) + \Lambda \mu \right], \Sigma \right), \end{aligned}

where

\begin{aligned} \Sigma = \left( \Lambda + A^\top L A \right)^{-1}. \end{aligned}
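
Before sketching the proof, here is a quick numerical sanity check of these formulas. It is a minimal sketch in NumPy with made-up dimensions and randomly drawn parameters: the marginal of y is compared against Monte Carlo samples from the generative model, and the posterior of x is compared against the generic formula for conditioning a joint Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and parameters, chosen only for illustration.
d_x, d_y = 3, 2
mu = rng.normal(size=d_x)
A = rng.normal(size=(d_y, d_x))
b = rng.normal(size=d_y)

def random_precision(d):
    """Random symmetric positive-definite precision matrix."""
    M = rng.normal(size=(d, d))
    return M @ M.T + d * np.eye(d)

Lam = random_precision(d_x)  # precision of p(x)
L = random_precision(d_y)    # precision of p(y | x)

# Sample from the generative model: x ~ N(mu, Lam^{-1}), y | x ~ N(Ax + b, L^{-1}).
n = 200_000
x = rng.multivariate_normal(mu, np.linalg.inv(Lam), size=n)
noise = rng.multivariate_normal(np.zeros(d_y), np.linalg.inv(L), size=n)
y = x @ A.T + b + noise

# Marginal p(y): mean A mu + b, covariance L^{-1} + A Lam^{-1} A^T.
mean_y = A @ mu + b
cov_y = np.linalg.inv(L) + A @ np.linalg.inv(Lam) @ A.T
print(np.abs(y.mean(axis=0) - mean_y).max())            # small (Monte Carlo error)
print(np.abs(np.cov(y, rowvar=False) - cov_y).max())    # small (Monte Carlo error)

# Posterior p(x | y): covariance Sigma = (Lam + A^T L A)^{-1},
# mean Sigma [A^T L (y - b) + Lam mu].
Sigma = np.linalg.inv(Lam + A.T @ L @ A)
y0 = y[0]  # any observed y
post_mean = Sigma @ (A.T @ L @ (y0 - b) + Lam @ mu)

# Cross-check against the generic formula for conditioning a joint Gaussian.
cov_x = np.linalg.inv(Lam)
cov_xy = cov_x @ A.T  # Cov(x, y)
post_mean_generic = mu + cov_xy @ np.linalg.solve(cov_y, y0 - mean_y)
post_cov_generic = cov_x - cov_xy @ np.linalg.solve(cov_y, cov_xy.T)
print(np.abs(post_mean - post_mean_generic).max())  # essentially 0
print(np.abs(Sigma - post_cov_generic).max())       # essentially 0
```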

The details of the proof are a bit involved, but the overall idea is quite simple. If you compute the log of the joint distribution p(x, y) = p(x) p(y \mid x), you will find that \log p(x, y) is a quadratic expression in (x, y), which implies that (x, y) has a joint Gaussian distribution. Some algebra on the expression, along with the formula for the inverse of a partitioned matrix, gives the mean and covariance matrix of the joint distribution. We can then pick out the relevant terms for the marginal distribution of y.
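
For reference, the joint precision matrix can be read off directly from the quadratic form in \log p(x, y), and inverting it with the partitioned-matrix inversion formula gives the joint mean and covariance:

\begin{aligned} \mathbb{E}\begin{pmatrix} x \\ y \end{pmatrix} &= \begin{pmatrix} \mu \\ A\mu + b \end{pmatrix}, \\  \operatorname{prec}\begin{pmatrix} x \\ y \end{pmatrix} &= \begin{pmatrix} \Lambda + A^\top L A & -A^\top L \\ -LA & L \end{pmatrix}, \\  \operatorname{Cov}\begin{pmatrix} x \\ y \end{pmatrix} &= \begin{pmatrix} \Lambda^{-1} & \Lambda^{-1} A^\top \\ A\Lambda^{-1} & L^{-1} + A \Lambda^{-1} A^\top \end{pmatrix}. \end{aligned}

The second block of the mean and the bottom-right block of the covariance are exactly the marginal p(y) stated in the theorem. Note also that \Sigma = (\Lambda + A^\top L A)^{-1} is the inverse of the top-left block of the precision matrix, in line with the general fact that the conditional covariance of x given y is the inverse of the corresponding block of the joint precision.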

The result for the conditional distribution x\mid y follows directly from the result in this previous post.
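For reference, the standard conditioning result for a jointly Gaussian vector is

\begin{aligned} \begin{pmatrix} x \\ y \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix} \right) \quad \Longrightarrow \quad x \mid y \sim \mathcal{N}\left( \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y), \; \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right). \end{aligned}

Applying this to the joint mean and covariance above, and simplifying with standard matrix identities (Woodbury and the push-through identity), recovers the posterior mean and the expression for \Sigma in the theorem.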

See Section 2.3.3 of Bishop (2006) (Reference 1) for the gory details.

References:

  1. Bishop, C. M. (2006). Pattern Recognition and Machine Learning, Section 2.3.3. Springer.
