Conjugate Gaussian Models

These notes are really, really old!

My introduction to statistics, linear algebra, and programming was Kevin Murphy’s Conjugate Bayesian Analysis of the Gaussian Distribution, and the following are the notes I took to digest his document back then. I thought about deleting these, but a lot of students ask me to keep them up.

There are probably some remaining mistakes, so if something is unclear, get in touch with me!


Preliminaries

Concepts

Joint Probability (assuming Statistical Independence)

After watching the same movie, Matt is 10% likely to rate the movie an 8 while Sarah is 20% likely. What is the probability that both Matt and Sarah rate it an 8? If you answered 2%, then you assumed statistical independence:

\begin{align} p(\mu = 8, \sigma = 8) &= \prior{\mu = 8} \times \prior{\sigma = 8} \\ &= .1 \times .2 \\ &= .02. \end{align}

$p(\mu = 8, \sigma = 8)$ is called the joint probability. A little more formally, two random variables $\mu$ and $\sigma$ are statistically independent if the following equality is always true, for all values they can take on:

$$\circleEquation{ p({\color{group1}\mu}, {\color{group2}\sigma}) = \prior{\color{group1}\mu} \times \prior{\color{group2}\sigma} }.$$

If there is statistical dependence between $\mu$ and $\sigma$, then $p(\mu = 8, \sigma = 8) \neq .02$, which requires an alternative way to compute the joint probability.
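As a quick sanity check in code, using the hypothetical percentages from the Matt/Sarah example:

```python
# Joint probability under statistical independence.
p_matt_8 = 0.10   # p(Matt rates the movie an 8)
p_sarah_8 = 0.20  # p(Sarah rates the movie an 8)

# If the two ratings are independent, the joint is just the product.
p_joint = p_matt_8 * p_sarah_8  # 2%
```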

Conditional Probability

Suppose Matt and Sarah are 10% and 20% likely to rate a movie an 8, respectively, but Matt is a copycat: If Sarah rates it an 8, Matt is 40% likely to rate the movie an 8.

Conditional distributions such as $\circleEquation{ \likelihood{\color{group1}\mu}{\color{group2}\sigma} }$ specify different distributions depending on the conditions:

• $\likelihood{\mu = 8}{\sigma = 8} = .4$ is the (likelihood) probability that Matt rates the movie an 8 if Sarah does.
• $\likelihood{\mu = 8}{\sigma \neq 8} = .025$ is the (likelihood) probability that Matt rates the movie an 8 if Sarah does not. (This value keeps the numbers consistent: $.4 \times .2 + .025 \times .8 = .1$, Matt's overall probability.)

$\prior{\mu = 8} = .1$ is the overall probability that Matt rates the movie an 8.

Joint Probability (regardless of Statistical Independence)

If we know the likelihood $\likelihood{\mu = 8}{\sigma = 8}$ of Matt rating the movie an 8 when Sarah rates it an 8, and the probability $\prior{\sigma = 8}$ of Sarah rating it an 8 in the first place, we can compute the joint probability correctly, whether or not Matt’s response depends on Sarah’s:

$$\circleEquation{ p({\color{group1}\mu}, {\color{group2}\sigma}) = \likelihood{\color{group1}\mu}{\color{group2}\sigma} \ \prior{\color{group2}\sigma} }.$$

Example: If $\likelihood{\mu = 8}{\sigma = 8} = \frac{2}{5}$ and $\prior{\sigma = 8} = \frac{1}{5}$, then $p(\mu = 8, \ \sigma = 8) = \frac{2}{25}$.

It might help to think of probabilities as the fraction of total space they occupy: Below, the total amount of area shared between the red and gray ovals is equal to the area of the red times the percent of the red shared with the gray.

1. Let’s say the rectangular figure below has an area of 1, representing all of the possible outcomes.
2. If we know the red region has $\frac{1}{5}$ of the area of the rectangle, the area of the red region is $1 \times \frac{1}{5} = \frac{1}{5}$.
3. Further, if we know $\frac{2}{5}$ of the red space is shared with the gray, the total amount of space occupied by both is the total amount of red times $\frac{2}{5}$. Plugging in, we have $\frac{1}{5} \times \frac{2}{5} = \frac{2}{25} = .08$.
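The chain-rule computation from the example, in code:

```python
# Joint probability via the chain rule: p(mu, sigma) = p(mu | sigma) * p(sigma).
p_sarah_8 = 1 / 5               # p(sigma = 8): area of the red region
p_matt_8_given_sarah_8 = 2 / 5  # p(mu = 8 | sigma = 8): fraction of red shared with gray

p_joint = p_matt_8_given_sarah_8 * p_sarah_8  # the shared area, 2/25
```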

Bayes’ Rule

Let’s say we meet one of Matt’s friends who tells us that Matt thought the movie was pretty good. To get a clearer idea of whether the movie is worth watching, we ask this friend how Matt might rate it on a 0-10 scale.

Bayes’ Rule tells us how to use this friend’s guess to update our belief about how Matt might rate the movie. Essentially, it exploits the fact that joint probabilities can be computed in two different ways.

$p({\color{group1}x}, {\color{group2}\mu}) = \likelihood{\color{group1}x}{\color{group2}\mu} \times \prior{\color{group2}\mu}$:

• $\prior{\mu = 8}$ is our confidence that Matt would rate the movie an 8 when the mutual friend says Matt thought the movie was "pretty good".
• $\likelihood{x = 8}{\mu = 8}$ is the probability that this friend guesses 8, given Matt would have rated it an 8.

$p({\color{group1}x}, {\color{group2}\mu}) = \likelihood{\color{group2}\mu}{\color{group1}x} \times \prior{\color{group1}x}$:

• $\prior{x = 8}$ is the overall probability this friend would guess 8, regardless of how Matt rates the movie.
• $\likelihood{\mu = 8}{x = 8}$ is the likelihood that Matt would rate the movie an 8 if the friend guesses 8.
• This is the thing we need to calculate. It tells us how confident we can be that Matt would rate the movie an 8 after his friend told us he thinks Matt would rate it an 8.
• To make things easier to track later on, let’s give these unique subscripts: in this factorization,
• the "prior" on $x$ is really the marginal likelihood, $p_{\color{red} 0}(x) = p_{\color{red} m}(x)$, and
• the "likelihood" of $\mu$ is really the posterior, $p_{\color{red} \ell} (\mu \given x) = p_{\color{red} n} (\mu \given x)$.

Bayes rule is derived by setting the two ways of computing the joint probability equal to each other:

$$\posterior{\color{group2} \mu }{\color{group1} x } \ \marginalLikelihood{\color{group1} x} = p(x, \mu) = \likelihood{\color{group1} x }{\color{group2} \mu } \ \prior{\color{group2} \mu} \\ \posterior{\color{group2} \mu }{\color{group1} x } \ \marginalLikelihood{\color{group1}x} = \likelihood{\color{group1} x }{\color{group2} \mu } \ \prior{\color{group2} \mu} \\ \circleEquation{ \posterior{\color{group2} \mu }{\color{group1} x } = \frac{ \likelihood{\color{group1} x}{\color{group2} \mu} \ \prior{\color{group2} \mu} }{\marginalLikelihood{\color{group1} x} } }.$$
• $\likelihood{x = 8}{\mu = 8}$ is called the likelihood probability: probability that $x = 8$ given $\mu = 8$.
• $\prior{\mu = 8}$ is called the prior probability: overall probability that $\mu = 8$.
• $\posterior{\mu = 8}{x = 8}$ is called the posterior probability: probability that $\mu = 8$ given $x = 8$.
• $\marginalLikelihood{x = 8}$ is called the marginal probability (aka the evidence): overall probability that $x = 8$.
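A small numerical sketch of a Bayes-rule update over Matt's possible ratings; the prior and likelihood numbers below are made up for illustration:

```python
# Discrete Bayes-rule update over Matt's possible ratings 0-10.
ratings = list(range(11))

# Prior p(mu): our belief about Matt's rating, uniform for simplicity.
prior = {r: 1 / 11 for r in ratings}

# Likelihood p(x = 8 | mu): how likely the friend guesses 8 for each true rating.
# Assume the friend's guess is usually close to Matt's actual rating.
likelihood_x8 = {r: max(0.0, 1 - abs(r - 8) / 4) for r in ratings}

# Marginal likelihood p(x = 8): weighted sum of the likelihoods (the evidence).
evidence = sum(likelihood_x8[r] * prior[r] for r in ratings)

# Posterior p(mu | x = 8) by Bayes' rule.
posterior = {r: likelihood_x8[r] * prior[r] / evidence for r in ratings}
```

With a uniform prior, the posterior simply renormalizes the likelihood, so it peaks at $\mu = 8$.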
Conditional Independence

After Matt tells three of our mutual friends he liked the movie, we ask these friends how Matt might rate the movie out of 10. We will only watch the movie if we feel confident that Matt would rate it >8.

This means the guesses $x_1, x_2, x_3$ depend on what Matt might have rated the movie: $\likelihood{x_i}{\mu}$.

Assuming Matt told all of his friends the same thing, the guesses our mutual friends make are independent of one another, allowing us to multiply conditional probabilities:

• If Matt would rate the movie an 8, each of his friends is 50% likely to report a 7.
• Then, the probability of all of them reporting a 7, given that Matt would rate the movie an 8 is $\circleEquation{ \likelihood{\color{group1} x_1, x_2, x_3 }{\color{group2} \mu} = \likelihood{\color{group1} x_1}{\color{group2} \mu} \times \likelihood{\color{group1} x_2}{\color{group2} \mu} \times \likelihood{\color{group1} x_3}{\color{group2} \mu}} = .5 \times .5 \times .5 = .125$ .

The idea is the same as multiplying overall probabilities under statistical independence: If the probabilities are statistically independent given some information, we can multiply the conditional probabilities to obtain the joint conditional probability $\likelihood{x_1, x_2, x_3}{\mu}$.
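In code, with the numbers from the bullets above:

```python
# Conditional independence: given mu, the friends' guesses are independent,
# so the joint conditional probability is a product of conditionals.
p_guess_7_given_mu_8 = 0.5  # each friend reports a 7 with probability .5 if mu = 8
n_friends = 3

p_all_guess_7 = p_guess_7_given_mu_8 ** n_friends  # .5 * .5 * .5
```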

Gaussian Distribution

The main thing to take away here is what the distribution looks like and how its shape is controlled:

1. The mean has the highest probability density,
2. "flatness" or "width" is controlled by the (co)variance parameter, and
3. the normalizing constant outside of the exponentiation ensures that the area under the Gaussian is always 1.
The parameters and normalizing constants in both cases:

• Mean: $\mu$ (univariate, 1-dimensional) vs. $\bm{\mu}$ (multivariate, >1-dimensional)
• (Co)variance: $\sigma^2$ (univariate) vs. $\bm{\Sigma}$ (multivariate)
• Normalizing constant: $\frac{1}{\sqrt{2 \pi \sigma^2}}$ (univariate) vs. $\frac{1}{(2 \pi)^{D / 2}\begin{vmatrix}\bm{\Sigma}\end{vmatrix}^{1/2}}$ (multivariate)

Univariate Equation

$$\normalAbbreviation{x}{\mu}{\sigma^2} = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{ \left\{ \frac{- 1}{2 \sigma^2} (x - \mu)^2 \right\}}$$

Multivariate Equation

$$\normalAbbreviation{\bm{x}}{\bm{\mu}}{\bm{\Sigma}} = \frac{1}{(2 \pi)^{D / 2} \begin{vmatrix} \bm{\Sigma} \end{vmatrix}^{1/2}} e^{ \left\{ \frac{-1}{2} (\bm{x} - \bm{\mu})^{\top} \ \bm{\Sigma}^{-1} \ (\bm{x} - \bm{\mu}) \right\}},$$

where $\bm{\Sigma}$ is a symmetric and positive definite matrix. You definitely need the symmetry property to make the multivariate derivations go anywhere. You may get away with not knowing what "positive definite" means, depending on what you are trying to get out of the notes.
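A minimal sketch of both densities in plain Python (no libraries), with the $D = 2$ determinant and inverse done by hand; the numbers are arbitrary. With a diagonal covariance, the bivariate density should factor into a product of two univariate ones:

```python
import math

def univariate_gaussian(x, mu, var):
    # N(x | mu, sigma^2), straight from the univariate equation above.
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bivariate_gaussian(x, mu, Sigma):
    # N(x | mu, Sigma) for D = 2, with the 2x2 inverse and determinant by hand.
    (a, b), (c, d) = Sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    norm = 1 / ((2 * math.pi) ** (2 / 2) * math.sqrt(det))
    return norm * math.exp(-quad / 2)

# Diagonal covariance: the joint factorizes into two univariate densities.
p_joint = bivariate_gaussian([0.5, -1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]])
p_factored = univariate_gaussian(0.5, 0.0, 1.0) * univariate_gaussian(-1.0, 0.0, 4.0)
```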

Notation

Regular vs boldface font
• $\mu$ is a number such as 0.5, but $\bm{\mu}$ is a vector of numbers such as

$$\begin{bmatrix} -.25 & .1 & .32 & 5 \end{bmatrix}.$$
• $\sigma^2$ is a number such as 2, but $\bm{\Sigma}$ is a matrix of numbers such as

$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 6 \\ 3 & 6 & 9 \end{bmatrix}.$$
Probability vs. probability distribution
• Moving forward, if I follow a word for a distribution with the word "probability", I am referring to an actual number you obtain by plugging into the function.
• Prior probability
• Posterior probability
• Likelihood probability
• Marginal likelihood probability
• Posterior predictive probability
• If I do not follow distribution names with the word "probability", I am referring to the whole distributions or functions:
• "Prior" means "prior distribution".
• "Posterior" means "posterior distribution".
• "Likelihood" means "likelihood function".
• "Marginal likelihood" means "marginal likelihood function/distribution".
• "Posterior predictive" means "posterior predictive distribution".
$\exp{\{x\}} := e^x$
• I use $\exp{\{x\}}$ instead of $e^x$ because the larger font makes it easier to see what is going on inside of the exponent.
$\mathfrak{D}(\Theta)$ represents the space of possible values $\Theta$ can take on.
• In terms of die rolls, $\mathfrak{D}(roll_i) = \{1, 2, 3, 4, 5, 6\}$, the space of possible outcomes for a single die roll.
• Here, in the univariate case, $\Theta = \{\mu, \sigma^2\}$, where $\mu \in (-\infty, \infty)$ and $\sigma^2 \in [0, +\infty)$.
• Hence, $\mathfrak{D}(\Theta) = \mathbb{R} \times \mathbb{R}_{\geq 0}$ , where $\mathbb{R}_{\geq 0} = \{x \in \mathbb{R} \vert x \geq 0\}$ .
• In the multivariate case, $\Theta = \{\bm{\mu}, \bm{\Sigma}\}$, $\bm{\mu} \in \mathbb{R}^m$, and $\bm{\Sigma} \in \mathbb{R}^{m \times m}$ , where $m$ is the dimensionality of the data, so $\mathfrak{D}(\Theta) = \mathbb{R}^m \times \mathbb{R}^{m \times m}.$

My notation is slightly different from what you will typically find in papers — here is a table for reference.

• Prior: $\prior{\Theta}$, typically written $p(\Theta)$
• Posterior: $\posterior{\Theta}{x}$, typically written $p(\Theta \given x)$
• Likelihood: $\likelihood{x}{\Theta}$, typically written $p(x \given \Theta)$
• Marginal likelihood: $\marginalLikelihood{x}$, typically written $p(x)$
• Posterior predictive: $\posteriorPredictive{x_*}{x}$, typically written $p(x_* \given x)$

Objectives

To keep things in perspective, it helps to have end goals when working through math and code details.

Here, we are going to try to estimate what Matt thinks of a new movie he watched based on what he tells his friends. Since we don’t know Matt and none of our mutual friends have seen the movie yet, we ask these mutual friends what Matt thought of the movie. They say Matt thinks it is "pretty good", but that is not convincing, so we ask them how Matt might have rated the movie on a 0-10 scale. Using these data, we try to update our belief about how Matt would rate the movie.

After this, we want to infer the underlying distribution of the data we collect. We assume it is roughly Gaussian with a mean, capturing information about how much Matt truly enjoys the movie, and a variance parameter, capturing information about the variability or reliability of the guesses.

Approach

Step 1: Assume the true likelihood distribution is Gaussian, with parameters $\Theta$.
• If we are working with one dimensional data, the parameters are a mean and a variance: $\Theta = \{ {\color{group2} \mu_{\ell}, \sigma_{\ell}^2} \}$ .
• If we are working with >1-dimensional data, the parameters are a mean vector and a covariance matrix: $\Theta = \{ {\color{group2} \bm{\mu}_{\ell}, \bm{\Sigma}_{\ell}} \}$ .
• Of course, assuming Gaussianity is a strong assumption, but we are just doing exercises here.
Step 2: Assume observed data $x_1, x_2, \dots, x_n$ are conditionally independent given the true Gaussian $\Theta$:
\begin{align} \likelihood{\color{group1} x_1, x_2, \dots, x_n }{\color{group2} \Theta } &= \likelihood{\color{group1} x_1}{\color{group2} \Theta} \ \times \ \likelihood{\color{group1} x_2}{\color{group2} \Theta} \ \times \ \dots \ \times \ \likelihood{\color{group1} x_n}{\color{group2} \Theta} \\ &= \prod\limits_i^n \likelihood{ \color{group1} x_i }{\color{group2} \Theta } \\ &= \prod\limits_i^n {\color{red} \mathcal{N}} ({\color{group1} x_i} \given {\color{group2} \Theta}). \end{align}
Step 3: Put prior(s) on parameters you do not know (i.e. specify your certainty about their potential values).
• For some of the models, either the mean or the covariance is assumed to be known (and therefore constant), but for others, both are assumed to be unknown.
$$\likelihood{x_1, x_2, \dots, x_n}{\color{group2} \Theta} \times {\color{group2} \prior{\Theta}}$$
Step 4: Obtain the posterior using Bayes’ rule (i.e. update your certainty about the potential values of $\Theta$).
• Just divide both sides by the marginal likelihood of the data.
$$\posterior{ \color{group2} \Theta }{\color{group1} x_1, x_2, \dots, x_n } = \frac{ \likelihood{ \color{group1} x_1, x_2, \dots, x_n }{\color{group2} \Theta } \ \prior{\color{group2} \Theta } }{\marginalLikelihood{\color{group1} x_1, x_2, \dots, x_n} }$$
• The marginal likelihood is obtained through marginalization of the model parameters $\Theta$, which is just a fancy version of weighted summing of the likelihood probabilities. It is analogous to computing a final grade for a course.
$$\marginalLikelihood{\color{group1} x_1, x_2, \dots, x_n } = \int_{\mathfrak{D}({\color{group2} \Theta})} \likelihood{\color{group1} x_1, x_2, \dots, x_n }{\color{group2} \Theta } \ \prior{\color{group2} \Theta} d{\color{group2} \Theta}$$
• For our purposes, solving this integral only requires a few algebra tricks.
Step 5: Using the likelihood and posterior, derive the posterior predictive—the distribution of interest.
• Multiply the posterior and likelihood, and then integrate (i.e. marginalize) unknown parameters away.
• Again, this is just a fancy version of weighted summing of the likelihood probabilities.
• Recall that it is analogous to computing a final grade for a course.
• The result is a formula for predicting new data points $x_*$:
$$\posteriorPredictive{ \color{group4} x_* }{\color{group1} x_1, x_2, \dots, x_n } = \int_{\mathfrak{D}({\color{group2} \Theta})} \likelihood{\color{group4} x_*}{\color{group2} \Theta} \ \posterior{\color{group2} \Theta }{\color{group1} x_1, x_2, \dots, x_n } d{\color{group2} \Theta}.$$
• If we have enough data, the mean of this distribution should be close to or the same as the true mean (i.e. how Matt would have actually rated the movie on a 0-10 scale).
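The five steps can be sketched numerically with a grid approximation over $\mu$, a crude stand-in for the integrals. The data, known variance, and prior below are all made up:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Steps 1-2: hypothetical friend guesses, assumed i.i.d. Gaussian given mu.
data = [7.0, 8.0, 8.5]
sigma_sq = 1.0  # assume the likelihood variance is known

# Step 3: Gaussian prior on the unknown likelihood mean.
mu_0, var_0 = 5.0, 4.0

# Step 4: posterior over mu on a grid (Riemann sums instead of integrals).
grid = [i * 0.01 for i in range(0, 1001)]  # mu in [0, 10]
unnorm = []
for mu in grid:
    lik = 1.0
    for x in data:
        lik *= normal_pdf(x, mu, sigma_sq)   # conditional independence
    unnorm.append(lik * normal_pdf(mu, mu_0, var_0))
evidence = sum(unnorm) * 0.01                # marginal likelihood
posterior = [u / evidence for u in unnorm]

# Step 5: posterior predictive density at a new point x_*.
x_star = 8.0
pred = sum(normal_pdf(x_star, mu, sigma_sq) * p
           for mu, p in zip(grid, posterior)) * 0.01
```

The posterior concentrates between the empirical mean of the guesses and the prior mean, pulled toward the data as $n$ grows.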

Conjugate Gaussian Models

Model naming is based on the priors on the likelihood Gaussian. E.g. the ${\color{group5} Normal}-{\color{group6} Known}$ model puts a Gaussian prior on the ${\color{group5} mean}$ of the likelihood and assumes the ${\color{group6} (co)variance}$ of the likelihood is already known.

1D: Normal-Known

Likelihood of the Training Data
\begin{align} p_{\color{group3} \ell} (x_1, x_2, \dots, x_n \given {\color{group6} \Theta}) &= \prod\limits_i^n \normalAbbreviation{x_i }{\color{group6} \mu_{\color{group3} \ell} }{\color{group6} \sigma_{\color{group3} \ell}^2 } \tag{1} \label{normalKnownLikelihood1}\\ &= \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} {\color{group1} \sum\limits_i^n (x_i - \mu_{\ell})^2 } \right\} } \tag{2} \label{normalKnownLikelihood2} \\ &= \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} {\color{group1} \sum\limits_i^n \left[ (x_i - \bar{x}) - (\mu_{\ell} - \bar{x}) \right]^2 } \right\}} \tag{3} \label{normalKnownLikelihood3} \\ &= \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{\left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ {\color{group2} ns^2} {\color{group5} + n(\bar{x} - \mu_{\ell})^2 } \right] \right\} } \tag{4} \label{normalKnownLikelihood4} \\ &= \circleEquation{ \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n {\color{group2} \exp{ \left\{ \frac{-ns^2}{2\sigma_{\ell}^2} \right\} } } {\color{group5} \left( \sqrt{\frac{2 \pi \sigma_{\ell}^2}{n}} \right) \normalAbbreviation{\bar{x} }{\mu_{\ell} }{\frac{\sigma_{\ell}^2}{n} } } } \tag{5} \label{normalKnownLikelihood5} \\ &\propto \normalAbbreviation{\bar{x}}{\mu_{\ell}}{\frac{\sigma_{\ell}^2}{n}} \tag{6} \label{normalKnownLikelihood6} \\ \end{align}
Equation 1: Assume the data is conditionally independent under a Gaussian.

From conditional independence, the (joint) likelihood of data is equal to the product of the individual likelihoods:

$$\likelihood{ {\color{group1} x_1}, {\color{group1} x_2}, \dots, {\color{group1} x_n} }{\color{group2} \Theta } = \likelihood{\color{group1} x_1}{\color{group2} \Theta} \times \likelihood{\color{group1} x_2}{\color{group2} \Theta} \times \dots \times \likelihood{\color{group1} x_n}{\color{group2} \Theta}.$$

Since we assume the individual likelihoods are Gaussian, ${\color{group2} p_{\ell}}(x_i \given \Theta) = {\color{group2} \mathcal{N}} (x_i \given \mu_{\color{group2} \ell}, \sigma_{\color{group2} \ell}^2)$:

$${\color{group2} p_{\ell}}(x_1 \given \Theta) \times {\color{group2} p_{\ell}}(x_2 \given \Theta) \times \dots \times {\color{group2} p_{\ell}}(x_n \given \Theta) \\ = {\color{group2} \mathcal{N}} (x_1 \given \mu_{\color{group2} \ell}, \sigma_{\color{group2} \ell}^2) \times {\color{group2} \mathcal{N}} (x_2 \given \mu_{\color{group2} \ell}, \sigma_{\color{group2} \ell}^2) \times \dots \times {\color{group2} \mathcal{N}} (x_n \given \mu_{\color{group2} \ell}, \sigma_{\color{group2} \ell}^2).$$

Rewrite the product using ${\color{group5} \prod}$ notation:

$$\normalAbbreviation{\color{group5} x_1 }{\mu_{\ell} }{\sigma_{\ell}^2 } {\color{group5} \times} \normalAbbreviation{\color{group5} x_2 }{\mu_{\ell} }{\sigma_{\ell}^2 } {\color{group5} \times} \dots {\color{group5} \times} \normalAbbreviation{\color{group5} x_n }{\mu_{\ell} }{\sigma_{\ell}^2 } \\ = {\color{group5} \prod\limits_i^n} \normalAbbreviation{\color{group5} x_i}{\mu_{\ell}}{\sigma_{\ell}^2}.$$
Equation 2: Simplify constants and exponents in the product.

Write out the product by plugging into the Gaussian expression.

$$\prod\limits_i^n \normalAbbreviation{\color{group1} x_i }{\color{group2} \mu_{\ell} }{\color{group2} \sigma_{\ell}^2 } = \univariateGaussianEQ{ \color{group1} x_1 }{\color{group2} \mu_{\ell} }{\color{group2} \sigma_{\ell}^2 } \times \univariateGaussianEQ{ \color{group1} x_2 }{\color{group2} \mu_{\ell} }{\color{group2} \sigma_{\ell}^2 } \times \dots \times \univariateGaussianEQ{ \color{group1} x_n }{\color{group2} \mu_{\ell} }{\color{group2} \sigma_{\ell}^2 }$$

Bring the $n$ normalizing constants to the front.

$${\color{group5} \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} } \exp{ \left\{ \frac{-1 }{2\sigma_{\ell}^2 } (x_1 - \mu_{\ell})^2 \right\} } \times {\color{group5} \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} } \exp{ \left\{ \frac{-1 }{2\sigma_{\ell}^2 } (x_2 - \mu_{\ell})^2 \right\} } \times \dots \times {\color{group5} \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} } \exp{ \left\{ \frac{-1 }{2\sigma_{\ell}^2 } (x_n - \mu_{\ell})^2 \right\} } \\ = {\color{group5} \left( \frac{1 }{\sqrt{2 \pi \sigma_{\ell}^2} } \right)^n } \exp{ \left\{ \frac{-1 }{2\sigma_{\ell}^2 } (x_1 - \mu_{\ell})^2 \right\} } \times \dots \times \exp{ \left\{ \frac{-1 }{2\sigma_{\ell}^2 } (x_n - \mu_{\ell})^2 \right\} }$$

Sum exponents. Recall that $a^b {\color{group5} \times} a^c = a^{b {\color{group5} +} c}$ .

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1 }{2\sigma_{\ell}^2 } (x_1 - \mu_{\ell})^2 \right\} } {\color{group5} \times} \exp{ \left\{ \frac{-1 }{2\sigma_{\ell}^2 } (x_2 - \mu_{\ell})^2 \right\} } {\color{group5} \times} \dots {\color{group5} \times} \exp{ \left\{ \frac{-1 }{2\sigma_{\ell}^2 } (x_n - \mu_{\ell})^2 \right\} } \\ = \left( \frac{1 }{\sqrt{2 \pi \sigma_{\ell}^2} } \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} (x_1 - \mu_{\ell})^2 {\color{group5} +} \frac{-1}{2 \sigma_{\ell}^2} (x_2 - \mu_{\ell})^2 {\color{group5} +} \dots {\color{group5} +} \frac{-1}{2 \sigma_{\ell}^2} (x_n - \mu_{\ell})^2 \right\} }$$

Factor out ${\color{group5} \frac{-1}{2\sigma_{\ell}^2}}$ within the exponent.

$$\left( \frac{1 }{\sqrt{2 \pi \sigma_{\ell}^2} } \right)^n \exp{\left\{ {\color{group5} \frac{-1}{2 \sigma_{\ell}^2}} (x_1 - \mu_{\ell})^2 + {\color{group5} \frac{-1}{2 \sigma_{\ell}^2}} (x_2 - \mu_{\ell})^2 + \dots + {\color{group5} \frac{-1}{2 \sigma_{\ell}^2}} (x_n - \mu_{\ell})^2 \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ {\color{group5} \frac{-1}{2 \sigma_{\ell}^2}} \left[ (x_1 - \mu_{\ell})^2 + (x_2 - \mu_{\ell})^2 + \dots + (x_n - \mu_{\ell})^2 \right] \right\} }$$

Represent the sum using ${\color{group5} \sum}$ notation.

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ (x_{\color{group5} 1} - \mu_{\ell})^2 {\color{group5} +} (x_{\color{group5} 2} - \mu_{\ell})^2 {\color{group5} +} \dots {\color{group5} +} (x_{\color{group5} n} - \mu_{\ell})^2 \right] \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} {\color{group5} \sum\limits_{i}^n} (x_{\color{group5} i} - \mu_{\ell})^2 \right\} }$$
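The constant-pooling and exponent-summing steps can be verified numerically; the data and parameters below are hypothetical:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

xs = [6.5, 7.0, 8.0, 9.0]  # hypothetical guesses
mu, var = 7.5, 2.0

# Left-hand side: product of the individual Gaussian likelihoods.
lhs = 1.0
for x in xs:
    lhs *= normal_pdf(x, mu, var)

# Right-hand side: pooled constant times one exponential of the summed quadratics.
n = len(xs)
rhs = (1 / math.sqrt(2 * math.pi * var)) ** n \
    * math.exp(-sum((x - mu) ** 2 for x in xs) / (2 * var))
```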
Equation 3: Introduce the empirical mean $\bar{x}$.

Introduce 0:

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n {\color{group5}(x_i - \mu_{\ell})}^2 \right\} } = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n {\color{group5} (x_i + 0 - \mu_{\ell})}^2 \right\} }.$$

Since $0 = - \bar{x} + \bar{x}$, where $\bar{x} = \frac{1}{n}\sum\limits_i^n x_i$, we can introduce the empirical mean the following way:

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n (x_i + {\color{group5} 0} - \mu_{\ell})^2 \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n \left( x_i - {\color{group5}\bar{x} + \bar{x}} - \mu_{\ell} \right)^2 \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n \left[ (x_i - \bar{x}) + (\bar{x} - \mu_{\ell}) \right]^2 \right\} }.$$

Since $\bar{x} - \mu_{\ell} = - \mu_{\ell} + \bar{x} = - (\mu_{\ell} - \bar{x})$ ,

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n \left[ (x_i - \bar{x}) {\color{group5} + (\bar{x} - \mu_{\ell})} \right]^2 \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n \left[ (x_i - \bar{x}) {\color{group5} - (\mu_{\ell} - \bar{x})} \right]^2 \right\} }.$$
Equation 4: Introduce the empirical variance $s^2$.

Expand the quadratic expression inside of the summation. Recall that $({\color{group1}a} - {\color{group2}b})^2 = {\color{group1}a}^2 - 2{\color{group1}a}{\color{group2}b} + {\color{group2}b}^2$. Here, ${\color{group1}a = (x_i - \bar{x})}$, ${\color{group2}b = (\mu_{\ell} - \bar{x})}$:

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n \left[ {\color{group1} (x_i - \bar{x}) } - {\color{group2} (\mu_{\ell} - \bar{x}) } \right]^2 \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n \left[ {\color{group1} (x_i - \bar{x}) }^2 - 2 { \color{group1} (x_i - \bar{x}) } { \color{group2} (\mu_{\ell} - \bar{x}) } + {\color{group2} (\mu_{\ell} - \bar{x})}^2 \right] \right\} }$$

Since ${\color{group5}\sum\limits_i^n} a_i + b_i + c_i = {\color{group5} \sum\limits_i^n} a_i + {\color{group5}\sum\limits_i^n} b_i + {\color{group5}\sum\limits_i^n} c_i$ , we obtain

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} {\color{group5} \sum\limits_i^n} \left[ (x_i - \bar{x})^2 - 2 (x_i - \bar{x}) (\mu_{\ell} - \bar{x}) + (\mu_{\ell} - \bar{x})^2 \right] \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ {\color{group5} \sum\limits_i^n} (x_i - \bar{x})^2 - {\color{group5} \sum\limits_i^n} 2(x_i - \bar{x}) (\mu_{\ell} - \bar{x}) + {\color{group5} \sum\limits_i^n} (\mu_{\ell} - \bar{x})^2 \right] \right\} }.$$

If we define the empirical variance as $s^2 = \frac{1}{n}\sum\limits_i^n(x_i - \bar{x})^2$, multiplying both sides by $n$ yields ${\color{group1} ns^2 = \sum\limits_i^n(x_i - \bar{x})^2}$ . We can substitute this for the first term in the exponent:

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ {\color{group1} \sum\limits_i^n (x_i - \bar{x})^2} - \sum\limits_i^n 2(x_i - \bar{x})(\mu_{\ell} - \bar{x}) + \sum\limits_i^n (\mu_{\ell} - \bar{x})^2 \right] \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ {\color{group1} ns^2} - \sum\limits_i^n 2(x_i - \bar{x})(\mu_{\ell} - \bar{x}) + \sum\limits_i^n (\mu_{\ell} - \bar{x})^2 \right] \right\} }.$$

Factor out ${\color{group2}2(\mu_{\ell} - \bar{x})}$ from the second term in the exponent.

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ ns^2 - \sum\limits_i^n {\color{group2} 2} (x_i - \bar{x}) {\color{group2} (\mu_{\ell} - \bar{x})} + \sum\limits_i^n (\mu_{\ell} - \bar{x})^2 \right] \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ ns^2 - {\color{group2} 2 (\mu_{\ell} - \bar{x})} \sum\limits_i^n (x_i - \bar{x}) + \sum\limits_i^n (\mu_{\ell} - \bar{x})^2 \right] \right\} }.$$

Since ${\color{group5} \sum\limits_i^n (x_i - \bar{x}) = x_1 + x_2 + \dots + x_n - n\bar{x} = n\bar{x} - n\bar{x} = 0 },$ the second term in the exponent becomes zero.

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ ns^2 - {\color{group2} 2(\mu_{\ell} - \bar{x})} {\color{group5} \sum\limits_i^n (x_i - \bar{x})} + \sum\limits_i^n (\mu_{\ell} - \bar{x})^2 \right] \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ ns^2 - {\color{group2}2(\mu_{\ell} - \bar{x})} {\color{group5} \times 0} + \sum\limits_i^n (\mu_{\ell} - \bar{x})^2 \right] \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ ns^2 + {\color{group5}0} + \sum\limits_i^n (\mu_{\ell} - \bar{x})^2 \right] \right\} }.$$

Finally, since ${\color{group2} \sum\limits_i^n (\mu_{\ell} - \bar{x})^2} = {\color{group2} n(\mu_{\ell} - \bar{x})^2}$ ,

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ ns^2 + {\color{group2} \sum\limits_i^n (\mu_{\ell} - \bar{x})^2 } \right] \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ ns^2 + {\color{group2} n(\mu_{\ell} - \bar{x})^2} \right] \right\} }.$$
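The identity behind Equations 3-4, $\sum_i^n (x_i - \mu_{\ell})^2 = ns^2 + n(\bar{x} - \mu_{\ell})^2$, is easy to check numerically with arbitrary data:

```python
xs = [6.5, 7.0, 8.0, 9.0]  # hypothetical guesses
mu = 7.5                   # any candidate likelihood mean

n = len(xs)
x_bar = sum(xs) / n
s_sq = sum((x - x_bar) ** 2 for x in xs) / n  # empirical variance

lhs = sum((x - mu) ** 2 for x in xs)  # sum of squared deviations from mu
rhs = n * s_sq + n * (x_bar - mu) ** 2
```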
Equation 5: Rewrite the empirical mean as a Gaussian distributed random variable.

Distribute the constant containing the likelihood variance $\frac{-1}{2 \sigma_{\ell}^2}$:

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ {\color{group5} \frac{-1}{2 \sigma_{\ell}^2}} \left[ ns^2 + n(\mu_{\ell} - \bar{x})^2 \right] \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{ns^2}{\color{group5} -2 \sigma_{\ell}^2} + \frac{n}{\color{group5} -2 \sigma_{\ell}^2} (\mu_{\ell} - \bar{x})^2 \right\} }$$

Since $e^{a {\color{group5} +} b} = e^a {\color{group5} \times} e^b$ ,

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{ns^2}{-2 \sigma_{\ell}^2} {\color{group5} +} \frac{n}{-2 \sigma_{\ell}^2} (\mu_{\ell} - \bar{x})^2 \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{ns^2}{-2 \sigma_{\ell}^2} \right\} } {\color{group5} \times} \exp{ \left\{ \frac{n}{-2 \sigma_{\ell}^2} (\mu_{\ell} - \bar{x})^2 \right\} }.$$

Since $\frac{n}{\sigma_{\ell}^2} = \frac{1}{\frac{\sigma_{\ell}^2}{n}}$ , we can rewrite the third factor as a Gaussian with variance $\frac{\sigma_{\ell}^2}{n}$ by introducing $1 = \frac{\sqrt{2 \pi \frac{\sigma_{\ell}^2}{n}} }{\sqrt{2 \pi \frac{\sigma_{\ell}^2}{n}}}$:

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{ns^2}{-2 \sigma_{\ell}^2} \right\} } \times \exp{ \left\{ \frac{n}{-2 \sigma_{\ell}^2} (\mu_{\ell} - \bar{x})^2 \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{ns^2}{-2 \sigma_{\ell}^2} \right\} } \times {\color{group5} 1} \times \exp{ \left\{ \frac{n}{-2 \sigma_{\ell}^2} (\mu_{\ell} - \bar{x})^2 \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{ns^2}{-2 \sigma_{\ell}^2} \right\} } \times {\color{group5} \frac{\sqrt{2 \pi \frac{\sigma_{\ell}^2}{n}} }{\sqrt{2 \pi \frac{\sigma_{\ell}^2}{n}} } } \times \exp{ \left\{ \frac{\color{group4} n }{-2 {\color{group4}\sigma_{\ell}^2} } (\mu_{\ell} - \bar{x})^2 \right\} } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{ns^2}{-2 \sigma_{\ell}^2} \right\} } \times {\color{group5} \left( \sqrt{\frac{2 \pi \sigma_{\ell}^2}{n}} \right) \left( \frac{1 }{\sqrt{\frac{2 \pi \sigma_{\ell}^2}{n}} } \right) } \times \exp{ \left\{ \frac{1 }{-2 {\color{group4} \frac{\sigma_{\ell}^2}{n}} } (\mu_{\ell} - \bar{x})^2 \right\} } \\$$

Abbreviating the Gaussian expression,

$$\left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{ns^2}{-2 \sigma_{\ell}^2} \right\} } \times \left( \sqrt{\frac{2 \pi \sigma_{\ell}^2}{n}} \right) {\color{group5} \left( \frac{1 }{\sqrt{\frac{2 \pi \sigma_{\ell}^2}{n}} } \right) \times \exp{ \left\{ \frac{1}{-2 \frac{\sigma_{\ell}^2}{n}} (\mu_{\ell} - \bar{x})^2 \right\} } } \\ = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-ns^2}{2 \sigma_{\ell}^2} \right\} } \times \sqrt{\frac{2 \pi \sigma_{\ell}^2}{n}} \times {\color{group5} \normalAbbreviation{\bar{x} }{\mu_{\ell} }{\frac{\sigma_{\ell}^2}{n} } }.$$
Equation 6: Note the proportionality of the Gaussian to the joint likelihood.

Recall that there is only one variable in this model: $\mu_{\ell}$. Everything else, including $\sigma_{\ell}^2$, $n$, and $x_1, x_2, \ \dots, \ x_n$, is known, i.e. constant. This means $s^2 = \frac{1}{n}\sum\limits_i^n(x_i - \bar{x})^2$ is also a constant, so the joint likelihood is directly proportional to the fourth factor, the Gaussian in $\bar{x}$.

\begin{align} {\color{group5} \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{\left\{ \frac{-ns^2}{2 \sigma_{\ell}^2} \right\} } \left( \sqrt{\frac{2 \pi \sigma_{\ell}^2}{n}} \right) } \normalAbbreviation{\bar{x}}{\mu_{\ell}}{\frac{\sigma_{\ell}^2}{n}} &{\color{group5} = c} \times \normalAbbreviation{\bar{x} }{\mu_{\ell} }{\frac{\sigma_{\ell}^2}{n} } \\ &{\color{group5} \propto \ } \normalAbbreviation{\bar{x}}{\mu_{\ell}}{\frac{\sigma_{\ell}^2}{n}} \end{align}
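One way to check the proportionality claim numerically: the ratio of the joint likelihood to $\mathcal{N}(\bar{x} \given \mu_{\ell}, \sigma_{\ell}^2 / n)$ should come out the same for every candidate $\mu_{\ell}$. The data and variance below are hypothetical:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

xs = [6.5, 7.0, 8.0, 9.0]  # hypothetical guesses
var = 2.0                  # known likelihood variance
n = len(xs)
x_bar = sum(xs) / n

def joint_likelihood(mu):
    p = 1.0
    for x in xs:
        p *= normal_pdf(x, mu, var)
    return p

# The ratio should be a constant c that does not depend on mu.
ratios = [joint_likelihood(mu) / normal_pdf(x_bar, mu, var / n)
          for mu in [6.0, 7.5, 9.0]]
```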
Prior on Likelihood Parameters
\begin{align} \prior{\Theta} &= \prior{\mu_{\ell}, \ \sigma_{\ell}^2} \tag{1} \label{normalKnownPrior1} \\ &= \prior{\mu_{\ell} \given \sigma_{\ell}^2} \times \prior{\sigma_{\ell}^2} \tag{2} \label{normalKnownPrior2}\\ &= \prior{\mu_{\ell} \given \sigma_{\ell}^2} \tag{3} \label{normalKnownPrior3} \\ &= \circleEquation{ \normalAbbreviation{\mu_{\ell} }{\mu_0 }{\color{group1} \frac{\sigma_{\ell}^2}{\kappa_0} } } \tag{4} \label{normalKnownPrior4} \\ &= \normalAbbreviation{\mu_{\ell} }{\mu_0 }{\color{group1} \sigma_0^2 }. \tag{5} \label{normalKnownPrior5} \\ \end{align}
Equation 1: Plug in $\mu_{\ell}, \ \sigma_{\ell}^2$ for $\Theta$.

Recall that I defined it the following way in Step 1 of the Approach.

$$\prior{\color{group1} \Theta} = \prior{\color{group1} \mu_{\ell}, \ \sigma_{\color{group1} \ell}^2}$$
Equation 2: Rewrite the joint distribution as a product of the conditional and corresponding marginal.
$$\prior{ {\color{group1} \mu_{\ell}}, \ {\color{group2} \sigma_{\ell}^2} } = \prior{ {\color{group1}\mu_{\ell}} \given {\color{group2} \sigma_{\ell}^2} } \times \prior{\color{group2} \sigma_{\ell}^2}$$
Equation 3: $\sigma_{\ell}^2$ is given, meaning we know it with 100% certainty, so $\prior{\sigma_{\ell}^2} = 1$.
\begin{align} \prior{\mu_{\ell} \given \sigma_{\ell}^2} {\color{group1} \times \prior{\sigma_{\ell}^2} } &= \prior{ \mu_{\ell} \given {\color{group1} \sigma_{\ell}^2} } {\color{group1} \times 1} \\ &= \prior{\mu_{\ell} \given \sigma_{\ell}^2}. \end{align}
Equation 4: Magic!

What we want to do now is specify a distribution of our choice on $\mu_{\ell}$. However, notice that $\sigma_{\ell}^2$ appears as a parameter in the prior. This is weird because we typically want flexibility in specifying a prior variance $\sigma_0^2$. For example, we may want to use $\sigma_0^2 > \sigma_{\ell}^2$ because we are very uncertain what the likelihood mean $\mu_{\ell}$ is.

One trick to recover the flexibility in choosing the prior variance $\sigma_0^2$ is to let ${\color{group2} \sigma_0^2 = \frac{\sigma_{\ell}^2}{\kappa_0}}$ :

$$\prior{\mu_{\ell} \given \sigma_{\ell}^2} = \normalAbbreviation{\mu_{\ell} }{\mu_0 }{\color{group2} \frac{\sigma_{\ell}^2}{\kappa_0} }.$$

Now, the prior depends on $\sigma_{\ell}^2$, since we defined it that way. However, it is just as "flexible" as if it did not: if we want a large prior variance, we pick a small $\kappa_0$, and if we want a small prior variance, we pick a large $\kappa_0$. That is, for practical purposes, we can still choose any $\sigma_0^2 \in (0, \infty)$ by plugging the known variance $\sigma_{\ell}^2$ and the chosen prior variance $\sigma_0^2$ into the formula $\sigma_0^2 = \frac{\sigma_{\ell}^2}{\kappa_0}$ and solving for the appropriate $\kappa_0$.

Equation 5: The likelihood mean is assumed to be sampled from a Gaussian prior with parameters $\mu_0$ and $\sigma_0^2$.

To make the derivations in the following sections clearer, we write ${\color{group2} \sigma_0^2}$ in place of $\frac{\sigma_{\ell}^2}{\kappa_0}$, using the relationship $\sigma_0^2 = \frac{\sigma_{\ell}^2}{\kappa_0}$:

$$\normalAbbreviation{\mu_{\ell} }{\mu_0 }{\color{group2} \frac{\sigma_{\ell}^2}{\kappa_0} } = \normalAbbreviation{\mu_{\ell}}{\mu_0}{\color{group2} \sigma_0^2}.$$
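To see how this reparameterization works in practice, here is a tiny sketch (the numbers are arbitrary): pick any prior variance you like and solve for the $\kappa_0$ that realizes it.

```python
sigma_l2 = 4.0     # known likelihood variance, sigma_l^2
sigma_02 = 25.0    # desired prior variance, sigma_0^2 (we are quite uncertain about mu_l)

# Solve sigma_0^2 = sigma_l^2 / kappa_0 for kappa_0.
kappa_0 = sigma_l2 / sigma_02
print(kappa_0)     # 0.16 -- a small kappa_0 yields a large prior variance

# Plugging kappa_0 back in recovers the desired prior variance.
assert abs(sigma_l2 / kappa_0 - sigma_02) < 1e-12
```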
Marginal Likelihood of the Training Data
\begin{align} \marginalLikelihood{x_1, x_2, \dots, x_n} &= \int_{-\infty}^{+\infty} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \times \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} d\mu_{\ell} \tag{1} \label{normalKnownMarginalLikelihood1} \\ &= \int_{-\infty}^{+\infty} C_1 \times \exp{\left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n (x_i - \mu_{\ell})^2 - \frac{1}{2 \sigma_0^2} (\mu_{\ell} - \mu_0)^2 \right\}} d\mu_{\ell} \tag{2} \label{normalKnownMarginalLikelihood2} \\ &= \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\color{group6} \mu_{\ell}^2}{2} {\color{group1} \left( \frac{n}{\sigma_{\ell}^2} + \frac{1}{\sigma_0^2} \right) } + {\color{group6} \mu_{\ell}} \left( \frac{n \bar{x}}{\sigma_{\ell}^2} + \frac{\mu_0}{\sigma_0^2} \right) \right\} } \times C_2 d\mu_{\ell} \tag{3} \label{normalKnownMarginalLikelihood3} \\ &= \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\color{group6} \mu_{\ell}^2}{2 \color{group1} \sigma_n^2} + \frac{\color{group6} \mu_{\ell}}{\color{group1} \sigma_n^2} \left( {\color{group2} \frac{n \bar{x} \sigma_0^2 + \mu_0 \sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } } \right) \right\} } \times C_2 d\mu_{\ell} \tag{4} \label{normalKnownMarginalLikelihood4} \\ &= \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{1}{2 {\color{group1} \sigma_n^2}} \left[ {\color{group6} \mu_{\ell}^2} - 2 {\color{group6} \mu_{\ell}} {\color{group2} \mu_n} + {\color{group2} \mu_n}^2 \right] \right\} } \times C_3 d\mu_{\ell} \tag{5} \label{normalKnownMarginalLikelihood5} \\ &= C_1 \times C_3 \times \sqrt{2 \pi \sigma_n^2} \tag{6} \label{normalKnownMarginalLikelihood6} \\ &= \circleEquation{ \sqrt{ \frac{\sigma_{\ell}^2 }{(n \sigma_0^2 + \sigma_{\ell}^2) (2 \pi \sigma_{\ell}^2)^n } } \times \exp{ \left\{ \frac{ \left( \frac{n \bar{x} \sigma_0}{\sigma_{\ell}} + \frac{\mu_0 \sigma_{\ell}}{\sigma_0} \right)^2 }{2 (n \sigma_0^2 + \sigma_{\ell}^2) } \right\} } \times \exp{ \left\{ - \frac{\sum\limits_i^n x_i^2 }{2 \sigma_{\ell}^2 } - \frac{\mu_0^2 }{2 \sigma_0^2 } \right\} } } \tag{7} \label{normalKnownMarginalLikelihood7} \\ \end{align}
Equation 1: Plug in the joint likelihood of the training data and the prior.

As I stated in Step 4 of the approach, the marginal likelihood is defined as follows:

$$\marginalLikelihood{x_1, x_2, \dots, x_n} = \int_{\color{group2} \mathfrak{D}(\Theta)} \likelihood{x_1, x_2, \dots, x_n }{\color{group2} \Theta } \times \prior{\color{group2} \Theta} d{\color{group2} \Theta}.$$

Plug in the joint likelihood obtained in Equation 1 of the section deriving the Likelihood of the Training Data:

$$\int_{\mathfrak{D}(\Theta)} {\color{group5} \likelihood{x_1, x_2, \dots, x_n }{\Theta } } \times \prior{\Theta} d\Theta \\ = \int_{\mathfrak{D}(\Theta)} {\color{group5} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] } \times \prior{\Theta} d\Theta. \\$$

Plug in the result from Equation 5 of the section deriving the prior.

$$\int_{\mathfrak{D}(\Theta)} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \times {\color{group5} \prior{\Theta}} d\Theta \\ = \int_{\mathfrak{D}(\Theta)} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \times {\color{group5} \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} } d\Theta \\$$

For the Normal-Known model, the likelihood variance $\sigma_{\ell}^2$ is constant (known with probability 1), meaning there is no prior on the likelihood variance. The only unknown variable is the likelihood mean $\mu_{\ell}$; hence, $\mathfrak{D}(\Theta) = \mathfrak{D}(\mu_{\ell}, \sigma_{\ell}^2) = \mathbb{R} \times \{\sigma_{\ell}^2\} = (- \infty, + \infty) \times \{\sigma_{\ell}^2\}:$

$$\int_{\color{group5} \mathfrak{D}(\Theta)} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \times \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} {\color{group5} d\Theta} \\ = \int_{\color{group5} -\infty}^{\color{group5} +\infty} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \times \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} {\color{group5} d\mu_{\ell}}. \\$$
Equation 2: Rewrite the product as a sum of exponents.

Plugging in the result in Equation 2 of the section deriving the Likelihood of the Training Data, we can rewrite the product in the brackets as a sum of exponents:

$$\int_{-\infty}^{+\infty} {\color{group5} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] } \times \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} {\color{group5} \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n (x_i - \mu_{\ell})^2 \right\} } } \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} d\mu_{\ell}.$$

Plugging in the definition of the Gaussian, we have

$$\int_{-\infty}^{+\infty} \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n (x_i - \mu_{\ell})^2 \right\} } {\color{group5} \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n (x_i - \mu_{\ell})^2 \right\} } \times {\color{group5} \univariateGaussianEQ{\mu_{\ell}}{\mu_0}{\sigma_0^2} } d\mu_{\ell}.$$

Let $C_1 = \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \times \frac{1}{\sqrt{2 \pi \sigma_0^2}}:$

$$\int_{-\infty}^{+\infty} {\color{group5} \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n } \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n (x_i - \mu_{\ell})^2 \right\} } \times {\color{group5} \frac{1}{\sqrt{2 \pi \sigma_0^2}} } \exp{ \left\{ \frac{-1}{2 \sigma_0^2} (\mu_{\ell} - \mu_0)^2 \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} {\color{group5} C_1 \times} \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n (x_i - \mu_{\ell})^2 \right\} } \times \exp{ \left\{ \frac{-1}{2 \sigma_0^2} (\mu_{\ell} - \mu_0)^2 \right\} } d\mu_{\ell}.$$

Rewriting the product of the two exponents as a sum of exponents, we have

$$\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n (x_i - \mu_{\ell})^2 \right\} } {\color{group5} \times} \exp{ \left\{ \frac{-1}{2 \sigma_0^2} (\mu_{\ell} - \mu_0)^2 \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n (x_i - \mu_{\ell})^2 {\color{group5} +} \frac{1}{- 2 \sigma_0^2} (\mu_{\ell} - \mu_0)^2 \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{1}{- 2 \sigma_{\ell}^2} \sum\limits_i^n (x_i - \mu_{\ell})^2 {\color{group5} -} \frac{1}{2 \sigma_0^2} (\mu_{\ell} - \mu_0)^2 \right\} } d\mu_{\ell}.$$
Equation 3: Factor out $\mu_{\ell}$, the only variable.

Expand both of the quadratic expressions in the exponent:

$$\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n {\color{group5} (x_i - \mu_{\ell})^2} - \frac{1}{2 \sigma_0^2} {\color{group5} (\mu_{\ell} - \mu_0)^2} \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \sum\limits_i^n {\color{group5} [x_i^2 - 2 x_i \mu_{\ell} + \mu_{\ell}^2]} - \frac{1}{2 \sigma_0^2} {\color{group5} [\mu_{\ell}^2 - 2 \mu_{\ell} \mu_0 + \mu_0^2]} \right\} } d\mu_{\ell}.$$

Distribute the summation:

$$\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} {\color{group5} \sum\limits_i^n} [x_i^2 - 2 x_i \mu_{\ell} + \mu_{\ell}^2] - \frac{1}{2 \sigma_0^2} [\mu_{\ell}^2 - 2 \mu_{\ell} \mu_0 + \mu_0^2] \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ {\color{group5} \sum\limits_i^n} x_i^2 - {\color{group5} \sum\limits_i^n} 2 x_i \mu_{\ell} + {\color{group5} \sum\limits_i^n} \mu_{\ell}^2 \right] - \frac{1}{2 \sigma_0^2} [\mu_{\ell}^2 - 2 \mu_{\ell} \mu_0 + \mu_0^2] \right\} } d\mu_{\ell}.$$

Since ${\color{group5} \sum_i^n x_i = n\bar{x}}$ and ${\color{group5} \sum_i^n \mu_{\ell}^2 = n\mu_{\ell}^2}$,

$$\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} \left[ \sum\limits_i^n x_i^2 - {\color{group5} \sum\limits_i^n 2 x_i} \mu_{\ell} + {\color{group5} \sum\limits_i^n} \mu_{\ell}^2 \right] - \frac{1}{2 \sigma_0^2} [\mu_{\ell}^2 - 2 \mu_{\ell} \mu_0 + \mu_0^2] \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{-1}{\color{group1} 2 \sigma_{\ell}^2} \left[ \sum\limits_i^n x_i^2 - {\color{group5} 2n \bar{x}} \mu_{\ell} + {\color{group5} n} \mu_{\ell}^2 \right] - \frac{1}{\color{group2} 2 \sigma_0^2} [\mu_{\ell}^2 - 2 \mu_{\ell} \mu_0 + \mu_0^2] \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{\sum\limits_i^n x_i^2}{\color{group1} - 2 \sigma_{\ell}^2} + \frac{\cancel{2} n \bar{x} \mu_{\ell} }{\color{group1} \cancel{2} \sigma_{\ell}^2 } - \frac{n \mu_{\ell}^2}{\color{group1} 2 \sigma_{\ell}^2} - \frac{\mu_{\ell}^2}{\color{group2} 2 \sigma_0^2} + \frac{\cancel{2} \mu_{\ell} \mu_0 }{\color{group2} \cancel{2} \sigma_0^2 } - \frac{\mu_0^2}{\color{group2} 2 \sigma_0^2} \right\} } d\mu_{\ell}.$$

We can begin to complete the square by grouping like terms and factoring:

$$\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ \frac{\sum\limits_i^n x_i^2}{-2 \sigma_{\ell}^2} + {\color{group5} \frac{n \bar{x} {\color{group6} \mu_{\ell}} }{\sigma_{\ell}^2 } } - {\color{group1} \frac{n {\color{group2} \mu_{\ell}^2} }{2 \sigma_{\ell}^2 } } - {\color{group1} \frac{\color{group2} \mu_{\ell}^2 }{2 \sigma_0^2 } } + {\color{group5} \frac{\mu_0 \color{group6} \mu_{\ell} }{\sigma_0^2 } } - \frac{\mu_0^2}{2 \sigma_0^2} \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - {\color{group2} \mu_{\ell}^2} {\color{group1} \left( \frac{n}{2 \sigma_{\ell}^2} + \frac{1}{2 \sigma_0^2} \right) } + {\color{group6} \mu_{\ell}} {\color{group5} \left( \frac{n \bar{x}}{\sigma_{\ell}^2} + \frac{\mu_0}{\sigma_0^2} \right) } - \frac{\sum\limits_i^n x_i^2}{2 \sigma_{\ell}^2} - \frac{\mu_0^2}{2 \sigma_0^2} \right\} } d\mu_{\ell}.$$

Letting $C_2 = \exp{ \left\{ - \frac{\sum\limits_i^n x_i^2 }{2 \sigma_{\ell}^2 } - \frac{\mu_0^2 }{2 \sigma_0^2 } \right\} }$ and factoring $\frac{-1}{2}$ from the first term,

$$\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \mu_{\ell}^2 \left( \frac{n}{\color{group5} 2 \sigma_{\ell}^2} + \frac{1}{\color{group5} 2 \sigma_0^2} \right) + \mu_{\ell} \left( \frac{n \bar{x}}{\sigma_{\ell}^2} + \frac{\mu_0}{\sigma_0^2} \right) - \frac{\sum\limits_i^n x_i^2}{2 \sigma_{\ell}^2} - \frac{\mu_0^2}{2 \sigma_0^2} \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{\color{group5} 2} \left( \frac{n}{\color{group5} \sigma_{\ell}^2} + \frac{1}{\color{group5} \sigma_0^2} \right) + \mu_{\ell} \left( \frac{n \bar{x}}{\sigma_{\ell}^2} + \frac{\mu_0}{\sigma_0^2} \right) \right\} } \times {\color{group1} \exp{ \left\{ - \frac{\sum\limits_i^n x_i^2}{2 \sigma_{\ell}^2} - \frac{\mu_0^2}{2 \sigma_0^2} \right\} } } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{2} \left( \frac{n}{\sigma_{\ell}^2} + \frac{1}{\sigma_0^2} \right) + \mu_{\ell} \left( \frac{n \bar{x}}{\sigma_{\ell}^2} + \frac{\mu_0}{\sigma_0^2} \right) \right\} } \times {\color{group1} C_2} d\mu_{\ell}.$$
Equation 4: Substitute $\frac{1}{\sigma_n^2}$ for $\left(\frac{n}{\sigma_{\ell}^2} + \frac{1}{\sigma_0^2}\right)$ .

Let $\left( \frac{n}{\sigma_{\ell}^2} + \frac{1}{\sigma_0^2} \right) = \frac{n \sigma_0^2 + \sigma_{\ell}^2}{\sigma_{\ell}^2 \sigma_0^2} = \frac{1}{\sigma_n^2}$ :

$$\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{2} {\color{group5} \left( \frac{n}{\sigma_{\ell}^2} + \frac{1}{\sigma_0^2} \right) } + \mu_{\ell} \left( \frac{n \bar{x}}{\sigma_{\ell}^2} + \frac{\mu_0}{\sigma_0^2} \right) \right\} } \times C_2 d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2 }{2 \color{group5} \sigma_n^2 } + \mu_{\ell} \left( \frac{n \bar{x}}{\sigma_{\ell}^2} + \frac{\mu_0}{\sigma_0^2} \right) \right\} } \times C_2 d\mu_{\ell}.$$

Before completing the square, we want to factor out $- \frac{1}{2 \sigma_n^2}$ so that the exponential has the form of a Gaussian. However, for this to work, the factor needs to appear in the second term as well, so we introduce it the following way:

$$\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{ {\color{group2} \mu_{\ell}^2} }{2 \color{group5} \sigma_n^2 } + {\color{group2} \mu_{\ell}} \left( \frac{n \bar{x}}{\color{group1} \sigma_{\ell}^2} + \frac{\mu_0}{\color{group6} \sigma_0^2} \right) \right\} } \times C_2 d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\color{group2} \mu_{\ell}^2 }{2 \color{group5} \sigma_n^2 } + {\color{group2} \mu_{\ell}} \left( {\color{group5} \frac{1}{\sigma_n^2}} \right) \left( {\color{group5} \frac{\sigma_n^2}{1}} \right) \left( \frac{n \bar{x}}{\color{group1} \sigma_{\ell}^2} {\color{group6} \frac{\sigma_0^2}{\sigma_0^2}} + \frac{\mu_0}{\color{group6} \sigma_0^2} {\color{group1} \frac{\sigma_{\ell}^2}{\sigma_{\ell}^2}} \right) \right\} } \times C_2 d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2 }{2 \sigma_n^2 } + \left( \frac{\mu_{\ell}}{\sigma_n^2} \right) \left( \frac{\cancel{\sigma_{\ell}^2 \sigma_0^2} }{n \sigma_0^2 + \sigma_{\ell}^2} \right) \left( \frac{n \bar{x} {\color{group6} \sigma_0^2} + \mu_0 {\color{group1} \sigma_{\ell}^2} }{\cancel{ {\color{group6} \sigma_0^2} {\color{group1} \sigma_{\ell}^2} } } \right) \right\} } \times C_2 d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{2 \sigma_n^2} + \frac{\mu_{\ell}}{\sigma_n^2} \left( \frac{ n \bar{x} \sigma_0^2 + \mu_0 \sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } \right) \right\} } \times C_2 d\mu_{\ell}.$$
Equation 5: Complete the square.

Letting $\mu_n = \frac{n \bar{x} \sigma_0^2 + \mu_0 \sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2}:$

$$\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{2 \sigma_n^2} + \frac{\mu_{\ell}}{\sigma_n^2} {\color{group5} \left( \frac{n \bar{x} \sigma_0^2 + \mu_0 \sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } \right) } \right\} } \times C_2 d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{2 \sigma_n^2} + \frac{\mu_{\ell} {\color{group5} \mu_n}}{\sigma_n^2} \right\} } \times C_2 d\mu_{\ell}.$$

Now, we can complete the square by introducing $0 = - \frac{\mu_n^2}{2 \sigma_n^2} + \frac{\mu_n^2}{2 \sigma_n^2}$ into the exponent:

$$= \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{2 \sigma_n^2} + \frac{\mu_{\ell} \mu_n}{\sigma_n^2} + {\color{group5} 0} \right\} } \times C_2 d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{2 \sigma_n^2} + \frac{\mu_{\ell} \mu_n}{\sigma_n^2} {\color{group5} - \frac{\mu_n^2}{2 \sigma_n^2} } {\color{group6} +} {\color{group5} \frac{\mu_n^2}{2 \sigma_n^2} } \right\} } \times C_2 d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{2 \sigma_n^2} + \frac{\mu_{\ell} \mu_n}{\sigma_n^2} - \frac{\mu_n^2}{2 \sigma_n^2} \right\} } {\color{group6} \times} \exp{ \left\{ \frac{\mu_n^2}{2 \sigma_n^2} \right\} } \times C_2 d\mu_{\ell}.$$

Letting $C_3 = \exp{ \left\{ \frac{\mu_n^2}{2 \sigma_n^2} \right\} } \times C_2$ and factoring $\frac{1}{-2 \sigma_n^2}$ from the quadratic expression in the exponent,

$$\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{2 \sigma_n^2} + \frac{\mu_{\ell} \mu_n}{\sigma_n^2} - \frac{\mu_n^2}{2 \sigma_n^2} \right\} } {\color{group5} \exp{ \left\{ \frac{\mu_n^2}{2 \sigma_n^2} \right\} } \times C_2 } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{\mu_{\ell}^2}{\color{group1} 2 \sigma_n^2} + \frac{-2 \mu_{\ell} \mu_n }{\color{group1} -2 \sigma_n^2 } - \frac{\mu_n^2}{\color{group1} 2 \sigma_n^2} \right\} } \times {\color{group5} C_3} d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ {\color{group1} - \frac{1}{2 \sigma_n^2}} \left[ \mu_{\ell}^2 - 2 \mu_{\ell} \mu_n + \mu_n^2 \right] \right\} } \times C_3 d\mu_{\ell}.$$
Equation 6: Solve the integral.

Since $C_1$ and $C_3$ are constant with respect to $\mu_{\ell}$, we can move them outside of the integral:

$$\int_{-\infty}^{+\infty} {\color{group5} C_1} \times \exp{ \left\{ - \frac{1}{2 \sigma_n^2} \left[ \mu_{\ell}^2 - 2 \mu_{\ell} \mu_n + \mu_n^2 \right] \right\} } \times {\color{group5} C_3} d\mu_{\ell} \\ = {\color{group5} C_1} \times {\color{group5} C_3} \times \int_{-\infty}^{+\infty} \exp{ \left\{ - \frac{1}{2 \sigma_n^2} \left[ \mu_{\ell}^2 - 2 \mu_{\ell} \mu_n + \mu_n^2 \right] \right\} } d\mu_{\ell}.$$

Since all probability density functions must integrate to 1, and the Gaussian density is symmetric in $x$ and $\mu$, we know $\int \univariateGaussianEQ{x}{\mu}{\sigma^2} d\mu = 1$. Multiplying both sides by the normalizing constant shows us that the integral of the exponential is equal to the normalizing constant:

$$1 = \int_{- \infty}^{+ \infty} \univariateGaussianEQ{x}{\mu}{\sigma^2} d\mu \\ 1 \times {\color{group5} \sqrt{2 \pi \sigma^2}} = \cancel{\color{group5} \sqrt{2 \pi \sigma^2}} \frac{1}{\cancel{\sqrt{2 \pi \sigma^2}}} \int_{- \infty}^{+ \infty} \exp{\left\{ \frac{-1}{2\sigma^2} (x - \mu)^2 \right\}} d\mu \\ \sqrt{2 \pi {\color{group1} \sigma^2}} = \int_{-\infty}^{+ \infty} \exp{ \left\{ \frac{-1}{2 {\color{group1} \sigma^2}} (x - \mu)^2 \right\} } d\mu.$$

We can solve our integral in the same way:

$$C_1 \times C_3 \times \int_{-\infty}^{+\infty} \exp{ \left\{ - \frac{1}{\color{group1} 2 \sigma_n^2} {\color{group2} \left[ \mu_{\ell}^2 - 2 \mu_{\ell} \mu_n + \mu_n^2 \right] } \right\} } d\mu_{\ell} \\ = C_1 \times C_3 \times \int_{-\infty}^{+\infty} \exp{ \left\{ - \frac{1}{\color{group1} 2 \sigma_n^2} {\color{group2} \left( \mu_{\ell} - \mu_n \right)^2 } \right\} } d\mu_{\ell} \\ = C_1 \times C_3 \times \sqrt{2 \pi {\color{group1} \sigma_n^2}}.$$
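If you want to convince yourself of this integral trick numerically, here is a quick sketch (the values of $x$ and $\sigma^2$ are arbitrary; it assumes NumPy and SciPy): the unnormalized Gaussian kernel integrates to exactly the normalizing constant.

```python
import numpy as np
from scipy.integrate import quad

sigma2 = 3.0   # any positive variance
x = 1.7        # held fixed; we integrate over mu

# Integral of the unnormalized Gaussian kernel over mu.
integral, _ = quad(lambda mu: np.exp(-(x - mu) ** 2 / (2 * sigma2)),
                   -np.inf, np.inf)
assert np.isclose(integral, np.sqrt(2 * np.pi * sigma2))
```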
Equation 7: Reintroduce the constants $C_1, \sigma_n^2, C_3$.

Plugging in, we can simplify a bit:

$${\color{group1}C_1} \times {\color{group5} C_3} \times {\color{group6} \sqrt{2 \pi \sigma_n^2}} \\ = {\color{group1} \left( \frac{1}{\sqrt{2 \pi \sigma_{\ell}^2}} \right)^n \times \frac{1}{\cancel{\sqrt{2 \pi \sigma_0^2}}} } \times {\color{group5} \exp{ \left\{ \frac{\mu_n^2}{2 \sigma_n^2} \right\} } \times C_2 } \times {\color{group6} \sqrt{\frac{ \cancel{2 \pi} \sigma_{\ell}^2 \cancel{\sigma_0^2} }{n \sigma_0^2 + \sigma_{\ell}^2} } }.$$

Reintroducing $\mu_n$, we have

$$\frac{1}{\left( \sqrt{2 \pi \sigma_{\ell}^2} \right)^n} \times \sqrt{ \frac{\sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } } \times \exp{ \left\{ \frac{1}{2 \sigma_n^2} {\color{group2} \mu_n^2} \right\} } \times C_2 \\ = \frac{1}{\left( \sqrt{2 \pi \sigma_{\ell}^2} \right)^n} \times \sqrt{ \frac{\sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } } \times \exp{ \left\{ \frac{1}{2 {\color{group1} \sigma_n^2}} {\color{group2} \left( \frac{n \bar{x} \sigma_0^2 + \mu_0 \sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } \right)^2 } \right\} } \times C_2 \\ = \frac{1}{\left( \sqrt{2 \pi \sigma_{\ell}^2} \right)^n} \times \sqrt{ \frac{\sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } } \times \exp{ \left\{ \frac{ \cancel{\color{group1} n \sigma_0^2 + \sigma_{\ell}^2} {\color{group2} \left( n \bar{x} \sigma_0^2 + \mu_0 \sigma_{\ell}^2 \right)^2 } }{2 {\color{group1} \sigma_{\ell}^2 \sigma_0^2} \color{group2} \left( n \sigma_0^2 + \sigma_{\ell}^2 \right)^{\cancel{2}} } \right\} } \times C_2 \\ = \frac{1}{\left(\sqrt{2 \pi \sigma_{\ell}^2} \right)^n} \times \sqrt{ \frac{\sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } } \times \exp{ \left\{ \frac{ {\color{group2} (n \bar{x} \sigma_0^2)^2 + 2 (n \bar{x} \sigma_0^2) (\mu_0 \sigma_{\ell}^2) + (\mu_0 \sigma_{\ell}^2)^2 } }{2 {\color{group1} \sigma_{\ell}^2 \sigma_0^2} \color{group2} (n \sigma_0^2 + \sigma_{\ell}^2) } \right\} } \times C_2.$$

Dividing the numerator of the exponent through by $\sigma_{\ell}^2 \sigma_0^2$, we obtain

$$\frac{1}{\left(\sqrt{2 \pi \sigma_{\ell}^2} \right)^n} \times \sqrt{ \frac{\sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } } \times \exp{ \left\{ \frac{ {\color{group5} n^2 \bar{x}^2 (\sigma_0^2)^{\cancel{2}}} + 2 {\color{group5} (n \bar{x} \cancel{\sigma_0^2})} {\color{group6} (\mu_0 \cancel{\sigma_{\ell}^2})} + {\color{group6} \mu_0^2 (\sigma_{\ell}^2)^{\cancel{2}}} }{2 \cancel{\sigma_{\ell}^2 \sigma_0^2} (n \sigma_0^2 + \sigma_{\ell}^2) } \right\} } \times C_2 \\ = \frac{1}{\left(\sqrt{2 \pi \sigma_{\ell}^2} \right)^n} \times \sqrt{ \frac{\sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } } \times \exp{ \left\{ \frac{ {\color{group5} \frac{n^2 \bar{x}^2 \sigma_0^2}{\sigma_{\ell}^2}} + 2 {\color{group5} (n \bar{x})} {\color{group6} (\mu_0)} + {\color{group6} \frac{\mu_0^2 \sigma_{\ell}^2}{\sigma_0^2}} }{2 (n \sigma_0^2 + \sigma_{\ell}^2) } \right\} } \times C_2 \\ = \frac{1}{\left(\sqrt{2 \pi \sigma_{\ell}^2} \right)^n} \times \sqrt{ \frac{\sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } } \times \exp{ \left\{ \frac{ \left( {\color{group5} \frac{n \bar{x} \sigma_0}{\sigma_{\ell}}} + {\color{group6} \frac{\mu_0 \sigma_{\ell}}{\sigma_0}} \right)^2 }{2 (n \sigma_0^2 + \sigma_{\ell}^2) } \right\} } \times C_2.$$

Finally, reintroducing $C_2$, we obtain

$$= \frac{1}{\left(\sqrt{2 \pi \sigma_{\ell}^2} \right)^n} \times \sqrt{ \frac{\sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } } \times \exp{ \left\{ \frac{ \left( \frac{n \bar{x} \sigma_0}{\sigma_{\ell}} + \frac{\mu_0 \sigma_{\ell}}{\sigma_0} \right)^2 }{2 (n \sigma_0^2 + \sigma_{\ell}^2) } \right\} } \times {\color{group5} C_2} \\ = \frac{1}{\left(\sqrt{2 \pi \sigma_{\ell}^2} \right)^n} \times \sqrt{ \frac{\sigma_{\ell}^2 }{n \sigma_0^2 + \sigma_{\ell}^2 } } \times \exp{ \left\{ \frac{ \left( \frac{n \bar{x} \sigma_0}{\sigma_{\ell}} + \frac{\mu_0 \sigma_{\ell}}{\sigma_0} \right)^2 }{2 (n \sigma_0^2 + \sigma_{\ell}^2) } \right\} } {\color{group1} \times} {\color{group5} \exp{ \left\{ - \frac{\sum\limits_i^n x_i^2 }{2 \sigma_{\ell}^2 } - \frac{\mu_0^2 }{2 \sigma_0^2 } \right\} } } \\ = \sqrt{ \frac{\sigma_{\ell}^2 }{(n \sigma_0^2 + \sigma_{\ell}^2) (2 \pi \sigma_{\ell}^2)^n } } \times \exp{ \left\{ \frac{ \left( \frac{n \bar{x} \sigma_0}{\sigma_{\ell}} + \frac{\mu_0 \sigma_{\ell}}{\sigma_0} \right)^2 }{2 (n \sigma_0^2 + \sigma_{\ell}^2) } {\color{group1} -} \frac{\sum\limits_i^n x_i^2 }{2 \sigma_{\ell}^2 } - \frac{\mu_0^2 }{2 \sigma_0^2 } \right\} }.$$
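Closed-form results like this are easy to get wrong, so here is a sketch that checks the final expression against brute-force numerical integration of the likelihood times the prior (the simulated data and variable names are my own; it assumes NumPy and SciPy):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

rng = np.random.default_rng(1)
sigma_l2, mu_0, sigma_02 = 2.0, 0.0, 5.0          # model constants (my choices)
x = rng.normal(1.0, np.sqrt(sigma_l2), size=6)    # simulated training data
n, xbar = len(x), x.mean()

# Closed form of the marginal likelihood derived above.
closed = (np.sqrt(sigma_l2 / ((n * sigma_02 + sigma_l2) * (2 * np.pi * sigma_l2) ** n))
          * np.exp((n * xbar * np.sqrt(sigma_02 / sigma_l2)
                    + mu_0 * np.sqrt(sigma_l2 / sigma_02)) ** 2
                   / (2 * (n * sigma_02 + sigma_l2))
                   - np.sum(x ** 2) / (2 * sigma_l2)
                   - mu_0 ** 2 / (2 * sigma_02)))

# Brute force: integrate (likelihood of the data) x (prior on mu_l) over mu_l.
def integrand(mu):
    return (np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma_l2)))
            * norm.pdf(mu, loc=mu_0, scale=np.sqrt(sigma_02)))

# epsabs=0 forces quad to meet a *relative* tolerance; the value itself is tiny.
numeric, _ = quad(integrand, -np.inf, np.inf, epsabs=0)
assert np.isclose(closed, numeric, rtol=1e-6)
```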
Posterior on Likelihood Parameters
\begin{align} \posterior{\Theta}{x_1, x_2, \dots, x_n} &= \frac{ \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \ \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} }{\int_{\mathfrak{D}(\mu_{\ell})} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \ \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} d\mu_{\ell} } \tag{1} \label{normalKnownPosterior1} \\ &= \normalAbbreviation{ \mu_{\ell} }{\mu_n }{\color{group1} \sigma_n^2 } \tag{2} \label{normalKnownPosterior2} \\ &= \circleEquation{ \normalAbbreviation{ \mu_{\ell} }{\mu_n }{\color{group1} \frac{\sigma_{\ell}^2 }{n + \kappa_0 } } } \tag{3} \label{normalKnownPosterior3} \\ \end{align}
Equation 1: Plug in using the result in Equation 1 of the Marginal Likelihood derivation.

Recall Step 4 of the approach and plug in the same way we did in Equation 1 of the Marginal Likelihood derivation:

$$\posterior{\Theta}{x_1, x_2, \dots, x_n} = \frac{\likelihood{x_1, x_2, \dots, x_n}{\Theta} \ \prior{\Theta}} {\marginalLikelihood{x_1, x_2, \dots, x_n}} \\ = \frac{ {\color{group1} \likelihood{x_1, x_2, \dots, x_n}{\Theta}} \ {\color{group2} \prior{\Theta}} }{\int_{\mathfrak{D}(\Theta)} {\color{group1} \likelihood{x_1, x_2, \dots, x_n}{\Theta}} \ {\color{group2} \prior{\Theta}} d{\Theta} } \\ = \frac{ {\color{group1} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] } \ {\color{group2} \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} } }{\int_{\mathfrak{D}(\Theta)} {\color{group1} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] } \ {\color{group2} \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} } d\Theta }.$$

Again, for the Normal-Known model, the likelihood variance $\sigma_{\ell}^2$ is constant (known with probability 1), meaning there is no prior on the likelihood variance. The only unknown variable is the likelihood mean $\mu_{\ell}$; hence, $\mathfrak{D}(\Theta) = \mathfrak{D}(\mu_{\ell}, \sigma_{\ell}^2) = \mathbb{R} \times \{\sigma_{\ell}^2\} = (- \infty, + \infty) \times \{\sigma_{\ell}^2\}:$

$$\frac{ \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \ \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} }{\int_{\color{group5} \mathfrak{D}(\Theta)} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \ \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} {\color{group5} d\Theta} } = \frac{ \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \ \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} }{\int_{\color{group5} - \infty}^{\color{group5} + \infty} \left[ \prod\limits_i^n \normalAbbreviation{x_i}{\mu_{\ell}}{\sigma_{\ell}^2} \right] \ \normalAbbreviation{\mu_{\ell}}{\mu_0}{\sigma_0^2} {\color{group5} d\mu_{\ell}} }.$$
Equation 2: Recall Equations 5 and 6 of the Marginal Likelihood derivation.

The denominator is exactly what we derived in Equation 5 of the Marginal Likelihood derivation, and the numerator is exactly the same thing without the integration. Plugging in that result, factoring the quadratic expressions, and moving constants to the front, we have the following:

$$\frac{ C_1 \times \exp{ \left\{ - \frac{1}{2 {\color{group1} \sigma_n^2}} \left[ {\color{group6} \mu_{\ell}^2} - 2 {\color{group6} \mu_{\ell}} {\color{group2} \mu_n} + {\color{group2} \mu_n}^2 \right] \right\} } \times C_3 }{\int_{-\infty}^{+\infty} C_1 \times \exp{ \left\{ - \frac{1}{2 {\color{group1} \sigma_n^2}} \left[ {\color{group6} \mu_{\ell}^2} - 2 {\color{group6} \mu_{\ell}} {\color{group2} \mu_n} + {\color{group2} \mu_n}^2 \right] \right\} } \times C_3 d\mu_{\ell} } \\ = \frac{ C_1 \times C_3 \times \exp{ \left\{ - \frac{1}{2 {\color{group1} \sigma_n^2}} \left( {\color{group6} \mu_{\ell}} - {\color{group2} \mu_n} \right)^2 \right\} } }{C_1 \times C_3 \times \int_{-\infty}^{+\infty} \exp{ \left\{ - \frac{1}{2 {\color{group1} \sigma_n^2}} \left( {\color{group6} \mu_{\ell}} - {\color{group2} \mu_n} \right)^2 \right\} } d\mu_{\ell} }.$$

Recall that we solved the denominator’s integral in Equation 6 of the marginal likelihood derivation. Plugging that solution in, we obtain

$$\frac{ \cancel{C_1} \times \cancel{C_3} \times \exp{ \left\{ - \frac{1}{2 \sigma_n^2} \left( \mu_{\ell} - \mu_n \right)^2 \right\} } }{\cancel{C_1} \times \cancel{C_3} \times {\color{group5} \int_{-\infty}^{+\infty} \exp{ \left\{ - \frac{1}{2 \sigma_n^2} \left( \mu_{\ell} - \mu_n \right)^2 \right\} } } d\mu_{\ell} } \\ = \frac{ \exp{ \left\{ - \frac{1}{2 \sigma_n^2} \left( \mu_{\ell} - \mu_n \right)^2 \right\} } }{\color{group5} \sqrt{2 \pi \sigma_n^2} } \\ = \normalAbbreviation{\mu_{\ell}}{\mu_n}{\sigma_n^2}.$$
Equation 3: Simplify the variance to show the posterior has the same form as the prior.

Recall that we defined the variance of the prior as $\sigma_0^2 = \frac{\sigma_{\ell}^2}{\kappa_0}$. Plugging in, we can simplify as follows:

\begin{align} \sigma_n^2 &= \frac{ \sigma_{\ell}^2 {\color{group1} \sigma_0^2} }{n {\color{group1} \sigma_0^2} + \sigma_{\ell}^2 } = \frac{ \sigma_{\ell}^2 \times {\color{group1} \frac{\sigma_{\ell}^2}{\kappa_0}} }{\sigma_{\ell}^2 + n \color{group1} \frac{\sigma_{\ell}^2}{\kappa_0} } \\ &= \frac{ \frac{ \sigma_{\ell}^2 {\color{group1} \sigma_{\ell}^2} }{\cancel{\color{group1} \kappa_0} } }{\frac{ n {\color{group1} \sigma_{\ell}^2} + {\color{group1} \kappa_0} \sigma_{\ell}^2 }{\cancel{\color{group1} \kappa_0} } } = \frac{\sigma_{\ell}^2 \sigma_{\ell}^2 }{n \sigma_{\ell}^2 + \kappa_0 \sigma_{\ell}^2 } \\ &= \frac{\sigma_{\ell}^2 \cancel{\sigma_{\ell}^2} }{n \cancel{\sigma_{\ell}^2} + \kappa_0 \cancel{\sigma_{\ell}^2} } = \frac{\sigma_{\ell}^2 }{n + \kappa_0 }. \end{align}
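Putting the posterior together, here is a small sketch of the update (function and variable names are my own): it computes $\mu_n$ and $\sigma_n^2$, and checks that the simplified variance $\frac{\sigma_{\ell}^2}{n + \kappa_0}$ agrees with the unsimplified form $\frac{\sigma_{\ell}^2 \sigma_0^2}{n \sigma_0^2 + \sigma_{\ell}^2}$.

```python
import numpy as np

# Posterior parameters for the known-variance Gaussian model.
def posterior_params(x, sigma_l2, mu_0, kappa_0):
    n, xbar = len(x), float(np.mean(x))
    sigma_02 = sigma_l2 / kappa_0
    mu_n = (n * xbar * sigma_02 + mu_0 * sigma_l2) / (n * sigma_02 + sigma_l2)
    sigma_n2 = sigma_l2 / (n + kappa_0)   # the simplified variance derived above
    # Sanity check: the simplified form matches sigma_l2 * sigma_02 / (n sigma_02 + sigma_l2).
    assert np.isclose(sigma_n2, sigma_l2 * sigma_02 / (n * sigma_02 + sigma_l2))
    return mu_n, sigma_n2

x = np.array([4.8, 5.3, 5.1, 4.9])
mu_n, sigma_n2 = posterior_params(x, sigma_l2=1.0, mu_0=0.0, kappa_0=0.1)
print(mu_n, sigma_n2)
```

With $\sigma_0^2 = \frac{\sigma_{\ell}^2}{\kappa_0}$, the posterior mean also simplifies to the weighted average $\mu_n = \frac{n \bar{x} + \kappa_0 \mu_0}{n + \kappa_0}$, which is what the code above computes.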
Posterior Predictive of Test Data
\begin{align} \posteriorPredictive{x_*}{x_1, x_2, \dots, x_n} &= \int_{-\infty}^{+\infty} \normalAbbreviation{x_*}{\mu_{\ell}}{\color{group1} \sigma_{\ell}^2} \normalAbbreviation{\mu_{\ell} }{\color{group2} \mu_n }{\color{group1} \sigma_n^2 } d\mu_{\ell} \tag{1} \label{normalKnownPosteriorPredictive1}\\ &= \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ \sigma_n^2 \left( x_*^2 - 2 \mu_{\ell} x_* + \mu_{\ell}^2 \right) + \sigma_{\ell}^2 \left( \mu_{\ell}^2 - 2 \mu_{\ell} \mu_n + \mu_n^2 \right) }{-2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell} \tag{2} \label{normalKnownPosteriorPredictive2}\\ &= \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ \sigma_*^2 \left[ \mu_{\ell}^2 - \frac{ 2 \mu_{\ell} \left( \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right) }{\sigma_*^2 } + \frac{ \sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 }{\sigma_*^2 } \right] }{-2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell} \tag{3} \label{normalKnownPosteriorPredictive3}\\ &= \int_{-\infty}^{+\infty} C_4 \times C_5 \times \exp{ \left\{ \frac{ - \sigma_*^2 }{2 \sigma_{\ell}^2 \sigma_n^2 } \left[ \mu_{\ell} - \frac{ \left( \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right) }{\sigma_*^2 } \right]^2 \right\} } d\mu_{\ell} \tag{4} \label{normalKnownPosteriorPredictive4}\\ &= C_5 \times \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \tag{5} \label{normalKnownPosteriorPredictive5}\\ &= \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{ 1 }{- 2 \sigma_{\ell}^2 \sigma_n^2 {\color{group1} \sigma_*^2} } \left( \sigma_*^2 \left[ \sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 \right] - \left[ \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right]^2 \right) \right\} } \tag{6} \label{normalKnownPosteriorPredictive6}\\ &= \circleEquation{ \normalAbbreviation{x_* }{\color{group2} \mu_n }{\color{group1} \sigma_{\ell}^2 + \sigma_n^2 } } \tag{7} \label{normalKnownPosteriorPredictive7}\\ \end{align}
Equation 1: Plug into the definition of posterior predictive.

As I stated in the approach, the posterior predictive is equal to

$$\int_{\color{group5} \mathfrak{D}(\Theta)} {\color{group1} \likelihood{x_*}{\Theta}} \ {\color{group2} \posterior{\Theta}{x_1, x_2, \dots, x_n}} d{\color{group5} \Theta}.$$

From the definition of the model, we know the likelihood is $\normalAbbreviation{x_*}{\mu_{\ell}}{\sigma_{\ell}^2}$, and in the earlier section, we derived the posterior, $\normalAbbreviation{\mu_{\ell}}{\mu_n}{\sigma_n^2}.$ So, plugging in we have

$$\int_{\color{group5} - \infty}^{\color{group5} + \infty} {\color{group1} \normalAbbreviation{x_*}{\mu_{\ell}}{\sigma_{\ell}^2} } {\color{group2} \normalAbbreviation{\mu_{\ell}}{\mu_n}{\sigma_n^2} } d{\color{group5} \mu_{\ell}}.$$

Again, we integrate on the interval $(-\infty, +\infty)$ because $\mu_{\ell}$ can take on any real value.

Equation 2: Substitute for the normalizing constant and expand the exponent.

Plugging in the Gaussian expressions, moving the normalizing constants to the front, and simplifying the exponents, we start with

$$\int_{-\infty}^{+\infty} {\color{group5} \normalAbbreviation{x_*}{\mu_{\ell}}{\sigma_{\ell}^2}} {\color{group6} \normalAbbreviation{\mu_{\ell}}{\mu_n}{\sigma_n^2}} d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} {\color{group5} \frac{1}{\sqrt{(2 \pi \sigma_{\ell}^2)}}} \times {\color{group6} \frac{1}{\sqrt{(2 \pi \sigma_n^2)}}} \times {\color{group5} \exp{ \left\{ \frac{-1}{2 \sigma_{\ell}^2} (x_* - \mu_{\ell})^2 \right\} } } {\color{group6} \exp{ \left\{ \frac{-1}{2 \sigma_n^2} (\mu_{\ell} - \mu_n)^2 \right\} } } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{ {\color{group5} (2\pi\sigma_{\ell}^2)} {\color{group6} (2\pi\sigma_n^2)}} } \times \exp{ \left\{ {\color{group5} \frac{-1}{2 \sigma_{\ell}^2} (x_* - \mu_{\ell})^2 } - {\color{group6} \frac{1}{2 \sigma_n^2} (\mu_{\ell} - \mu_n)^2 } \right\} } d\mu_{\ell}.$$

Letting ${\color{group1} C_4 = \frac{1}{\sqrt{(2\pi\sigma_{\ell}^2)(2\pi\sigma_n^2)}} }$ , and expanding both ${\color{group5} (x_* - \mu_{\ell})^2}$ and ${\color{group6} (\mu_{\ell} - \mu_n)^2}$,

$$\int_{-\infty}^{+\infty} {\color{group1} C_4} \times \exp{ \left\{ {\color{group5} \frac{-1}{2 \sigma_{\ell}^2} (x_*^2 - 2x_*\mu_{\ell} + \mu_{\ell}^2) } - {\color{group6} \frac{1}{2 \sigma_n^2} (\mu_{\ell}^2 - 2\mu_{\ell}\mu_n + \mu_n^2) } \right\} } d\mu_{\ell}.$$

Obtain a common denominator by introducing ${\color{group6} 1 = \frac{\sigma_n^2}{\sigma_n^2}}$ and ${\color{group5} 1 = \frac{\sigma_{\ell}^2}{\sigma_{\ell}^2}}$ to the first and second terms, respectively:

$$\int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ {\color{group5} \frac{-1}{2 \sigma_{\ell}^2}} \left( {\color{group6} \frac{\sigma_n^2}{\sigma_n^2}} \right) {\color{group5} (x_*^2 - 2x_*\mu_{\ell} + \mu_{\ell}^2)} - {\color{group6} \frac{1}{2 \sigma_n^2}} \left( {\color{group5} \frac{\sigma_{\ell}^2}{\sigma_{\ell}^2}} \right) {\color{group6} (\mu_{\ell}^2 - 2\mu_{\ell}\mu_n + \mu_n^2)} \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{- \color{group6} \sigma_n^2 }{ {\color{group5} 2 \sigma_{\ell}^2} {\color{group6} \sigma_n^2} } {\color{group5} (x_*^2 - 2x_*\mu_{\ell} + \mu_{\ell}^2)} - \frac{\color{group5} \sigma_{\ell}^2 }{ {\color{group6} 2 \sigma_n^2} {\color{group5} \sigma_{\ell}^2} } {\color{group6} (\mu_{\ell}^2 - 2\mu_{\ell}\mu_n + \mu_n^2)} \right\} } d\mu_{\ell}$$

Factoring ${\color{group2} \frac{1}{-2 \sigma_{\ell}^2 \sigma_n^2}}$ ,

$$\int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{- \sigma_n^2 }{\color{group2} 2 \sigma_{\ell}^2 \sigma_n^2 } (x_*^2 - 2x_*\mu_{\ell} + \mu_{\ell}^2) - \frac{\sigma_{\ell}^2 }{\color{group2} 2 \sigma_n^2 \sigma_{\ell}^2 } (\mu_{\ell}^2 - 2\mu_{\ell}\mu_n + \mu_n^2) \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ 1 }{\color{group2} - 2 \sigma_{\ell}^2 \sigma_n^2 } \left[ \sigma_n^2 \left( x_*^2 - 2 \mu_{\ell} x_* + \mu_{\ell}^2 \right) + \sigma_{\ell}^2 \left( \mu_{\ell}^2 - 2 \mu_{\ell} \mu_n + \mu_n^2 \right) \right] \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ \sigma_n^2 \left( x_*^2 - 2 \mu_{\ell} x_* + \mu_{\ell}^2 \right) + \sigma_{\ell}^2 \left( \mu_{\ell}^2 - 2 \mu_{\ell} \mu_n + \mu_n^2 \right) }{\color{group2} - 2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell}.$$
Equation 3: Factor out $\sigma_*^2 = \sigma_{\ell}^2 + \sigma_n^2$.

Distributing the posterior and likelihood variances,

$$\int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ \sigma_n^2 \left( x_*^2 {\color{group2} - 2 \mu_{\ell}} x_* + {\color{group1} \mu_{\ell}^2} \right) + \sigma_{\ell}^2 \left( \mu_{\ell}^2 {\color{group2} - 2 \mu_{\ell}} \mu_n + {\color{group1} \mu_n^2} \right) }{-2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ x_*^2 \sigma_n^2 - {\color{group2} 2 \mu_{\ell}} x_* \sigma_n^2 + {\color{group1} \mu_{\ell}^2} \sigma_n^2 + {\color{group1} \mu_{\ell}^2} \sigma_{\ell}^2 - {\color{group2} 2 \mu_{\ell}} \mu_n \sigma_{\ell}^2 + \mu_n^2 \sigma_{\ell}^2 }{-2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell}.$$

Grouping like terms in the exponent,

$$\int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ {\color{group1} \mu_{\ell}^2} \sigma_n^2 + {\color{group1} \mu_{\ell}^2} \sigma_{\ell}^2 - {\color{group2} 2 \mu_{\ell}} x_* \sigma_n^2 - {\color{group2} 2 \mu_{\ell}} \mu_n \sigma_{\ell}^2 + \mu_n^2 \sigma_{\ell}^2 + x_*^2 \sigma_n^2 }{-2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ {\color{group1} \mu_{\ell}^2} (\sigma_n^2 + \sigma_{\ell}^2) - {\color{group2} 2 \mu_{\ell}} (x_* \sigma_n^2 + \mu_n \sigma_{\ell}^2) + \mu_n^2 \sigma_{\ell}^2 + x_*^2 \sigma_n^2 }{-2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell}.$$

Letting ${\color{group6} \sigma_*^2 = \sigma_{\ell}^2 + \sigma_n^2}$ , we have

$$\int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ {\color{group1} \mu_{\ell}^2} {\color{group6} \sigma_*^2} - {\color{group2} 2 \mu_{\ell}} (x_* \sigma_n^2 + \mu_n \sigma_{\ell}^2) + \mu_n^2 \sigma_{\ell}^2 + x_*^2 \sigma_n^2 }{-2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell}.$$

Factor $\sigma_*^2$ out of the numerator of the exponent to obtain

$$\int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ {\color{group6} \sigma_*^2} \left[ {\color{group1} \mu_{\ell}^2} - \frac{2 {\color{group2} \mu_{\ell}} \left( \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right) }{\color{group6} \sigma_*^2 } + \frac{\sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 }{\color{group6} \sigma_*^2 } \right] }{-2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell}.$$
Equation 4: Complete the square.

To complete the square, we introduce ${\color{group5} 0 = \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right)^2 - \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right)^2 }$ to the numerator in the exponent, and let $C_5 = \exp{ \left\{ {\color{group6} \frac{\sigma_*^2}{- 2 \sigma_{\ell}^2 \sigma_n^2} } \left[ {\color{group5} - \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right)^2 } + \frac{\sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 }{\sigma_*^2 } \right] \right\} }$ , which, of course, is constant with respect to $\mu_{\ell}$.

$$\int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ {\color{group6} \sigma_*^2} \left[ \mu_{\ell}^2 - \frac{ 2 \mu_{\ell} \left( \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right) }{\sigma_*^2 } {\color{group5} + 0} + \frac{\sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 }{\sigma_*^2 } \right] }{\color{group6} - 2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ {\color{group6} \sigma_*^2} \left[ \mu_{\ell}^2 - \frac{ 2 \mu_{\ell} \left( \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right) }{\sigma_*^2 } {\color{group5} + \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right)^2 - \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right)^2 } + \frac{\sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 }{\sigma_*^2 } \right] }{\color{group6} - 2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ {\color{group6} \sigma_*^2} \left[ \mu_{\ell}^2 - \frac{ 2 \mu_{\ell} \left( \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right) }{\sigma_*^2 } {\color{group5} + \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right)^2 } \right] }{\color{group6} - 2 \sigma_{\ell}^2 \sigma_n^2 } {\color{group1} +} \frac{ {\color{group6} \sigma_*^2} \left[ {\color{group5} - \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right)^2 } + \frac{\sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 }{\sigma_*^2 } \right] }{\color{group6} - 2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ {\color{group6} \sigma_*^2} \left[ \mu_{\ell}^2 - \frac{ 2 \mu_{\ell} \left( \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right) }{\sigma_*^2 } {\color{group5} + \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right)^2 } \right] }{\color{group6} - 2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } {\color{group1} \times} C_5 d\mu_{\ell}.$$

Factoring the quadratic expression in the exponent, we obtain the following:

$$\int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ {\color{group6} \sigma_*^2} \left[ {\color{group2} \mu_{\ell}}^2 - \frac{ 2 {\color{group2} \mu_{\ell}} {\color{group1} \left( \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right) } }{\color{group1} \sigma_*^2 } + \left( {\color{group1} \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } } \right)^2 \right] }{\color{group6} - 2 \sigma_{\ell}^2 \sigma_n^2 } \right\} } \times C_5 d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} C_4 \times \exp{ \left\{ \frac{ {\color{group6} \sigma_*^2} }{\color{group6} - 2 \sigma_{\ell}^2 \sigma_n^2 } \left[ {\color{group2} \mu_{\ell}} - \left( {\color{group1} \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } } \right) \right]^2 \right\} } \times C_5 d\mu_{\ell}.$$
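Completing the square only reshuffles the exponent, so its numerator before and after must agree for every value of $\mu_{\ell}$. A quick numerical check of that identity; all numeric values are arbitrary assumptions:

```python
# Arbitrary example values (assumptions, not from the notes).
sigma_l2, sigma_n2 = 1.7, 0.4
x_star, mu_n = 2.0, -1.0
sigma_star2 = sigma_l2 + sigma_n2
b = sigma_n2 * x_star + sigma_l2 * mu_n  # the numerator that sits over sigma_*^2

def exponent_before(mu):
    """Numerator of the exponent from Equation 2, before completing the square."""
    return sigma_n2 * (x_star - mu) ** 2 + sigma_l2 * (mu - mu_n) ** 2

def exponent_after(mu):
    """Completed square plus the leftover constant that gets absorbed into C_5."""
    square = sigma_star2 * (mu - b / sigma_star2) ** 2
    constant = sigma_star2 * (-(b / sigma_star2) ** 2
                              + (sigma_n2 * x_star ** 2 + sigma_l2 * mu_n ** 2) / sigma_star2)
    return square + constant

for mu in (-3.0, -0.5, 0.0, 1.2, 4.0):
    assert abs(exponent_before(mu) - exponent_after(mu)) < 1e-9
print("completing the square preserves the exponent")
```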
Equation 5: Rewrite the normalizing constant so the integral evaluates to 1.

Introduce ${\color{group6} \sqrt{\frac{\sigma_*^2}{\sigma_*^2}}}$ into $C_4$, so that we have the corresponding normalizing constant for the Gaussian expression being integrated. Recall that ${\color{group5} C_4 = \frac{1}{\sqrt{(2\pi\sigma_{\ell}^2)(2\pi\sigma_n^2)}} }$ :

$$\int_{-\infty}^{+\infty} {\color{group5} C_4} \times \exp{ \left\{ \frac{ \sigma_*^2 }{- 2 \sigma_{\ell}^2 \sigma_n^2 } \left[ \mu_{\ell} - \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right) \right]^2 \right\} } \times C_5 d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} {\color{group5} \frac{1}{\sqrt{(2\pi\sigma_{\ell}^2)(2\pi\sigma_n^2)}} } \times \exp{ \left\{ \frac{ \sigma_*^2 }{- 2 \sigma_{\ell}^2 \sigma_n^2 } \left[ \mu_{\ell} - \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right) \right]^2 \right\} } \times C_5 d\mu_{\ell} \\ = \int_{-\infty}^{+\infty} {\color{group6} \sqrt{\frac{\sigma_*^2}{\sigma_*^2}} } \times {\color{group5} \frac{1}{\sqrt{(2\pi\sigma_{\ell}^2)(2\pi\sigma_n^2)}} } \times \exp{ \left\{ \frac{ {\color{group6} \sigma_*^2} }{- 2 {\color{group5} \sigma_{\ell}^2 \sigma_n^2} } \left[ \mu_{\ell} - \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right) \right]^2 \right\} } \times C_5 d\mu_{\ell}$$

Moving the appropriate constants outside of the integral, we can make the integral evaluate to 1:

$$\int_{-\infty}^{+\infty} \sqrt{\frac{\color{group1} \sigma_*^2}{\color{group2} \sigma_*^2}} \times \frac{1}{\sqrt{ ({\color{group2} 2 \pi} {\color{group1} \sigma_{\ell}^2)} ({\color{group1} 2 \pi \sigma_n^2})} } \times \exp{ \left\{ {\color{group1} \frac{ \sigma_*^2 }{- 2 \sigma_{\ell}^2 \sigma_n^2 } } \left[ \mu_{\ell} - \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right) \right]^2 \right\} } \times {\color{group2} C_5} d\mu_{\ell} \\ = {\color{group2} C_5 \times \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } } \int_{-\infty}^{+\infty} {\color{group1} \sqrt{ \frac{ \sigma_*^2 }{2 \pi \sigma_{\ell}^2 \sigma_n^2 } } } \exp{ \left\{ {\color{group1} \frac{ - \sigma_*^2 }{2 \sigma_{\ell}^2 \sigma_n^2 } } \left[ \mu_{\ell} - \frac{ \left( \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right) }{\sigma_*^2 } \right]^2 \right\} } d\mu_{\ell} \\ = {\color{group2} C_5 \times \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } } \cancel{ \int_{-\infty}^{+\infty} \normalAbbreviation{ \mu_{\ell} }{\frac{ \left( \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right) }{\sigma_*^2 } }{\color{group1} \frac{ \sigma_{\ell}^2 \sigma_n^2 }{\sigma_*^2 } } d\mu_{\ell} } \\ = {\color{group2} C_5 \times \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } } \times 1.$$

Recall that the integral of a Gaussian density over the entire real line evaluates to 1.

Equation 6: Reintroduce the constant $C_5$.

Recall that $C_5 = \exp{ \left\{ \frac{\sigma_*^2}{- 2 \sigma_{\ell}^2 \sigma_n^2} \left[ - \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right)^2 + \frac{\sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 }{\sigma_*^2 } \right] \right\} }$ . Therefore,

$${\color{group5} C_5} \times \frac{1}{\sqrt{2 \pi \sigma_*^2}} = \frac{1}{\sqrt{2 \pi \sigma_*^2}} {\color{group5} \exp{ \left\{ \frac{\sigma_*^2}{- 2 \sigma_{\ell}^2 \sigma_n^2} \left[ - \left( \frac{(\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n) }{\sigma_*^2 } \right)^2 + \frac{\sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 }{\sigma_*^2 } \right] \right\} } }.$$

Rearranging the two terms in the exponent and simplifying:

$$\frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{ \sigma_*^2 }{- 2 \sigma_{\ell}^2 \sigma_n^2 } \left( \left[ {\color{group1} \frac{\sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 }{\sigma_*^2 } } \right] - \left[ {\color{group2} \frac{\sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n }{\sigma_*^2 } } \right]^2 \right) \right\} } \\ = \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{ \cancel{\sigma_*^2} }{- 2 \sigma_{\ell}^2 \sigma_n^2 } \left( \frac{ 1 }{\color{group1} \cancel{\sigma_*^2} } {\color{group1} \left[ \sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 \right] } - \frac{ 1 }{\color{group2} (\sigma_*^2)^{\cancel{2}} } {\color{group2} \left[ \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right]^2 } \right) \right\} } \\ = \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{1}{- 2 \sigma_{\ell}^2 \sigma_n^2} \times \left( \left[ \sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 \right] - \frac{ 1 }{\sigma_*^2 } \left[ \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right]^2 \right) \right\} }.$$

Introduce ${\color{group6} \frac{\sigma_*^2}{\sigma_*^2}}$ as follows:

$$\frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{1}{- 2 \sigma_{\ell}^2 \sigma_n^2} \times {\color{group6} 1} \times \left( \left[ \sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 \right] - \frac{ 1 }{\sigma_*^2 } \left[ \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right]^2 \right) \right\} } \\ = \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{1}{- 2 \sigma_{\ell}^2 \sigma_n^2} \times {\color{group6} \frac{\sigma_*^2}{\sigma_*^2}} \times \left( \left[ \sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 \right] - \frac{ 1 }{\sigma_*^2 } \left[ \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right]^2 \right) \right\} } \\ = \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{1}{- 2 \sigma_{\ell}^2 \sigma_n^2 {\color{group6} \sigma_*^2}} \left( {\color{group6} \sigma_*^2} \left[ \sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 \right] - \frac{ \cancel{\color{group6} \sigma_*^2} }{\cancel{\color{group6} \sigma_*^2} } \left[ \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right]^2 \right) \right\} } \\ = \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{1}{- 2 \sigma_{\ell}^2 \sigma_n^2 \sigma_*^2} \left( \sigma_*^2 \left[ \sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 \right] - \left[ \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right]^2 \right) \right\} } \\$$
Equation 7: Expand, and then simplify.

Since $\sigma_*^2 = \sigma_n^2 + \sigma_{\ell}^2$,

\begin{align} \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{ 1 }{- 2 \sigma_{\ell}^2 \sigma_n^2 \sigma_*^2 } \left( {\color{group2} \sigma_*^2} {\color{group5} \left[ \sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 \right] } {\color{group6} - \left[ \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right]^2 } \right) \right\} } &=\frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{ 1 }{- 2 \sigma_{\ell}^2 \sigma_n^2 \sigma_*^2 } \left( {\color{group2} (\sigma_{\ell}^2 + \sigma_n^2)} {\color{group5} \left[ \sigma_n^2 x_*^2 + \sigma_{\ell}^2 \mu_n^2 \right] } {\color{group6} - \left[ \sigma_n^2 x_* + \sigma_{\ell}^2 \mu_n \right]^2 } \right) \right\} } \\ &=\frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{-1}{2 \sigma_*^2} \frac{ \left[ \left( {\color{group2} \sigma_{\ell}^2} {\color{group5} \sigma_n^2 x_*^2} + {\color{group2} \sigma_{\ell}^2} {\color{group5} \sigma_{\ell}^2 \mu_n^2} \right) \right] }{\sigma_{\ell}^2 \sigma_n^2 } \right\} } \times \exp{ \left\{ \frac{-1}{2 \sigma_*^2} \frac{ \left[ \left( {\color{group2} \sigma_n^2} {\color{group5} \sigma_n^2 x_*^2} + {\color{group2} \sigma_n^2} {\color{group5} \sigma_{\ell}^2 \mu_n^2} \right) - {\color{group6} (\sigma_n^2 x_*)^2} - 2 {\color{group6} (\sigma_n^2 x_*) (\sigma_{\ell}^2 \mu_n) } - {\color{group6} (\sigma_{\ell}^2 \mu_n)^2} \right] }{\sigma_{\ell}^2 \sigma_n^2 } \right\} }. \end{align}

We can simplify further by breaking the fractions in the exponents apart:

$$\frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{-1}{2 \sigma_*^2} \frac{ \left[ {\color{group2} \sigma_{\ell}^2} {\color{group5} \sigma_n^2 x_*^2} + {\color{group2} \sigma_{\ell}^2} {\color{group5} \sigma_{\ell}^2 \mu_n^2} \right] }{\sigma_{\ell}^2 \sigma_n^2 } \right\} } \\ \times \exp{ \left\{ \frac{-1}{2 \sigma_*^2} \frac{ \left[ \left( {\color{group2} \sigma_n^2} {\color{group5} \sigma_n^2 x_*^2} + {\color{group2} \sigma_n^2} {\color{group5} \sigma_{\ell}^2 \mu_n^2} \right) - {\color{group6} (\sigma_n^2 x_*)^2} - 2 {\color{group6} (\sigma_n^2 x_*) (\sigma_{\ell}^2 \mu_n) } - {\color{group6} (\sigma_{\ell}^2 \mu_n)^2} \right] }{\sigma_{\ell}^2 \sigma_n^2 } \right\} } \\ = \frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{-1}{2 \sigma_*^2} \left[ \frac{ {\color{group2} \cancel{\sigma_{\ell}^2}} {\color{group5} \cancel{\sigma_n^2} x_*^2} }{\cancel{\sigma_{\ell}^2 \sigma_n^2} } +\frac{ {\color{group2} \cancel{\sigma_{\ell}^2}} {\color{group5} \sigma_{\ell}^2 \mu_n^2} }{\cancel{\sigma_{\ell}^2} \sigma_n^2 } \right] \right\} } \\ \times \exp{ \left\{ \frac{-1}{2 \sigma_*^2} \left[ \frac{ {\color{group2} \cancel{\sigma_n^2}} {\color{group5} \sigma_n^2 x_*^2} }{\sigma_{\ell}^2 \cancel{\sigma_n^2} } + \frac{ {\color{group2} \cancel{\sigma_n^2}} {\color{group5} \cancel{\sigma_{\ell}^2} \mu_n^2} }{\cancel{\sigma_{\ell}^2 \sigma_n^2} } - \frac{\color{group6} (\cancel{\sigma_n^2} x_*) (\sigma_n^2 x_*) }{\sigma_{\ell}^2 \cancel{\sigma_n^2} } - \frac{ 2 {\color{group6} (\cancel{\sigma_n^2} x_*) (\cancel{\sigma_{\ell}^2} \mu_n) } }{\cancel{\sigma_{\ell}^2 \sigma_n^2} } - \frac{\color{group6} (\cancel{\sigma_{\ell}^2} \mu_n) (\sigma_{\ell}^2 \mu_n) }{\cancel{\sigma_{\ell}^2} \sigma_n^2 } \right] \right\} }.$$

From here, it is straightforward to rewrite the expression as a Gaussian:

$$\frac{ 1 }{\sqrt{2 \pi \sigma_*^2} } \times \exp{ \left\{ \frac{ - 1 }{2 \sigma_*^2 } \left[ {\color{group6} x_*^2} {\color{group2} \cancel{+ \frac{\sigma_{\ell}^2 \mu_n^2}{\sigma_n^2}}} {\color{group5} \cancel{+ \frac{\sigma_n^2 x_*^2}{\sigma_{\ell}^2}}} {\color{group6} + \mu_n^2} {\color{group5} \cancel{- \frac{\sigma_n^2 x_*^2}{\sigma_{\ell}^2}}} {\color{group6} - 2 \mu_n x_*} {\color{group2} \cancel{- \frac{\sigma_{\ell}^2 \mu_n^2}{\sigma_n^2}}} \right] \right\} } \\ = \frac{1}{\sqrt{2 \pi \sigma_*^2}} \times \exp{ \left\{ \frac{- 1}{2 \sigma_*^2} \left[ {\color{group6} x_*^2 - 2 \mu_n x_* + \mu_n^2} \right] \right\} } \\ = \frac{1}{\sqrt{2 \pi {\color{group1} \sigma_*^2}}} \times \exp{ \left\{ \frac{- 1}{2 {\color{group1} \sigma_*^2}} {\color{group6} \left[ x_* - \mu_n \right]^2 } \right\} } \\ = \normalAbbreviation{ \color{group6} x_* }{\color{group6} \mu_n }{\color{group1} \sigma_n^2 + \sigma_{\ell}^2 }$$
Illustrations
Code
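A minimal sketch of how one might verify Equation 7 numerically: integrate the likelihood against the posterior with a simple trapezoid rule, and compare the result to the closed-form Gaussian with mean $\mu_n$ and variance $\sigma_{\ell}^2 + \sigma_n^2$. All numeric values are arbitrary assumptions:

```python
import math

def normal_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Arbitrary example values (assumptions, not from the notes).
sigma_l2 = 1.5             # known likelihood variance
mu_n, sigma_n2 = 0.8, 0.3  # posterior mean and variance
x_star = 2.0               # test point

# Trapezoid rule for the integral of N(x_* | mu, sigma_l2) N(mu | mu_n, sigma_n2) d mu.
lo, hi, steps = -20.0, 20.0, 40_000
h = (hi - lo) / steps
integral = 0.0
for i in range(steps + 1):
    mu = lo + i * h
    weight = 0.5 if i in (0, steps) else 1.0
    integral += weight * normal_pdf(x_star, mu, sigma_l2) * normal_pdf(mu, mu_n, sigma_n2)
integral *= h

# Closed form from Equation 7: N(x_* | mu_n, sigma_l2 + sigma_n2).
closed_form = normal_pdf(x_star, mu_n, sigma_l2 + sigma_n2)
assert abs(integral - closed_form) < 1e-6
print("quadrature matches the closed-form posterior predictive")
```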

Multivariate: Normal-Known

Likelihood of the Training Data
\begin{align} p_{\color{group3} \ell} (\bm{x}_1, \bm{x}_2, \dots, \bm{x}_n \given {\color{group6} \Theta}) &= \prod\limits_i^n \normalAbbreviation{ \bm{x}_i }{\bm{\color{group6} \mu}_{\color{group3} \ell} }{\bm{\color{group6} \Sigma}_{\color{group3} \ell} } \tag{1} \label{multivariateNormalKnown1} \\ &= \frac{1 }{\left(2 \pi \right)^{nD/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n/2} } \exp{ \left\{ \frac{-1}{2} \sum\limits_i^n {\color{group1} \left( \bm{x}_i - \bm{\mu}_{\ell} \right)^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group2} \left( \bm{x}_i - \bm{\mu}_{\ell} \right) } \right\} } \tag{2} \label{multivariateNormalKnown2} \\ &= C_1 \times \exp{ \left\{ \frac{-1}{2} \sum\limits_i^n {\color{group1} \left[ (\bm{x}_i - \bar{\bm{x}}) - (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right]^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group2} \left[ (\bm{x}_i - \bar{\bm{x}}) - (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right] } \right\} } \tag{3} \label{multivariateNormalKnown3} \\ &= \circleEquation{ C_1 \times \exp{ \left\{ \frac{-1}{2} tr( \bm{\Sigma}_{\ell}^{-1} \bm{S} ) \right\} } \times {\color{group5} \exp{ \left\{ \frac{-1}{2} (\bar{\bm{x}} - \bm{\mu}_{\ell})^{\top} (n \bm{\Sigma}_{\ell}^{-1}) (\bar{\bm{x}} - \bm{\mu}_{\ell}) \right\} } } } \tag{4} \label{multivariateNormalKnown4} \\ &\propto {\color{group5} \normalAbbreviation{ \bar{\bm{x}} }{\bm{\mu}_{\ell} }{\frac{1}{n} \bm{\Sigma}_{\ell} } } \tag{5} \label{multivariateNormalKnown5} \\ \end{align}
Equation 1: Assume the data is conditionally independent under a multivariate Gaussian.

From conditional independence, the (joint) likelihood of the data is equal to the product of the individual likelihoods:

$$\likelihood{ {\color{group1} \bm{x}_1}, {\color{group1} \bm{x}_2}, \dots, {\color{group1} \bm{x}_n} }{\color{group2} \Theta } = \likelihood{\color{group1} \bm{x}_1}{\color{group2} \Theta} \times \likelihood{\color{group1} \bm{x}_2}{\color{group2} \Theta} \times \dots \times \likelihood{\color{group1} \bm{x}_n}{\color{group2} \Theta}.$$

Since we assume the individual likelihoods are Gaussian, ${\color{group2} p_{\ell}}(\bm{x}_i \given \Theta) = {\color{group2} \mathcal{N}} (\bm{x}_i \given \bm{\mu}_{\color{group2} \ell}, \bm{\Sigma}_{\color{group2} \ell})$:

$${\color{group2} p_{\ell}}(\bm{x}_1 \given \Theta) \times {\color{group2} p_{\ell}}(\bm{x}_2 \given \Theta) \times \dots \times {\color{group2} p_{\ell}}(\bm{x}_n \given \Theta) \\ = {\color{group2} \mathcal{N}} (\bm{x}_1 \given \bm{\mu}_{\color{group2} \ell}, \bm{\Sigma}_{\color{group2} \ell}) \times {\color{group2} \mathcal{N}} (\bm{x}_2 \given \bm{\mu}_{\color{group2} \ell}, \bm{\Sigma}_{\color{group2} \ell}) \times \dots \times {\color{group2} \mathcal{N}} (\bm{x}_n \given \bm{\mu}_{\color{group2} \ell}, \bm{\Sigma}_{\color{group2} \ell}).$$

Rewrite the product using ${\color{group5} \prod}$ notation:

$$\normalAbbreviation{ \color{group5} \bm{x}_1 }{\bm{\mu}_{\ell} }{\bm{\Sigma}_{\ell} } {\color{group5} \times} \normalAbbreviation{ \color{group5} \bm{x}_2 }{\bm{\mu}_{\ell} }{\bm{\Sigma}_{\ell} } {\color{group5} \times} \dots {\color{group5} \times} \normalAbbreviation{\color{group5} \bm{x}_n }{\bm{\mu}_{\ell} }{\bm{\Sigma}_{\ell} } \\ = {\color{group5} \prod\limits_i^n} \normalAbbreviation{ \color{group5} \bm{x}_i }{\bm{\mu}_{\ell} }{\bm{\Sigma}_{\ell}}.$$
Equation 2: Simplify constants and exponents in the product.

Write out the product by plugging into the Gaussian expression.

\begin{align} \prod\limits_i^n \normalAbbreviation{ \color{group1} \bm{x}_i }{\color{group2} \bm{\mu}_{\ell} }{\color{group2} \bm{\Sigma}_{\ell} } &= \frac{1 }{\left(2 \pi \right)^{D/2} \left\vert {\color{group2} \bm{\Sigma}_{\ell}} \right\vert^{1/2} } \exp{ \left\{ \frac{-1}{2} {\left( {\color{group1} \bm{x}_1} - \bm{\mu}_{\ell} \right)^{\top} } {\color{group2} \bm{\Sigma}_{\ell}}^{-1} {\left( {\color{group1} \bm{x}_1} - \bm{\mu}_{\ell} \right) } \right\} } \\ &\times \frac{1 }{\left(2 \pi \right)^{D/2} \left\vert {\color{group2} \bm{\Sigma}_{\ell}} \right\vert^{1/2} } \exp{ \left\{ \frac{-1}{2} {\left( {\color{group1} \bm{x}_2} - \bm{\mu}_{\ell} \right)^{\top} } {\color{group2} \bm{\Sigma}_{\ell}}^{-1} {\left( {\color{group1} \bm{x}_2} - \bm{\mu}_{\ell} \right) } \right\} } \\ &\times \dots \\ &\times \frac{1 }{\left(2 \pi \right)^{D/2} \left\vert {\color{group2} \bm{\Sigma}_{\ell}} \right\vert^{1/2} } \exp{ \left\{ \frac{-1}{2} {\left( {\color{group1} \bm{x}_n} - \bm{\mu}_{\ell} \right)^{\top} } {\color{group2} \bm{\Sigma}_{\ell}}^{-1} {\left( {\color{group1} \bm{x}_n} - \bm{\mu}_{\ell} \right) } \right\} } \end{align}

Bring the $n$ normalizing constants to the front.

\begin{align} {\color{group5} \frac{1 }{\left(2 \pi \right)^{D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{1/2} } } \exp{ \left\{ \frac{-1}{2} {\left( \bm{x}_1 - \bm{\mu}_{\ell} \right)^{\top} } \bm{\Sigma}_{\ell}^{-1} {\left( \bm{x}_1 - \bm{\mu}_{\ell} \right) } \right\} } \\ \times {\color{group5} \frac{1 }{\left(2 \pi \right)^{D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{1/2} } } \exp{ \left\{ \frac{-1}{2} {\left( \bm{x}_2 - \bm{\mu}_{\ell} \right)^{\top} } {\bm{\Sigma}_{\ell}}^{-1} {\left( \bm{x}_2 - \bm{\mu}_{\ell} \right) } \right\} } \\ \times \dots \times {\color{group5} \frac{1 }{\left(2 \pi \right)^{D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{1/2} } } \exp{ \left\{ \frac{-1}{2} {\left( \bm{x}_n - \bm{\mu}_{\ell} \right)^{\top} } {\bm{\Sigma}_{\ell}}^{-1} {\left( \bm{x}_n - \bm{\mu}_{\ell} \right) } \right\} } \\ = {\color{group5} \left( \frac{1 }{\left(2 \pi \right)^{D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{1/2} } \right)^n } \times \exp{ \left\{ \frac{-1}{2} {\left( \bm{x}_1 - \bm{\mu}_{\ell} \right)^{\top} } \bm{\Sigma}_{\ell}^{-1} {\left( \bm{x}_1 - \bm{\mu}_{\ell} \right) } \right\} } \\ \times \exp{ \left\{ \frac{-1}{2} {\left( \bm{x}_2 - \bm{\mu}_{\ell} \right)^{\top} } {\bm{\Sigma}_{\ell}}^{-1} {\left( \bm{x}_2 - \bm{\mu}_{\ell} \right) } \right\} } \\ \times \dots \times \exp{ \left\{ \frac{-1}{2} {\left( \bm{x}_n - \bm{\mu}_{\ell} \right)^{\top} } {\bm{\Sigma}_{\ell}}^{-1} {\left( \bm{x}_n - \bm{\mu}_{\ell} \right) } \right\} } \\ \end{align}

Sum exponents. Recall that $a^b {\color{group5} \times} a^c = a^{b {\color{group5} +} c}$ .

$$\left( \frac{1 }{\left(2 \pi \right)^{D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{1/2} } \right)^n \\ \times \exp{ \left\{ \frac{-1}{2} {\left( \bm{x}_1 - \bm{\mu}_{\ell} \right)^{\top} } \bm{\Sigma}_{\ell}^{-1} {\left( \bm{x}_1 - \bm{\mu}_{\ell} \right) } \right\} } \\ {\color{group5} \times} \exp{ \left\{ \frac{-1}{2} {\left( \bm{x}_2 - \bm{\mu}_{\ell} \right)^{\top} } {\bm{\Sigma}_{\ell}}^{-1} {\left( \bm{x}_2 - \bm{\mu}_{\ell} \right) } \right\} } \\ {\color{group5} \times} \dots {\color{group5} \times} \exp{ \left\{ \frac{-1}{2} {\left( \bm{x}_n - \bm{\mu}_{\ell} \right)^{\top} } {\bm{\Sigma}_{\ell}}^{-1} {\left( \bm{x}_n - \bm{\mu}_{\ell} \right) } \right\} } \\ = \left( \frac{1 }{\left(2 \pi \right)^{D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{1/2} } \right)^n \times \exp{ \left\{ {\color{group5} \sum\limits_i^n } \frac{-1}{2} {\left( \bm{x}_i - \bm{\mu}_{\ell} \right)^{\top} } {\bm{\Sigma}_{\ell}}^{-1} {\left( \bm{x}_i - \bm{\mu}_{\ell} \right) } \right\} }$$

Distribute $n$ in the normalizing constant and factor $\frac{-1}{2}$ in the exponent.

\begin{align} \left( \frac{1 }{\left(2 \pi \right)^{D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{1/2} } \right)^{\color{group5} n} \times \exp{ \left\{ \sum\limits_i^n {\color{group6} \frac{-1}{2}} {\left( \bm{x}_i - \bm{\mu}_{\ell} \right)^{\top} } {\bm{\Sigma}_{\ell}}^{-1} {\left( \bm{x}_i - \bm{\mu}_{\ell} \right) } \right\} } \\ = \frac{1 }{\left(2 \pi \right)^{\color{group5} n D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{\color{group5} n /2} } \times \exp{ \left\{ {\color{group6} \frac{-1}{2}} \sum\limits_i^n {\left( \bm{x}_i - \bm{\mu}_{\ell} \right)^{\top} } {\bm{\Sigma}_{\ell}}^{-1} {\left( \bm{x}_i - \bm{\mu}_{\ell} \right) } \right\} } \end{align}
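The equality between the product of individual densities and the compact form with constant $(2\pi)^{nD/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n/2}$ can be checked numerically. A minimal sketch using a diagonal covariance, so the determinant and inverse stay trivial in pure Python; all numeric values are arbitrary assumptions:

```python
import math

# Tiny 2-D example (arbitrary assumptions, not from the notes).
X = [(1.0, 2.0), (0.5, -1.0), (2.0, 0.0)]  # n = 3 observations, D = 2
mu = (0.8, 0.4)                            # mean mu_l
var = (1.5, 2.0)                           # diagonal of Sigma_l

n, D = len(X), len(mu)
det = var[0] * var[1]  # |Sigma_l| for a diagonal matrix

def mvn_pdf(x):
    """Density of N(x | mu, diag(var))."""
    quad = sum((x[d] - mu[d]) ** 2 / var[d] for d in range(D))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (D / 2) * math.sqrt(det))

# Left-hand side: the product of the n individual likelihoods.
product = 1.0
for x in X:
    product *= mvn_pdf(x)

# Right-hand side: the compact form with constant (2 pi)^(nD/2) |Sigma_l|^(n/2)
# and the quadratic forms summed inside a single exponential.
quad_sum = sum((x[d] - mu[d]) ** 2 / var[d] for x in X for d in range(D))
compact = math.exp(-0.5 * quad_sum) / ((2 * math.pi) ** (n * D / 2) * det ** (n / 2))

assert abs(product - compact) < 1e-12
print("product of likelihoods matches the compact form")
```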
Equation 3: Introduce the empirical mean via $0 = \bar{\bm{x}} - \bar{\bm{x}}$.
\begin{align} \frac{1 }{\left(2 \pi \right)^{n D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n /2} } \times \exp{ \left\{ \frac{-1}{2} \sum\limits_i^n {\left( \bm{x}_i - \bm{\mu}_{\ell} \right)^{\top} } {\bm{\Sigma}_{\ell}}^{-1} {\left( \bm{x}_i - \bm{\mu}_{\ell} \right) } \right\} } \\ = \frac{1 }{\left(2 \pi \right)^{n D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n /2} } \times \exp{ \left\{ \frac{-1}{2} \sum\limits_i^n \left( \bm{x}_i + {\color{group5} 0} - \bm{\mu}_{\ell} \right)^{\top} {\bm{\Sigma}_{\ell}}^{-1} \left( \bm{x}_i + {\color{group5} 0} - \bm{\mu}_{\ell} \right) \right\} } \\ = \frac{1 }{\left(2 \pi \right)^{n D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n /2} } \times \exp{ \left\{ \frac{-1}{2} \sum\limits_i^n \left( \bm{x}_i + {\color{group5} \bar{\bm{x}} - \bar{\bm{x}}} - \bm{\mu}_{\ell} \right)^{\top} {\bm{\Sigma}_{\ell}}^{-1} \left( \bm{x}_i + {\color{group5} \bar{\bm{x}} - \bar{\bm{x}}} - \bm{\mu}_{\ell} \right) \right\} } \\ = \frac{1 }{\left(2 \pi \right)^{n D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n /2} } \times \exp{ \left\{ \frac{-1}{2} \sum\limits_i^n \left[ (\bm{x}_i - {\color{group5} \bar{\bm{x}}}) {\color{group6} +} ( {\color{group5} \bar{\bm{x}}} {\color{group6} -} {\color{group1} \bm{\mu}_{\ell}} ) \right]^{\top} {\bm{\Sigma}_{\ell}}^{-1} \left[ (\bm{x}_i - {\color{group5} \bar{\bm{x}}}) {\color{group6} +} ( {\color{group5} \bar{\bm{x}}} {\color{group6} -} {\color{group1} \bm{\mu}_{\ell}} ) \right] \right\} } \\ = \frac{1 }{\left(2 \pi \right)^{n D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n /2} } \times \exp{ \left\{ \frac{-1}{2} \sum\limits_i^n \left[ (\bm{x}_i - {\color{group5} \bar{\bm{x}}}) {\color{group6} -} ( {\color{group1} \bm{\mu}_{\ell}} {\color{group6} -} {\color{group5} \bar{\bm{x}}} ) \right]^{\top} {\bm{\Sigma}_{\ell}}^{-1} \left[ (\bm{x}_i - {\color{group5} \bar{\bm{x}}}) {\color{group6} -} ( {\color{group1} \bm{\mu}_{\ell}} {\color{group6} -} {\color{group5} \bar{\bm{x}}} ) \right] \right\} } \end{align}

Letting $C_1 = \frac{1 }{\left(2 \pi \right)^{n D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n /2} },$

$${\color{group2} \frac{1 }{\left(2 \pi \right)^{n D/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n /2} } } \times \exp{ \left\{ \frac{-1}{2} \sum\limits_i^n \left[ (\bm{x}_i - {\color{group5} \bar{\bm{x}}}) {\color{group6} -} ( {\color{group1} \bm{\mu}_{\ell}} {\color{group6} -} {\color{group5} \bar{\bm{x}}} ) \right]^{\top} {\bm{\Sigma}_{\ell}}^{-1} \left[ (\bm{x}_i - {\color{group5} \bar{\bm{x}}}) {\color{group6} -} ( {\color{group1} \bm{\mu}_{\ell}} {\color{group6} -} {\color{group5} \bar{\bm{x}}} ) \right] \right\} } \\ = {\color{group2} C_1} \times \exp{ \left\{ \frac{-1}{2} \sum\limits_i^n \left[ (\bm{x}_i - {\color{group5} \bar{\bm{x}}}) {\color{group6} -} ( {\color{group1} \bm{\mu}_{\ell}} {\color{group6} -} {\color{group5} \bar{\bm{x}}} ) \right]^{\top} {\bm{\Sigma}_{\ell}}^{-1} \left[ (\bm{x}_i - {\color{group5} \bar{\bm{x}}}) {\color{group6} -} ( {\color{group1} \bm{\mu}_{\ell}} {\color{group6} -} {\color{group5} \bar{\bm{x}}} ) \right] \right\} } \\$$
Equation 4: Expand the quadratic expression, and then simplify, letting $\bm{S}$ represent the scatter matrix.

We expand by first FOILing the quadratic expression in the summation of the exponent:

\begin{align} \left[ {\color{group5} (\bm{x}_i - \bar{\bm{x}}) } - {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \right]^{\top} \bm{\Sigma}_{\ell}^{-1} \left[ {\color{group5} (\bm{x}_i - \bar{\bm{x}}) } - {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \right] &={\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} \left[ {\color{group5} (\bm{x}_i - \bar{\bm{x}}) } - {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \right] - {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} \left[ {\color{group5} (\bm{x}_i - \bar{\bm{x}}) } - {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \right] \\ &={\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group5} (\bm{x}_i - \bar{\bm{x}}) } - {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } - {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group5} (\bm{x}_i - \bar{\bm{x}}) } + {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \end{align}

Since a scalar $a$ is equal to its own transpose, $a = a^{\top}$, and the inverse of a symmetric matrix is itself symmetric, $\bm{\Sigma}_{\ell}^{-1} = \bm{\Sigma}_{\ell}^{- \top}$, we can simplify the two middle terms in the expression above:

$$- {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } - {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group5} (\bm{x}_i - \bar{\bm{x}}) } \\ = - {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } - \left[ {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group5} (\bm{x}_i - \bar{\bm{x}}) } \right]^{\top} \\ = - {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } - \left[ {\color{group5} (\bm{x}_i - \bar{\bm{x}}) }^{\top} \bm{\Sigma}_{\ell}^{- \top} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} }^{\top} \right] \\ = - {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } - {\color{group5} (\bm{x}_i - \bar{\bm{x}}) }^{\top} \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \\ = - 2 {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) }$$
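Because $\bm{\Sigma}_{\ell}^{-1}$ is symmetric, the two middle terms are equal scalars. This is easy to confirm numerically; in the NumPy sketch below, the vectors and matrix are arbitrary stand-ins for $(\bm{x}_i - \bar{\bm{x}})$, $(\bm{\mu}_{\ell} - \bar{\bm{x}})$, and $\bm{\Sigma}_{\ell}^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric positive-definite "covariance" and two vectors.
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3 * np.eye(3)   # symmetric positive definite
Sigma_inv = np.linalg.inv(Sigma)  # inverse of a symmetric matrix is symmetric

a = rng.standard_normal(3)        # stands in for (x_i - xbar)
b = rng.standard_normal(3)        # stands in for (mu - xbar)

# a^T Sigma^{-1} b is a scalar, so it equals its own transpose b^T Sigma^{-1} a.
lhs = a @ Sigma_inv @ b
rhs = b @ Sigma_inv @ a
print(np.isclose(lhs, rhs))  # True
```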

Now, reintroducing and distributing the summation in the exponent,

$$\sum\limits_i^n \left[ (\bm{x}_i - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) - 2 {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } + (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right] \\ = \left[ \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) \right] + \left[ \sum\limits_i^n - 2 {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \right] + \left[ \sum\limits_i^n (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right].$$

Recalling that $\bar{\bm{x}} = \frac{1}{n} \sum\limits_i^n \bm{x}_i$, so that $\sum\limits_i^n \bm{x}_i = n \bar{\bm{x}}$, we show that the middle set of terms sums to $0$:

$$\sum\limits_i^n {\color{group3} - 2} {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } = {\color{group3} - 2} \sum\limits_i^n {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \\ = - 2 \left[ \sum\limits_i^n {\color{group5} \bm{x}_i}^{\top} \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \ {\color{group5} -} \ {\color{group5} \bar{\bm{x}}^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \right] \\ = - 2 \left[ n {\color{group5} \bar{\bm{x}}}^{\top} \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \ {\color{group5} -} \ n {\color{group5} \bar{\bm{x}}^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \right] \\ = -2 \times 0 = 0.$$

This means

$$\left[ \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) \right] + \left[ \sum\limits_i^n - 2 {\color{group5} (\bm{x}_i - \bar{\bm{x}})^{\top} } \bm{\Sigma}_{\ell}^{-1} {\color{group6} (\bm{\mu}_{\ell} - \bar{\bm{x}}) } \right] + \left[ \sum\limits_i^n (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right] \\ = \left[ \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) \right] + \left[ \sum\limits_i^n (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right].$$
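The vanishing cross term can be sanity-checked numerically. In this NumPy sketch, the data, candidate mean, and precision matrix are all random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
n, D = 50, 3

X = rng.standard_normal((n, D))  # n samples x_i
xbar = X.mean(axis=0)            # empirical mean
mu = rng.standard_normal(D)      # arbitrary candidate mean mu_ell
A = rng.standard_normal((D, D))
Sigma_inv = np.linalg.inv(A @ A.T + D * np.eye(D))  # stand-in precision

# Cross term: sum_i of -2 (x_i - xbar)^T Sigma^{-1} (mu - xbar).
cross = sum(-2 * (x - xbar) @ Sigma_inv @ (mu - xbar) for x in X)
print(np.isclose(cross, 0.0))  # True: the deviations (x_i - xbar) sum to zero
```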

Since $(\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{\mu}_{\ell} - \bar{\bm{x}})$ is constant with respect to the index $i = 1, 2, \dots, n$, the summation reduces to multiplication by $n$, and we may absorb $n$ into the bilinear form:

$$\left[ \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) \right] + \left[ {\color{group3} \sum\limits_i^n} (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right] \\ = \left[ \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) \right] + {\color{group3} n} (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{\mu}_{\ell} - \bar{\bm{x}}) \\ = \left[ \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) \right] + (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} \left( {\color{group3} n} \bm{\Sigma}_{\ell}^{-1} \right) (\bm{\mu}_{\ell} - \bar{\bm{x}}).$$

So, plugging back in, we have the following likelihood, where the notation for the remaining summation can be further simplified:

$$\frac{1 }{\left(2 \pi \right)^{nD/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n/2} } \exp{ \left\{ \frac{-1}{2} {\color{teal} \left[ \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) \right] } - \frac{1}{2} \left[ (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} (n \bm{\Sigma}_{\ell}^{-1}) (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right] \right\} }$$

Since the trace of a scalar $a$ is equal to itself, $tr(a) = a$, and the result of the remaining summation is a scalar, we can simplify the notation with a little work:

$${\color{teal} \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) } = tr \left( {\color{teal} \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}})^{\top} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) } \right).$$

Since we know that $tr( {\color{group1} \bm{a}^{\top}} {\color{group2} \bm{b}} ) = tr( {\color{group2} \bm{b}} {\color{group1} \bm{a}^{\top}} )$ given two vectors $\bm{\color{group1} a}$ and $\bm{\color{group2} b}$,

\begin{align} tr \left( \sum\limits_i^n {\color{group1} (\bm{x}_i - \bar{\bm{x}})^{\top} } {\color{group2} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) } \right) &= tr \left( \sum\limits_i^n {\color{group2} \bm{\Sigma}_{\ell}^{-1} (\bm{x}_i - \bar{\bm{x}}) } {\color{group1} (\bm{x}_i - \bar{\bm{x}})^{\top} } \right). \end{align}

Factoring out ${\color{group5} \bm{\Sigma}_{\ell}^{-1}}$, which does not depend on $i$,

$$tr \left( \sum\limits_i^n {\color{group5} \bm{\Sigma}_{\ell}^{-1}} (\bm{x}_i - \bar{\bm{x}}) (\bm{x}_i - \bar{\bm{x}})^{\top} \right) = tr \left( {\color{group5} \bm{\Sigma}_{\ell}^{-1}} \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}}) (\bm{x}_i - \bar{\bm{x}})^{\top} \right).$$

Recognizing ${\color{group6} \bm{S}} = {\color{group6} \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}}) (\bm{x}_i - \bar{\bm{x}})^{\top} }$ as a scatter matrix, we can simplify our notation:

$$tr \left( {\color{group5} \bm{\Sigma}_{\ell}^{-1}} {\color{group6} \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}}) (\bm{x}_i - \bar{\bm{x}})^{\top} } \right) = tr \left( {\color{group5} \bm{\Sigma}_{\ell}^{-1}} {\color{group6} \bm{S}} \right).$$
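The trace trick above — turning a sum of quadratic forms into $tr \left( \bm{\Sigma}^{-1} \bm{S} \right)$ — can also be verified numerically with a short NumPy sketch (random data, arbitrary stand-in covariance):

```python
import numpy as np

rng = np.random.default_rng(2)
n, D = 50, 3

X = rng.standard_normal((n, D))
xbar = X.mean(axis=0)
A = rng.standard_normal((D, D))
Sigma_inv = np.linalg.inv(A @ A.T + D * np.eye(D))  # stand-in precision

# Left side: sum of scalar quadratic forms.
quad_sum = sum((x - xbar) @ Sigma_inv @ (x - xbar) for x in X)

# Right side: trace of Sigma^{-1} S, with S the scatter matrix.
S = (X - xbar).T @ (X - xbar)
trace_form = np.trace(Sigma_inv @ S)

print(np.isclose(quad_sum, trace_form))  # True
```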

Plugging back in, we have the following:

$$\frac{1 }{\left(2 \pi \right)^{nD/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n/2} } \times \exp{ \left\{ \frac{-1}{2} {\color{teal} tr \left( \bm{\Sigma}_{\ell}^{-1} \bm{S} \right) } - \frac{1}{2} \left[ (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} (n \bm{\Sigma}_{\ell}^{-1}) (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right] \right\} } \\ = \frac{1 }{\left(2 \pi \right)^{nD/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n/2} } \times \exp{ \left\{ \frac{-1}{2} {\color{teal} tr \left( \bm{\Sigma}_{\ell}^{-1} \bm{S} \right) } \right\} } \times \exp{ \left\{ - \frac{1}{2} \left[ (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} (n \bm{\Sigma}_{\ell}^{-1}) (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right] \right\} }$$
Equation 5: Note the proportionality to the sampling distribution of the empirical mean.

Recall that $\bm{S} = \sum\limits_i^n (\bm{x}_i - \bar{\bm{x}}) (\bm{x}_i - \bar{\bm{x}})^{\top}$, meaning that it does not depend on the likelihood mean $\bm{\mu}_{\ell}$. Since $\bm{\mu}_{\ell}$ is the only unknown in this model (the covariance is known), the second factor is constant.

Dropping $C_1$ and $\exp{ \left\{ \frac{-1}{2} tr \left( \bm{\Sigma}_{\ell}^{-1} \bm{S} \right) \right\} }$, we see that the remaining exponential factor is proportional to a Gaussian over $\bar{\bm{x}}$ with mean $\bm{\mu}_{\ell}$ and covariance $\frac{1}{n} \bm{\Sigma}_{\ell}$.

$${\color{group6} \frac{1 }{\left(2 \pi \right)^{nD/2} \left\vert \bm{\Sigma}_{\ell} \right\vert^{n/2} } \times \exp{ \left\{ \frac{-1}{2} tr \left( \bm{\Sigma}_{\ell}^{-1} \bm{S} \right) \right\} } \times } \exp{ \left\{ - \frac{1}{2} \left[ (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} (n \bm{\Sigma}_{\ell}^{-1}) (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right] \right\} } \\ {\color{group6} \propto} \ \exp{ \left\{ - \frac{1}{2} \left[ (\bm{\mu}_{\ell} - \bar{\bm{x}})^{\top} (n \bm{\Sigma}_{\ell}^{-1}) (\bm{\mu}_{\ell} - \bar{\bm{x}}) \right] \right\} } \\ {\color{group6} \propto} \ \normalAbbreviation{ \bar{\bm{x}} }{ \bm{\mu}_{\ell} }{ \tfrac{1}{n} \bm{\Sigma}_{\ell} }$$
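Putting the pieces together, the full decomposition can be checked against a library implementation: the joint log-likelihood of the data should equal $\log C_1$ minus half the trace term minus half the quadratic term in $\bm{\mu}_{\ell}$. A sketch assuming SciPy is available:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
n, D = 40, 3

mu = rng.standard_normal(D)                     # candidate mean mu_ell
A = rng.standard_normal((D, D))
Sigma = A @ A.T + D * np.eye(D)                 # symmetric positive-definite covariance
X = rng.multivariate_normal(mu, Sigma, size=n)  # n draws
xbar = X.mean(axis=0)
Sigma_inv = np.linalg.inv(Sigma)

# Direct joint log-likelihood: sum of per-sample Gaussian log-densities.
direct = multivariate_normal(mean=mu, cov=Sigma).logpdf(X).sum()

# Decomposed form:
# log C1 - (1/2) tr(Sigma^{-1} S) - (n/2) (mu - xbar)^T Sigma^{-1} (mu - xbar).
S = (X - xbar).T @ (X - xbar)                   # scatter matrix
_, logdet = np.linalg.slogdet(Sigma)
log_C1 = -0.5 * n * D * np.log(2 * np.pi) - 0.5 * n * logdet
decomposed = (log_C1
              - 0.5 * np.trace(Sigma_inv @ S)
              - 0.5 * n * (mu - xbar) @ Sigma_inv @ (mu - xbar))

print(np.isclose(direct, decomposed))  # True
```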