Normal–Normal Bayesian Model

(Gaussian Likelihood + Gaussian Prior)


1. What Is the Normal–Normal Model?

The Normal–Normal model is a Bayesian model used when:

  • Data are normally distributed

  • The variance is known

  • The mean is unknown

  • Our prior belief about the mean is Gaussian

It is one of the most important Bayesian models because:

  • The posterior has a closed form

  • It illustrates Bayesian updating

  • It connects directly to regularization in ML


2. Model Assumptions

Likelihood (Data Model)

$$x_i \mid \mu \sim \mathcal{N}(\mu,\sigma^2), \quad i=1,\dots,n$$

  • $\mu$: unknown mean

  • $\sigma^2$: known variance


Prior (Belief about the Mean)

$$\mu \sim \mathcal{N}(\mu_0,\tau^2)$$

  • $\mu_0$: prior mean

  • $\tau^2$: prior variance


3. Why Is This Called “Normal–Normal”?

Component     Distribution
Likelihood    Normal
Prior         Normal
Posterior     Normal

📌 This is an example of a conjugate prior.


4. Posterior Distribution (Key Result)

After observing data $D = \{x_1,\dots,x_n\}$:

$$\boxed{\mu \mid D \sim \mathcal{N}(\mu_n,\tau_n^2)}$$

Where:

Posterior Mean

$$\boxed{\mu_n = \frac{\frac{n}{\sigma^2}\bar{x} + \frac{1}{\tau^2}\mu_0}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}}$$

Posterior Variance

$$\boxed{\tau_n^2 = \left(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\right)^{-1}}$$
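The two boxed formulas can be sketched as a small Python function (a minimal illustration; the function name and signature are my own):

```python
def normal_normal_posterior(x, sigma2, mu0, tau2):
    """Posterior N(mu_n, tau_n^2) for the mean of a Normal likelihood
    with known variance sigma2 and Normal prior N(mu0, tau2)."""
    n = len(x)
    xbar = sum(x) / n
    precision = n / sigma2 + 1 / tau2            # posterior precision
    mu_n = (n / sigma2 * xbar + mu0 / tau2) / precision
    tau2_n = 1 / precision                       # posterior variance
    return mu_n, tau2_n
```

Precision (inverse variance) is the natural quantity here: the posterior precision is simply the sum of the data precision and the prior precision.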

5. Interpretation of the Posterior Mean

The posterior mean is a weighted average:

$$\mu_n = w_{\text{data}}\,\bar{x} + w_{\text{prior}}\,\mu_0$$

Where the weights are proportional to precision (inverse variance):

$$w_{\text{data}} = \frac{n/\sigma^2}{n/\sigma^2 + 1/\tau^2}, \qquad w_{\text{prior}} = \frac{1/\tau^2}{n/\sigma^2 + 1/\tau^2}$$

📌 More confidence → more influence.
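The precision-based weights can be computed directly. A short sketch, using the illustrative numbers $n=3$, $\sigma^2=1$, $\tau^2=4$:

```python
# Precision (1/variance) determines the mixing weights.
n, sigma2, tau2 = 3, 1.0, 4.0
prec_data = n / sigma2            # precision contributed by the data
prec_prior = 1 / tau2             # precision contributed by the prior
w_data = prec_data / (prec_data + prec_prior)
w_prior = prec_prior / (prec_data + prec_prior)
print(w_data, w_prior)            # the two weights always sum to 1
```

Here the data carry most of the weight because their precision ($3$) dwarfs the prior's ($0.25$).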


6. Worked Numerical Example

Given:

  • Prior: $\mu \sim \mathcal{N}(5,4)$

  • Known variance: $\sigma^2 = 1$

  • Observed data: $x = \{6,5,7\}$

Step 1: Compute Sample Mean

$$\bar{x} = \frac{6+5+7}{3} = 6$$

Step 2: Compute Posterior Mean

$$\mu_n = \frac{\frac{3}{1}\cdot 6 + \frac{1}{4}\cdot 5}{\frac{3}{1} + \frac{1}{4}} = \frac{18 + 1.25}{3.25} = \boxed{5.92}$$

Step 3: Compute Posterior Variance

$$\tau_n^2 = \left(3 + 0.25\right)^{-1} = \boxed{0.308}$$

7. Final Posterior

$$\boxed{\mu \mid D \sim \mathcal{N}(5.92,\,0.308)}$$
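The three steps above can be checked in a few lines of Python (a direct transcription of the worked example, nothing more):

```python
# Reproduce the worked example: prior N(5, 4), sigma^2 = 1, data {6, 5, 7}.
x = [6, 5, 7]
sigma2, mu0, tau2 = 1.0, 5.0, 4.0
xbar = sum(x) / len(x)                         # Step 1: sample mean = 6.0
prec = len(x) / sigma2 + 1 / tau2              # total precision = 3.25
mu_n = (len(x) / sigma2 * xbar + mu0 / tau2) / prec   # Step 2
tau2_n = 1 / prec                              # Step 3
print(round(mu_n, 2), round(tau2_n, 3))        # 5.92 0.308
```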

8. Key Insights

1. Data vs Prior

  • More data → posterior moves toward sample mean

  • Strong prior → posterior stays near prior mean


2. Uncertainty Shrinks

$$\tau_n^2 < \tau^2$$

More data → less uncertainty.
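The shrinkage is easy to see numerically. A small sketch, assuming $\sigma^2 = 1$ and prior variance $\tau^2 = 4$ (illustrative values):

```python
# Posterior variance for increasing n (sigma^2 = 1, prior tau^2 = 4).
sigma2, tau2 = 1.0, 4.0
variances = [1 / (n / sigma2 + 1 / tau2) for n in [0, 1, 10, 100]]
print(variances)   # starts at the prior variance 4.0, strictly decreasing
```

With $n=0$ the posterior variance is just the prior variance; each observation adds $1/\sigma^2$ to the precision, so the variance can only shrink.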


9. MAP and Bayesian Mean

For a Gaussian posterior, which is symmetric and unimodal, the mode coincides with the mean:

$$\mu_{\text{MAP}} = \mu_n$$

📌 MAP estimate equals posterior mean.


10. Limiting Cases (Important for Exams)

Large Data Limit

$$n \to \infty \;\Rightarrow\; \mu_n \to \bar{x}$$

Very Strong Prior

$$\tau^2 \to 0 \;\Rightarrow\; \mu_n \to \mu_0$$
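Both limits can be demonstrated numerically. A sketch with illustrative values ($\mu_0 = 5$, $\bar{x} = 6$, $\sigma^2 = 1$; the helper name is my own):

```python
# Limiting behaviour of the posterior mean.
mu0, sigma2, xbar = 5.0, 1.0, 6.0

def post_mean(n, tau2):
    prec = n / sigma2 + 1 / tau2
    return (n / sigma2 * xbar + mu0 / tau2) / prec

big_n = post_mean(1000, 4.0)        # large data: close to xbar = 6
strong_prior = post_mean(10, 1e-8)  # tiny tau^2: close to mu0 = 5
print(big_n, strong_prior)
```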

11. Connection to Machine Learning

Ridge Regression

  • Normal prior on weights

  • Equivalent to L2 regularization


Kalman Filter

  • Repeated Normal–Normal updates
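The Kalman-filter connection can be sketched by processing observations one at a time, using each posterior as the prior for the next step. Because the model is conjugate, this sequential scheme reproduces the batch posterior exactly (numbers below reuse the worked example):

```python
# Sequential (Kalman-style) updating: each posterior becomes the next prior.
def update(mu0, tau2, x, sigma2):
    prec = 1 / sigma2 + 1 / tau2
    return (x / sigma2 + mu0 / tau2) / prec, 1 / prec

mu, tau2 = 5.0, 4.0                  # start from the prior N(5, 4)
for x in [6, 5, 7]:
    mu, tau2 = update(mu, tau2, x, sigma2=1.0)
print(mu, tau2)                      # matches the batch posterior N(5.92, 0.308)
```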


12. Bayesian vs Frequentist

Aspect             Bayesian        Frequentist
Estimate           Distribution    Point
Uncertainty        Explicit        Asymptotic
Prior knowledge    Included        Ignored

13. Summary

  • Normal–Normal is a conjugate Bayesian model

  • Posterior is Gaussian

  • Mean is precision-weighted average

  • Variance shrinks with data


One-Line Intuition for Students

Bayesian learning averages what you believed before with what the data tells you, weighted by confidence.
