Conjugate Prior — Detailed Explanation

1. Bayesian Recap (Why Priors Matter)

In Bayesian inference, we update beliefs using Bayes’ theorem:

P(θ | D) = P(D | θ) · P(θ) / P(D)

Where:

  • θ = unknown parameter

  • P(θ) = prior

  • P(D | θ) = likelihood

  • P(θ | D) = posterior

The challenge: computing the posterior can be mathematically hard, because the normalizing constant P(D) = ∫ P(D | θ) P(θ) dθ generally has no closed form.


2. What Is a Conjugate Prior?

Definition

A conjugate prior (for a given likelihood) is a prior distribution such that the posterior distribution belongs to the same family as the prior.

📌 Prior and posterior have the same functional form.


3. Why Are Conjugate Priors Important?

  1. Closed-form posterior

  2. Easy analytical updates

  3. Clear interpretation

  4. Computational efficiency

  5. Educational clarity

This is why they are widely used in:

  • Machine learning

  • Bayesian statistics

  • Online learning

  • Signal processing


4. Simple Example: Beta–Bernoulli

Likelihood

x_i ~ Bernoulli(p)

Prior

p ~ Beta(α, β)

Posterior

p | D ~ Beta(α + k, β + n − k)

Where:

  • k = number of successes

  • n = total trials
📌 Prior and posterior are both Beta distributions.
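This update can be sketched in a few lines of Python (the function name is my own):

```python
# Beta–Bernoulli conjugate update: a minimal sketch.
# Prior Beta(alpha, beta); data is a list of 0/1 Bernoulli outcomes.
def beta_bernoulli_update(alpha, beta, data):
    """Return posterior (alpha, beta) after observing 0/1 outcomes."""
    k = sum(data)   # number of successes
    n = len(data)   # total trials
    return alpha + k, beta + n - k

# Example: Beta(2, 2) prior, then 7 successes out of 10 trials.
post = beta_bernoulli_update(2, 2, [1, 1, 1, 0, 1, 1, 0, 1, 1, 0])
print(post)  # (9, 5)
```

No integration is needed: the posterior is obtained purely by adding counts to the prior parameters.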


5. Why Does Conjugacy Work?

The likelihood and prior have compatible mathematical forms.

Example:

  • Bernoulli likelihood: p^k (1 − p)^(n−k)

  • Beta prior: p^(α−1) (1 − p)^(β−1)

Multiplying them gives p^(α+k−1) (1 − p)^(β+n−k−1), which is again a Beta kernel — after normalization, exactly Beta(α + k, β + n − k).


6. General Pattern of Conjugate Priors

Likelihood                            Conjugate Prior
Bernoulli / Binomial                  Beta
Multinomial                           Dirichlet
Poisson                               Gamma
Exponential                           Gamma
Gaussian (mean unknown)               Gaussian
Gaussian (mean & variance unknown)    Normal–Inverse-Gamma

7. Gaussian–Gaussian Example (Brief)

Likelihood

x | μ ~ N(μ, σ²),  with σ² known

Prior

μ ~ N(μ₀, τ²)

Posterior

μ | D ~ N(μₙ, τₙ²)

📌 Gaussian prior + Gaussian likelihood → Gaussian posterior
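The posterior parameters μₙ and τₙ² have a standard closed form: the posterior precision is the sum of the prior and data precisions, and the posterior mean is their precision-weighted average. A minimal sketch (function name is my own):

```python
# Gaussian–Gaussian conjugate update with known variance sigma2: a sketch.
#   1/tau2_n = 1/tau2 + n/sigma2                  (precisions add)
#   mu_n     = tau2_n * (mu0/tau2 + n*xbar/sigma2)  (precision-weighted mean)
def gaussian_update(mu0, tau2, sigma2, data):
    n = len(data)
    xbar = sum(data) / n
    tau2_n = 1.0 / (1.0 / tau2 + n / sigma2)            # posterior variance
    mu_n = tau2_n * (mu0 / tau2 + n * xbar / sigma2)    # posterior mean
    return mu_n, tau2_n

# Example: vague-ish prior N(0, 1), three observations equal to 2.0.
mu_n, tau2_n = gaussian_update(mu0=0.0, tau2=1.0, sigma2=1.0, data=[2.0, 2.0, 2.0])
print(mu_n, tau2_n)  # 1.5 0.25
```

Note how the posterior mean (1.5) sits between the prior mean (0) and the sample mean (2), pulled toward the data as n grows.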


8. Conjugate Prior vs Non-Conjugate Prior

Aspect              Conjugate         Non-Conjugate
Posterior           Closed form       No closed form
Computation         Easy              Requires MCMC / variational methods
Interpretability    High              Lower
Flexibility         Limited           High

9. Interpretation in Terms of “Pseudo-Counts”

In many conjugate priors:

  • Prior behaves like imaginary observations

  • Posterior = prior counts + data counts

Example (Beta prior):

α − 1 = prior successes,  β − 1 = prior failures
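A sketch of this pseudo-count reading (helper name is my own): the posterior mode simply pools the prior's imaginary counts with the observed ones.

```python
# Pseudo-count reading of the Beta prior: Beta(alpha, beta) behaves like
# (alpha - 1) imaginary successes and (beta - 1) imaginary failures.
def beta_mode(alpha, beta):
    # Mode of Beta(alpha, beta); defined for alpha > 1 and beta > 1.
    return (alpha - 1) / (alpha + beta - 2)

alpha, beta = 3, 2   # acts like 2 prior successes, 1 prior failure
k, n = 4, 10         # observed: 4 successes in 10 trials

# Posterior is Beta(alpha + k, beta + n - k) = Beta(7, 8); its mode pools counts:
mode = beta_mode(alpha + k, beta + n - k)
print(mode)  # (2 + 4) / (2 + 1 + 10) = 6/13 ≈ 0.4615
```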

10. Conjugate Priors in Machine Learning

Examples

  • Naive Bayes

  • Bayesian linear regression

  • Topic models (LDA)

  • A/B testing

  • Hidden Markov Models


11. MAP Estimation Connection

MAP estimate:

θ_MAP = argmax_θ P(θ | D)

For conjugate priors:

  • MAP often has closed-form solution

  • MAP ≈ regularized MLE
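For the Beta–Bernoulli case this closed form can be sketched as follows (function names are illustrative): with prior Beta(α, β) and k successes in n trials, the MLE is k/n while the MAP is (k + α − 1)/(n + α + β − 2), i.e. the MLE shrunk by the prior's pseudo-counts.

```python
# MAP vs MLE for the Beta–Bernoulli model: a sketch.
def mle(k, n):
    # Maximum likelihood estimate of the success probability.
    return k / n

def map_estimate(k, n, alpha, beta):
    # Mode of the Beta(alpha + k, beta + n - k) posterior, in closed form.
    return (k + alpha - 1) / (n + alpha + beta - 2)

k, n = 9, 10
print(mle(k, n))                 # 0.9
print(map_estimate(k, n, 2, 2))  # 10/12 ≈ 0.833, shrunk toward 0.5
```

The Beta(2, 2) prior acts like one imaginary success and one imaginary failure, pulling the estimate away from the extreme 0.9 — exactly the "regularized MLE" behavior noted above.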


12. Key Takeaways (Exam-Friendly)

  • Conjugate priors preserve distributional form

  • They simplify Bayesian updating

  • They allow analytical posteriors

  • Widely used in ML and statistics


One-Line Intuition for Students

A conjugate prior is chosen so that Bayesian updating stays mathematically simple.
