Beta–Binomial Model — Detailed Explanation

1. What Is the Beta–Binomial Model?

The Beta–Binomial model is a Bayesian model for count data where:

  • Outcomes are success/failure

  • The probability of success is unknown

  • Uncertainty about that probability is modeled using a Beta distribution

It combines:

  • Binomial likelihood

  • Beta prior


2. When Do We Use It?

Use the Beta–Binomial model when:

  • Data consists of counts of successes

  • Probability of success is uncertain

  • We want to incorporate prior belief

  • We want uncertainty in the estimate

Examples

  • Coin tossing (unknown bias)

  • Conversion rates

  • Defect rates in manufacturing

  • Click-through rates (A/B testing)


3. Model Components

(a) Likelihood: Binomial Distribution

P(X = k \mid p) = \binom{n}{k} p^k (1-p)^{n-k}

Where:

  • n: number of trials

  • k: number of successes

  • p: probability of success (unknown)


(b) Prior: Beta Distribution

p \sim \text{Beta}(\alpha, \beta)

Where:

  • \alpha, \beta > 0

  • Encodes prior belief about p


4. Why Is Beta Chosen?

Because Beta is conjugate to Binomial.

📌 Conjugacy ensures:

  • Posterior has the same form as prior

  • Closed-form updates

  • Easy interpretation


5. Posterior Distribution (Key Result)

Given:

  • k successes

  • n - k failures

\boxed{ p \mid D \sim \text{Beta}(\alpha + k,\; \beta + n - k) }

📌 Prior parameters act like pseudo-counts.
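
This update rule is a one-liner in code. A minimal sketch (the helper name `beta_binomial_update` is just for illustration):

```python
def beta_binomial_update(alpha, beta, k, n):
    """Conjugate update: successes are added to alpha, failures to beta."""
    return alpha + k, beta + (n - k)

# Example from Section 8: Beta(2, 2) prior, 7 heads in 10 tosses
print(beta_binomial_update(2, 2, 7, 10))  # (9, 5)
```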


6. Predictive Distribution (Beta–Binomial)

The marginal distribution of k (integrating out p) is:

\boxed{ P(X = k) = \binom{n}{k} \frac{B(\alpha + k, \beta + n - k)}{B(\alpha, \beta)} }

This is the Beta–Binomial distribution.
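
This pmf can be computed with the standard library alone, using `math.lgamma` for the Beta function (the function names here are illustrative):

```python
from math import comb, exp, lgamma

def log_beta_fn(a, b):
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, alpha, beta):
    # P(X = k) = C(n, k) * B(alpha + k, beta + n - k) / B(alpha, beta)
    return comb(n, k) * exp(log_beta_fn(alpha + k, beta + n - k)
                            - log_beta_fn(alpha, beta))

# Sanity check: probabilities over k = 0..n sum to 1
total = sum(beta_binomial_pmf(k, 10, 2, 2) for k in range(11))
print(round(total, 6))  # 1.0
```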


7. Mean and Variance

Mean

\mathbb{E}[X] = n \frac{\alpha}{\alpha + \beta}

Variance

\text{Var}(X) = n \frac{\alpha\beta}{(\alpha+\beta)^2} \cdot \frac{\alpha+\beta+n}{\alpha+\beta+1}

📌 The variance exceeds that of a Binomial with the same mean (over-dispersion).
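
The over-dispersion is easy to verify numerically: compare the Beta–Binomial variance to a Binomial with the matching mean p = α/(α+β). A sketch using exact fractions:

```python
from fractions import Fraction

def beta_binomial_var(n, a, b):
    # Var(X) = n * ab / (a+b)^2 * (a+b+n) / (a+b+1)
    return Fraction(n * a * b, (a + b) ** 2) * Fraction(a + b + n, a + b + 1)

n, a, b = 10, 2, 2
p = Fraction(a, a + b)        # matching mean: p = 1/2
binom_var = n * p * (1 - p)   # plain Binomial variance n p (1 - p)
print(beta_binomial_var(n, a, b), binom_var)  # 7 vs 5/2
```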


8. Worked Example: Coin Toss

Prior Belief

p \sim \text{Beta}(2,2)

(Slight belief in fairness)


Data

  • 10 tosses

  • 7 heads


Posterior

p \mid D \sim \text{Beta}(9,5)

Posterior Mean

\mathbb{E}[p \mid D] = \frac{9}{14} \approx 0.64

9. MAP and MLE Comparison

MLE (Frequentist)

\hat{p}_{\text{MLE}} = \frac{k}{n}

MAP (Bayesian)

\hat{p}_{\text{MAP}} = \frac{\alpha + k - 1}{\alpha + \beta + n - 2}

📌 MAP includes prior information.
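
For the coin example of Section 8 the two estimates differ slightly; a quick sketch (helper names are illustrative):

```python
def mle(k, n):
    return k / n

def map_estimate(k, n, alpha, beta):
    # Mode of the Beta(alpha + k, beta + n - k) posterior (valid when both > 1)
    return (alpha + k - 1) / (alpha + beta + n - 2)

# Beta(2, 2) prior, 7 heads in 10 tosses
print(mle(7, 10))                 # 0.7
print(map_estimate(7, 10, 2, 2))  # 8/12 ~ 0.667, pulled toward the prior mode 0.5
```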


10. Intuition Behind Beta–Binomial

Concept                 Meaning
Beta prior              Belief about probability
Binomial likelihood     Observed data
Posterior               Updated belief
Predictive              Future outcomes

11. Why Is Beta–Binomial Useful?

Advantages

  • Handles small datasets well

  • Avoids zero-probability issues

  • Models over-dispersion

  • Easy Bayesian updating


12. Beta–Binomial vs Binomial

Feature         Binomial        Beta–Binomial
Probability     Fixed           Random
Variance        Smaller         Larger
Prior belief    Not allowed     Allowed

13. Applications in Machine Learning

  • A/B testing

  • Naive Bayes classifiers

  • Bayesian bandits

  • Online learning

  • Reliability analysis

14. Summary

  • Beta–Binomial is a Bayesian model for count data

  • Beta prior + Binomial likelihood

  • Posterior is Beta

  • Predictive distribution is Beta–Binomial

  • Captures uncertainty and over-dispersion


One-Line Intuition for Students

The Beta–Binomial model treats probability itself as uncertain and learns it from data.


How Do We Get the Posterior?

We are using the Beta–Binomial model.


Step 1: Write Down the Prior

The prior belief about the probability of success pp is:

p \sim \text{Beta}(\alpha,\beta)

In the example:

\boxed{ p \sim \text{Beta}(2,2) }

Interpretation:

  • \alpha - 1 = 1 prior success

  • \beta - 1 = 1 prior failure

This reflects a mild belief that the coin is fair.


Step 2: Observe the Data

We perform 10 coin tosses and observe:

  • k = 7 successes (heads)

  • n - k = 3 failures (tails)


Step 3: Write the Likelihood

The Binomial likelihood is:

P(D \mid p) \propto p^k (1-p)^{n-k}

Substitute values:

P(D \mid p) \propto p^7 (1-p)^3

Step 4: Apply Bayes’ Theorem

P(p \mid D) \propto P(D \mid p)\,P(p)

Substitute the likelihood and the prior:

P(p \mid D) \propto \underbrace{p^7 (1-p)^3}_{\text{likelihood}} \cdot \underbrace{p^{2-1}(1-p)^{2-1}}_{\text{prior}}

Step 5: Combine the Exponents

P(p \mid D) \propto p^{7+1}(1-p)^{3+1} = p^{8}(1-p)^{4}

Step 6: Identify the Posterior Distribution

Recall the Beta distribution form:

\text{Beta}(\alpha',\beta') \propto p^{\alpha'-1}(1-p)^{\beta'-1}

Match exponents:

\alpha' - 1 = 8 \quad \Rightarrow \quad \alpha' = 9
\beta' - 1 = 4 \quad \Rightarrow \quad \beta' = 5

✅ Final Posterior

\boxed{ p \mid D \sim \text{Beta}(9,5) }

Step 7: The Simple Update Rule (Memorize This)

For Beta–Binomial:

\boxed{ \text{Posterior} = \text{Beta}(\alpha + k,\; \beta + n - k) }

Plug in:

\alpha + k = 2 + 7 = 9
\beta + (n - k) = 2 + 3 = 5
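
The closed-form answer can be double-checked by running Bayes' theorem numerically on a grid of p values (the grid size `N` is an arbitrary choice):

```python
# Midpoint grid over (0, 1); unnormalized posterior is p^8 (1-p)^4
N = 100_000
grid = [(i + 0.5) / N for i in range(N)]
unnorm = [p ** 8 * (1 - p) ** 4 for p in grid]
Z = sum(unnorm)
post_mean = sum(p * w for p, w in zip(grid, unnorm)) / Z

print(round(post_mean, 4))  # matches the Beta(9, 5) mean 9/14 ~ 0.6429
```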

Intuitive Explanation (Very Important for Students)

Posterior = prior belief + observed evidence

Quantity    Meaning
α           prior successes
β           prior failures
k           observed successes
n - k       observed failures

One-Line Summary for Exams

In the Beta–Binomial model, the posterior parameters are obtained by adding the number of observed successes and failures to the prior parameters. 
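
One consequence of this update rule is that the data can be processed in batches or all at once with the same result; a short sketch:

```python
def update(alpha, beta, k, n):
    # Posterior = Beta(alpha + k, beta + n - k)
    return alpha + k, beta + (n - k)

# All 10 tosses at once...
one_shot = update(2, 2, 7, 10)
# ...or 6 tosses (4 heads) then 4 tosses (3 heads): posterior becomes the new prior
a1, b1 = update(2, 2, 4, 6)
sequential = update(a1, b1, 3, 4)
print(one_shot, sequential)  # (9, 5) (9, 5)
```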


How Do We Get the Expectation 9/14?

We are computing the mean (expected value) of a Beta distribution.


Step 1: Recall the Posterior Distribution

From the previous step, we obtained:

p \mid D \sim \text{Beta}(9,5)

That means:

  • \alpha = 9

  • \beta = 5


Step 2: Mean of a Beta Distribution

For a Beta random variable:

\boxed{ \mathbb{E}[p] = \frac{\alpha}{\alpha + \beta} }

This is a standard result that can be derived by integration.
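
That integration can be mimicked numerically; a sketch using a midpoint rule (the resolution `N` is arbitrary):

```python
from math import exp, lgamma

def beta_pdf(p, a, b):
    # Beta density with normalizer Gamma(a+b) / (Gamma(a) Gamma(b))
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm) * p ** (a - 1) * (1 - p) ** (b - 1)

a, b, N = 9, 5, 100_000
# E[p] = integral of p * Beta(p; a, b) dp over [0, 1]
numeric = sum(((i + 0.5) / N) * beta_pdf((i + 0.5) / N, a, b)
              for i in range(N)) / N
print(round(numeric, 6), a / (a + b))  # both ~ 0.642857
```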


Step 3: Substitute the Values

\mathbb{E}[p] = \frac{9}{9 + 5} = \boxed{\frac{9}{14}}


Step 4: Numerical Value

\frac{9}{14} \approx 0.643


Interpretation

  • This is the Bayesian estimate of the probability of success

  • It balances:

    • Prior belief

    • Observed data


Why This Makes Sense Intuitively

Recall the pseudo-count interpretation:

Quantity                       Value
Posterior successes (α)        9
Posterior failures (β)         5
Total pseudo-trials (α + β)    9 + 5 = 14

So:

\mathbb{E}[p] = \frac{\text{total successes}}{\text{total trials}}


Compare with MLE

Method          Estimate
MLE             7/10 = 0.7
Bayesian mean   9/14 ≈ 0.643

📌 Bayesian estimate is more conservative due to the prior.


Exam-Ready One-Liner

The expectation of a Beta(α,β) distribution is α/(α+β), obtained by integrating the density over [0,1].
