Beta–Binomial Model — Detailed Explanation

1. What Is the Beta–Binomial Model?

The Beta–Binomial model is a Bayesian model for count data where:

  • Outcomes are success/failure

  • The probability of success is unknown

  • Uncertainty about that probability is modeled using a Beta distribution

It combines:

  • Binomial likelihood

  • Beta prior


2. When Do We Use It?

Use the Beta–Binomial model when:

  • Data consists of counts of successes

  • Probability of success is uncertain

  • We want to incorporate prior belief

  • We want uncertainty in the estimate

Examples

  • Coin tossing (unknown bias)

  • Conversion rates

  • Defect rates in manufacturing

  • Click-through rates (A/B testing)


3. Model Components

(a) Likelihood: Binomial Distribution

P(X = k \mid p) = \binom{n}{k} p^k (1-p)^{n-k}

Where:

  • n: number of trials

  • k: number of successes

  • p: probability of success (unknown)


(b) Prior: Beta Distribution

p \sim \text{Beta}(\alpha, \beta)

Where:

  • \alpha, \beta > 0

  • Encodes prior belief about p


4. Why Is Beta Chosen?

Because Beta is conjugate to Binomial.

📌 Conjugacy ensures:

  • Posterior has the same form as prior

  • Closed-form updates

  • Easy interpretation


5. Posterior Distribution (Key Result)

Given:

  • k successes

  • n - k failures

\boxed{ p \mid D \sim \text{Beta}(\alpha + k,\; \beta + n - k) }

📌 Prior parameters act like pseudo-counts.
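
This update rule is a one-liner in code. A minimal sketch (the helper name `beta_binomial_update` is just for illustration):

```python
def beta_binomial_update(alpha, beta, k, n):
    """Conjugate update: successes are added to alpha, failures to beta."""
    return alpha + k, beta + (n - k)

# Example from Section 8: Beta(2, 2) prior, 7 heads in 10 tosses
print(beta_binomial_update(2, 2, 7, 10))  # (9, 5)
```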


6. Predictive Distribution (Beta–Binomial)

The marginal distribution of k (integrating out p) is:

\boxed{ P(X = k) = \binom{n}{k} \frac{B(\alpha + k, \beta + n - k)}{B(\alpha, \beta)} }

This is the Beta–Binomial distribution.
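
This pmf can be computed with the standard library alone, using `math.lgamma` for the Beta function (the function names here are illustrative):

```python
from math import comb, exp, lgamma

def log_beta_fn(a, b):
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, alpha, beta):
    # P(X = k) = C(n, k) * B(alpha + k, beta + n - k) / B(alpha, beta)
    return comb(n, k) * exp(log_beta_fn(alpha + k, beta + n - k)
                            - log_beta_fn(alpha, beta))

# Sanity check: probabilities over k = 0..n sum to 1
total = sum(beta_binomial_pmf(k, 10, 2, 2) for k in range(11))
print(round(total, 6))  # 1.0
```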


7. Mean and Variance

Mean

\mathbb{E}[X] = n \frac{\alpha}{\alpha + \beta}

Variance

\text{Var}(X) = n \frac{\alpha\beta}{(\alpha+\beta)^2} \cdot \frac{\alpha+\beta+n}{\alpha+\beta+1}

📌 The variance exceeds that of a Binomial with the same mean (over-dispersion).
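
The over-dispersion is easy to verify numerically: compare the Beta–Binomial variance to a Binomial with the matching mean p = α/(α+β). A sketch using exact fractions:

```python
from fractions import Fraction

def beta_binomial_var(n, a, b):
    # Var(X) = n * ab / (a+b)^2 * (a+b+n) / (a+b+1)
    return Fraction(n * a * b, (a + b) ** 2) * Fraction(a + b + n, a + b + 1)

n, a, b = 10, 2, 2
p = Fraction(a, a + b)        # matching mean: p = 1/2
binom_var = n * p * (1 - p)   # plain Binomial variance n p (1 - p)
print(beta_binomial_var(n, a, b), binom_var)  # 7 vs 5/2
```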


8. Worked Example: Coin Toss

Prior Belief

p \sim \text{Beta}(2,2)

(Slight belief in fairness)


Data

  • 10 tosses

  • 7 heads


Posterior

p \mid D \sim \text{Beta}(9,5)

Posterior Mean

\mathbb{E}[p \mid D] = \frac{9}{14} \approx 0.64

9. MAP and MLE Comparison

MLE (Frequentist)

\hat{p}_{\text{MLE}} = \frac{k}{n}

MAP (Bayesian)

\hat{p}_{\text{MAP}} = \frac{\alpha + k - 1}{\alpha + \beta + n - 2}

📌 MAP includes prior information.
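
For the coin example of Section 8 the two estimates differ slightly; a quick sketch (helper names are illustrative):

```python
def mle(k, n):
    return k / n

def map_estimate(k, n, alpha, beta):
    # Mode of the Beta(alpha + k, beta + n - k) posterior (valid when both > 1)
    return (alpha + k - 1) / (alpha + beta + n - 2)

# Beta(2, 2) prior, 7 heads in 10 tosses
print(mle(7, 10))                 # 0.7
print(map_estimate(7, 10, 2, 2))  # 8/12 ~ 0.667, pulled toward the prior mode 0.5
```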


10. Intuition Behind Beta–Binomial

Concept                 Meaning
Beta prior              Belief about probability
Binomial likelihood     Observed data
Posterior               Updated belief
Predictive              Future outcomes

11. Why Is Beta–Binomial Useful?

Advantages

  • Handles small datasets well

  • Avoids zero-probability issues

  • Models over-dispersion

  • Easy Bayesian updating


12. Beta–Binomial vs Binomial

Feature         Binomial        Beta–Binomial
Probability     Fixed           Random
Variance        Smaller         Larger
Prior belief    Not allowed     Allowed

13. Applications in Machine Learning

  • A/B testing

  • Naive Bayes classifiers

  • Bayesian bandits

  • Online learning

  • Reliability analysis

14. Summary

  • Beta–Binomial is a Bayesian model for count data

  • Beta prior + Binomial likelihood

  • Posterior is Beta

  • Predictive distribution is Beta–Binomial

  • Captures uncertainty and over-dispersion


One-Line Intuition for Students

The Beta–Binomial model treats probability itself as uncertain and learns it from data.


How Do We Get the Posterior?

We are using the Beta–Binomial model.


Step 1: Write Down the Prior

The prior belief about the probability of success pp is:

p \sim \text{Beta}(\alpha,\beta)

In the example:

\boxed{ p \sim \text{Beta}(2,2) }

Interpretation:

  • \alpha - 1 = 1 prior success

  • \beta - 1 = 1 prior failure

This reflects a mild belief that the coin is fair.


Step 2: Observe the Data

We perform 10 coin tosses and observe:

  • k = 7 successes (heads)

  • n - k = 3 failures (tails)


Step 3: Write the Likelihood

The Binomial likelihood is:

P(D \mid p) \propto p^k (1-p)^{n-k}

Substitute values:

P(D \mid p) \propto p^7 (1-p)^3

Step 4: Apply Bayes’ Theorem

P(p \mid D) \propto P(D \mid p)\,P(p)

Substitute the likelihood and the prior:

P(p \mid D) \propto \underbrace{p^7 (1-p)^3}_{\text{likelihood}} \cdot \underbrace{p^{2-1}(1-p)^{2-1}}_{\text{prior}}

Step 5: Combine the Exponents

P(p \mid D) \propto p^{7+1}(1-p)^{3+1} = p^{8}(1-p)^{4}

Step 6: Identify the Posterior Distribution

Recall the Beta distribution form:

\text{Beta}(\alpha',\beta') \propto p^{\alpha'-1}(1-p)^{\beta'-1}

Match exponents:

\alpha' - 1 = 8 \quad \Rightarrow \quad \alpha' = 9
\beta' - 1 = 4 \quad \Rightarrow \quad \beta' = 5

✅ Final Posterior

\boxed{ p \mid D \sim \text{Beta}(9,5) }

Step 7: The Simple Update Rule (Memorize This)

For Beta–Binomial:

\boxed{ \text{Posterior} = \text{Beta}(\alpha + k,\; \beta + n - k) }

Plug in:

\alpha + k = 2 + 7 = 9
\beta + (n - k) = 2 + 3 = 5
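
The closed-form answer can be double-checked by running Bayes' theorem numerically on a grid of p values (the grid size `N` is an arbitrary choice):

```python
# Midpoint grid over (0, 1); unnormalized posterior is p^8 (1-p)^4
N = 100_000
grid = [(i + 0.5) / N for i in range(N)]
unnorm = [p ** 8 * (1 - p) ** 4 for p in grid]
Z = sum(unnorm)
post_mean = sum(p * w for p, w in zip(grid, unnorm)) / Z

print(round(post_mean, 4))  # matches the Beta(9, 5) mean 9/14 ~ 0.6429
```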

Intuitive Explanation (Very Important for Students)

Posterior = prior belief + observed evidence

Quantity    Meaning
α           prior successes
β           prior failures
k           observed successes
n - k       observed failures

One-Line Summary for Exams

In the Beta–Binomial model, the posterior parameters are obtained by adding the number of observed successes and failures to the prior parameters. 
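
One consequence of this update rule is that the data can be processed in batches or all at once with the same result; a short sketch:

```python
def update(alpha, beta, k, n):
    # Posterior = Beta(alpha + k, beta + n - k)
    return alpha + k, beta + (n - k)

# All 10 tosses at once...
one_shot = update(2, 2, 7, 10)
# ...or 6 tosses (4 heads) then 4 tosses (3 heads): posterior becomes the new prior
a1, b1 = update(2, 2, 4, 6)
sequential = update(a1, b1, 3, 4)
print(one_shot, sequential)  # (9, 5) (9, 5)
```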


How Do We Get the Expectation 9/14?

We are computing the mean (expected value) of a Beta distribution.


Step 1: Recall the Posterior Distribution

From the previous step, we obtained:

p \mid D \sim \text{Beta}(9,5)

That means:

  • \alpha = 9

  • \beta = 5


Step 2: Mean of a Beta Distribution

For a Beta random variable:

\boxed{ \mathbb{E}[p] = \frac{\alpha}{\alpha + \beta} }

This is a standard result that can be derived by integration.
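
That integration can be mimicked numerically; a sketch using a midpoint rule (the resolution `N` is arbitrary):

```python
from math import exp, lgamma

def beta_pdf(p, a, b):
    # Beta density with normalizer Gamma(a+b) / (Gamma(a) Gamma(b))
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm) * p ** (a - 1) * (1 - p) ** (b - 1)

a, b, N = 9, 5, 100_000
# E[p] = integral of p * Beta(p; a, b) dp over [0, 1]
numeric = sum(((i + 0.5) / N) * beta_pdf((i + 0.5) / N, a, b)
              for i in range(N)) / N
print(round(numeric, 6), a / (a + b))  # both ~ 0.642857
```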


Step 3: Substitute the Values

\mathbb{E}[p] = \frac{9}{9 + 5} = \boxed{\frac{9}{14}}


Step 4: Numerical Value

\frac{9}{14} \approx 0.643


Interpretation

  • This is the Bayesian estimate of the probability of success

  • It balances:

    • Prior belief

    • Observed data


Why This Makes Sense Intuitively

Recall the pseudo-count interpretation:

Quantity                       Value
Posterior successes (α)        9
Posterior failures (β)         5
Total pseudo-trials (α + β)    9 + 5 = 14

So:

\mathbb{E}[p] = \frac{\text{total successes}}{\text{total trials}}


Compare with MLE

Method          Estimate
MLE             7/10 = 0.7
Bayesian mean   9/14 ≈ 0.643

📌 Bayesian estimate is more conservative due to the prior.


Exam-Ready One-Liner

The expectation of a Beta(α,β) distribution is α/(α+β), obtained by integrating the density over [0,1].
