Bayesian inference for a coin toss (Beta prior + Binomial likelihood)

We want to estimate the probability of heads, call it θ, using observed coin tosses.

Bayesian inference updates our prior belief about θ using observed data to get a posterior belief.


1️⃣ Model setup

✔ Likelihood: Binomial

If we toss the coin n times and see k heads, then

P(k \mid \theta) = \binom{n}{k}\theta^k(1-\theta)^{n-k}

This says: given θ, what's the chance of seeing k heads?
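As a quick sanity check, the Binomial likelihood can be computed directly with Python's standard library (the function name `binomial_pmf` is just for illustration):

```python
from math import comb

def binomial_pmf(k, n, theta):
    """P(k heads in n tosses | theta) under the Binomial model."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Probability of seeing 7 heads in 10 tosses of a fair coin
print(round(binomial_pmf(7, 10, 0.5), 4))  # 0.1172
```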


✔ Prior: Beta distribution

We assume

\theta \sim \text{Beta}(\alpha,\beta)

Beta is ideal because:

  • Defined on [0,1]

  • Flexible shape

  • Conjugate to Binomial → posterior is also Beta

Prior density:

P(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}

Interpretation:

  • α − 1 = prior pseudo-heads

  • β − 1 = prior pseudo-tails
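Under this reading, the prior mean α / (α + β) summarizes where the prior puts its weight; a minimal sketch (the helper name is illustrative):

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

# Uniform, strongly-fair, and heads-biased priors (used in the examples below)
for a, b in [(1, 1), (50, 50), (8, 2)]:
    print(f"Beta({a},{b}) mean = {beta_mean(a, b):.2f}")
```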


2️⃣ Posterior update

Using Bayes’ rule:

P(\theta \mid k) \propto P(k \mid \theta)\,P(\theta)

Multiply likelihood and prior:

\theta^k(1-\theta)^{n-k} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{k+\alpha-1}(1-\theta)^{n-k+\beta-1}

So posterior is:

\boxed{\theta \mid k \sim \text{Beta}(\alpha+k,\;\beta+n-k)}

This is the key result.
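The update rule is one line of code; a minimal sketch (the function name `update_beta` is illustrative):

```python
def update_beta(alpha, beta, k, n):
    """Beta(alpha, beta) prior + k heads in n tosses -> posterior Beta parameters."""
    return alpha + k, beta + n - k

print(update_beta(1, 1, 7, 10))    # (8, 4)
print(update_beta(50, 50, 7, 10))  # (57, 53)
```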


🎯 Example 1 — Neutral prior

Prior

\text{Beta}(1,1)

Uniform → no preference

Data

10 tosses → 7 heads, 3 tails

Posterior

\text{Beta}(1+7,\;1+3) = \text{Beta}(8,4)

Posterior mean

E[\theta] = \frac{8}{8+4} \approx 0.667

Observed frequency = 0.7
The Bayesian estimate shrinks slightly toward 0.5.
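This shrinkage is easy to verify numerically (a minimal sketch):

```python
a, b = 1 + 7, 1 + 3              # posterior Beta(8, 4)
posterior_mean = a / (a + b)
mle = 7 / 10                     # observed frequency
print(round(posterior_mean, 3))  # 0.667 -- pulled slightly below 0.7, toward 0.5
```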


🎯 Example 2 — Strong prior belief coin is fair

Prior

\text{Beta}(50,50)

Mean = 0.5 with high confidence

Data

10 tosses → 7 heads

Posterior

\text{Beta}(50+7,\;50+3) = \text{Beta}(57,53)

Posterior mean:

\frac{57}{110} \approx 0.518

Despite 70% observed heads, the estimate stays near 0.5
because the prior is strong.


🎯 Example 3 — Prior belief coin biased to heads

Prior

\text{Beta}(8,2)

Mean = 0.8

Data

10 tosses → 3 heads

Posterior

\text{Beta}(8+3,\;2+7) = \text{Beta}(11,9)

Posterior mean:

\frac{11}{20} = 0.55

Even though the data suggest 0.3, the posterior balances prior and data.
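The pull of the prior in Examples 2 and 3 can be reproduced with the posterior-mean formula (α + k)/(α + β + n); the helper name is illustrative:

```python
def posterior_mean(alpha, beta, k, n):
    """Mean of the posterior Beta(alpha + k, beta + n - k)."""
    return (alpha + k) / (alpha + beta + n)

print(round(posterior_mean(50, 50, 7, 10), 3))  # 0.518 -- strong fair prior resists 70% heads
print(round(posterior_mean(8, 2, 3, 10), 2))    # 0.55  -- heads-biased prior resists 30% heads
```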


🧠 Intuition (most important)

Bayesian updating works like:

Source                  Heads      Tails
Prior (pseudo-counts)   α − 1      β − 1
Data                    k          n − k
Posterior parameters    α + k      β + n − k

So the prior behaves like imaginary observations.


📊 Predictive probability (next toss)

Chance next toss is heads:

P(\text{heads next}) = \frac{\alpha+k}{\alpha+\beta+n}

Using the posterior from Example 1, Beta(8, 4):

\frac{8}{12} \approx 0.667
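In code, the predictive probability for each example's posterior (the helper name is illustrative):

```python
def predict_heads(alpha, beta, k, n):
    """Posterior predictive P(next toss is heads) = (alpha + k) / (alpha + beta + n)."""
    return (alpha + k) / (alpha + beta + n)

print(round(predict_heads(1, 1, 7, 10), 3))    # 0.667 -- Example 1
print(round(predict_heads(50, 50, 7, 10), 3))  # 0.518 -- Example 2
```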

🧩 Why Beta is conjugate

Because:

\text{Beta} \times \text{Binomial} \rightarrow \text{Beta}

Same functional form after updating → closed-form solution.

No numerical integration needed.
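To see what conjugacy saves us, compare the closed-form posterior mean against a brute-force grid approximation of prior × likelihood (a sketch; the grid size is arbitrary):

```python
alpha, beta, k, n = 1, 1, 7, 10  # Example 1

# Unnormalized posterior density theta^(alpha+k-1) * (1-theta)^(beta+n-k-1) on a grid
grid = [i / 10000 for i in range(1, 10000)]
unnorm = [t**(alpha + k - 1) * (1 - t)**(beta + n - k - 1) for t in grid]

grid_mean = sum(t * w for t, w in zip(grid, unnorm)) / sum(unnorm)
closed_form = (alpha + k) / (alpha + beta + n)  # mean of Beta(8, 4)

print(abs(grid_mean - closed_form) < 1e-3)  # True -- they agree
```

The grid version needs thousands of evaluations and a normalization step; the conjugate update is pure arithmetic.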
