Beta Distribution

 

Beta Distribution — Detailed Explanation

1. What Is the Beta Distribution?

The Beta distribution is a continuous probability distribution defined on the interval:

$$0 \le x \le 1$$

It is used to model probabilities, proportions, and beliefs about quantities that lie between 0 and 1.

Typical Uses

  • Probability of success p in Bernoulli trials

  • Click-through rates

  • Bias of a coin

  • Conversion rates

  • Bayesian prior for binomial models


2. Why Is It Important?

In Bayesian statistics, the Beta distribution is the conjugate prior for:

  • Bernoulli distribution

  • Binomial distribution

This makes Bayesian updating analytically simple: the posterior is again a Beta distribution, so no numerical integration is needed.


3. Probability Density Function (PDF)

A random variable $X \sim \text{Beta}(\alpha,\beta)$ has the PDF:

$$\boxed{ f(x;\alpha,\beta) = \frac{1}{B(\alpha,\beta)} x^{\alpha-1} (1-x)^{\beta-1}, \quad 0 \le x \le 1 }$$

Where:

  • $\alpha > 0$: shape parameter

  • $\beta > 0$: shape parameter

  • $B(\alpha,\beta)$: Beta function


4. The Beta Function

$$B(\alpha,\beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx$$

It normalizes the distribution so total probability equals 1.

Relation to Gamma function:

$$B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
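The definitions in sections 3 and 4 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `beta_function`, `beta_pdf`, and `integrate_pdf` are illustrative choices, not part of any library.

```python
import math

def beta_function(a: float, b: float) -> float:
    """B(a, b) from the Gamma-function identity Γ(a)Γ(b)/Γ(a+b)."""
    # Working with log-Gamma avoids overflow when a or b is large.
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at a point x with 0 < x < 1."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

def integrate_pdf(a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of the PDF over (0, 1)."""
    h = 1.0 / steps
    return sum(beta_pdf((i + 0.5) * h, a, b) for i in range(steps)) * h

print(beta_function(2, 3))  # Γ(2)Γ(3)/Γ(5) = 2/24 = 1/12
print(integrate_pdf(2, 3))  # should be very close to 1
```

The second value being close to 1 illustrates the normalizing role of $B(\alpha,\beta)$.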

5. Interpretation of Parameters α and β

Intuitive Meaning

Parameter       Interpretation
$\alpha - 1$    Number of prior successes
$\beta - 1$     Number of prior failures

📌 The Beta distribution encodes belief about probability.


6. Mean and Variance

Mean

$$\boxed{ \mathbb{E}[X] = \frac{\alpha}{\alpha + \beta} }$$

Variance

$$\boxed{ \text{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} }$$
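As a quick check of the two formulas, the sketch below evaluates them and compares against samples drawn with the standard library's `random.betavariate`. The parameter values, sample size, and seed are arbitrary choices made for illustration.

```python
import random

def beta_mean(a: float, b: float) -> float:
    """Mean of Beta(a, b)."""
    return a / (a + b)

def beta_variance(a: float, b: float) -> float:
    """Variance of Beta(a, b)."""
    s = a + b
    return a * b / (s ** 2 * (s + 1))

# Same mean, but larger alpha + beta gives a tighter distribution.
print(beta_mean(2, 2), beta_variance(2, 2))      # mean 0.5, variance 4/80 = 0.05
print(beta_mean(20, 20), beta_variance(20, 20))  # mean 0.5, much smaller variance

# Monte Carlo comparison for Beta(2, 2); seed fixed for reproducibility.
rng = random.Random(0)
samples = [rng.betavariate(2, 2) for _ in range(50_000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(sample_mean, sample_var)  # should be near 0.5 and 0.05
```

The comparison between Beta(2,2) and Beta(20,20) shows that the sum $\alpha+\beta$ controls how concentrated the belief is.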

7. Shapes of the Beta Distribution

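$\alpha$    $\beta$    Shape
1           1          Uniform
> 1         > 1        Unimodal (bell-shaped; symmetric when $\alpha = \beta$)
< 1         < 1        U-shaped
> 1         < 1        Mass concentrated near 1 (left-skewed, long tail toward 0)
< 1         > 1        Mass concentrated near 0 (right-skewed, long tail toward 1)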
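Evaluating the density at a few points shows where the mass sits in each parameter regime. This is a rough sketch: the specific parameter pairs and the three evaluation points are illustrative picks, one pair per row of the table.

```python
import math

def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x in (0, 1), computed in log space."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)

# One illustrative (alpha, beta) pair per regime.
cases = {
    "alpha=1,   beta=1  ": (1.0, 1.0),
    "alpha=2,   beta=2  ": (2.0, 2.0),
    "alpha=0.5, beta=0.5": (0.5, 0.5),
    "alpha=2,   beta=0.5": (2.0, 0.5),
    "alpha=0.5, beta=2  ": (0.5, 2.0),
}
points = (0.1, 0.5, 0.9)

for label, (a, b) in cases.items():
    values = [round(beta_pdf(x, a, b), 3) for x in points]
    print(label, dict(zip(points, values)))
```

A density that is highest at 0.5 is unimodal, one that is lowest at 0.5 is U-shaped, and one that rises from 0.1 to 0.9 has its mass near 1.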

8. Special Cases

Uniform Distribution

$$\text{Beta}(1,1) = \text{Uniform}(0,1)$$

Coin Toss Prior

$$\text{Beta}(1,1) \Rightarrow \text{no prior bias}$$

9. Beta Distribution in Bayesian Inference

Bernoulli Likelihood

$$x_i \sim \text{Bernoulli}(p)$$

Prior

$$p \sim \text{Beta}(\alpha,\beta)$$

Posterior

$$\boxed{ p \mid D \sim \text{Beta}(\alpha + k,\; \beta + n - k) }$$

Where:

  • $k$ = number of successes

  • $n$ = total trials

📌 This simple update is why Beta is so powerful.
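The update follows from multiplying the Bernoulli likelihood of the data by the Beta prior and dropping every factor that does not depend on $p$. A short derivation, using the same $k$, $n$, $\alpha$, and $\beta$ as above:

```latex
\begin{aligned}
P(p \mid D) &\propto P(D \mid p)\, P(p) \\
            &\propto p^{k}(1-p)^{n-k} \cdot p^{\alpha-1}(1-p)^{\beta-1} \\
            &= p^{(\alpha+k)-1}(1-p)^{(\beta+n-k)-1}
\end{aligned}
```

The last line is the unnormalized density of $\text{Beta}(\alpha+k,\; \beta+n-k)$, so the normalizing constant must be $1/B(\alpha+k,\; \beta+n-k)$.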


10. Worked Example (Coin Toss)

Prior Belief

$$p \sim \text{Beta}(2,2)$$

→ Slight belief that the coin is fair


Observed Data

  • 7 heads

  • 3 tails


Posterior

$$p \mid D \sim \text{Beta}(2+7,\; 2+3) = \text{Beta}(9,5)$$

Posterior Mean

$$\mathbb{E}[p \mid D] = \frac{9}{14} \approx 0.64$$
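The same example can be reproduced in a few lines of Python. This is a sketch, with `update_beta` as an illustrative helper name; the order of tosses in `tosses` is made up, since only the counts (7 heads, 3 tails) are given.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing Bernoulli outcomes."""
    return alpha + successes, beta + failures

prior = (2, 2)

# Batch update with the observed counts: 7 heads, 3 tails.
posterior = update_beta(*prior, successes=7, failures=3)
print(posterior)  # (9, 5)

posterior_mean = posterior[0] / (posterior[0] + posterior[1])
print(round(posterior_mean, 4))  # 9/14 rounds to 0.6429

# Updating one toss at a time gives the same posterior.
# The ordering below is hypothetical; it contains 7 H and 3 T.
tosses = "HHTHHHTHTH"
a, b = prior
for toss in tosses:
    a, b = update_beta(a, b, successes=int(toss == "H"), failures=int(toss == "T"))
print((a, b))  # (9, 5)
```

The sequential loop arriving at the same parameters shows that the posterior depends only on the counts, not on the order of the observations.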

11. MAP Estimate

$$\boxed{ p_{\text{MAP}} = \frac{\alpha-1}{\alpha+\beta-2} \quad \text{(for } \alpha,\beta > 1) }$$
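Applied to the coin toss example, where the posterior is Beta(9,5), the formula can be evaluated as below. This is a minimal sketch; `beta_map` is an illustrative name, and the comparison with the maximum-likelihood estimate 7/10 is added here for context.

```python
def beta_map(alpha: float, beta: float) -> float:
    """Mode of Beta(alpha, beta); valid only when both parameters exceed 1."""
    if alpha <= 1 or beta <= 1:
        raise ValueError("the mode formula requires alpha > 1 and beta > 1")
    return (alpha - 1) / (alpha + beta - 2)

# Posterior from the coin toss example: Beta(2 + 7, 2 + 3) = Beta(9, 5).
print(beta_map(9, 5))  # 8/12, about 0.667
print(9 / (9 + 5))     # posterior mean, about 0.643
print(7 / 10)          # maximum-likelihood estimate from the data alone

# With a uniform prior Beta(1, 1) the posterior is Beta(8, 4),
# and the MAP estimate coincides with the maximum-likelihood estimate.
print(beta_map(8, 4))  # 7/10
```

Both the MAP estimate and the posterior mean sit between the prior's centre (0.5) and the observed frequency (0.7), with the MAP estimate closer to the data.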

12. Connection to Machine Learning

Regularization

  • A Beta prior acts as regularization on estimated probabilities: its pseudo-counts pull estimates away from 0 and 1 when data are scarce

Logistic Regression (Bayesian)

  • Prior on class probability

A/B Testing

  • Posterior over conversion rate
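For the A/B testing case, one common approach is to give each variant its own Beta posterior and estimate the probability that one conversion rate exceeds the other by sampling. The sketch below assumes a uniform Beta(1,1) prior for both variants; the conversion counts are hypothetical, and `prob_b_beats_a` is an illustrative helper, not a library function.

```python
import random

def prob_b_beats_a(succ_a, n_a, succ_b, n_b,
                   prior=(1.0, 1.0), draws=50_000, seed=0):
    """Monte Carlo estimate of P(p_B > p_A) under independent Beta posteriors."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    a0, b0 = prior
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate per variant from its posterior.
        p_a = rng.betavariate(a0 + succ_a, b0 + n_a - succ_a)
        p_b = rng.betavariate(a0 + succ_b, b0 + n_b - succ_b)
        wins += p_b > p_a
    return wins / draws

# Hypothetical data: variant A converts 40 of 1000 visitors, variant B 55 of 1000.
print(prob_b_beats_a(40, 1000, 55, 1000))
```

The result is an estimate of how likely it is that B's true conversion rate is higher than A's, given the data and the assumed prior.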


13. Comparison with Gaussian Distribution

Feature         Beta                  Gaussian
Support         [0,1]                 $(-\infty,\infty)$
Used for        Probabilities         Real values
Conjugate to    Bernoulli/Binomial    Gaussian

14. Key Takeaways (Exam-Friendly)

  • Beta distribution models uncertainty over probabilities

  • Defined on [0,1]

  • Parameters act like pseudo-counts

  • Conjugate prior for Bernoulli/binomial models

  • Central to Bayesian learning


One-Line Intuition for Students

The Beta distribution represents our belief about a probability before and after seeing data.
