Bayesian Paradigm: Probability as Belief


1. Two Views of Probability (Context)

Before Bayesian thinking, students usually see the frequentist view.

Frequentist View

  • Probability = long-run relative frequency

  • Example:
    “Probability of heads = 0.5” means in many coin tosses, half are heads.

Bayesian View

  • Probability is a degree of belief or certainty about an event.

  • Beliefs are updated when new evidence arrives.

📌 Key idea: Probability quantifies uncertainty, not just frequency.


2. Probability as Belief

In the Bayesian paradigm:

Probability represents how strongly we believe a statement is true, given the available information.

Example

  • “It will rain tomorrow”

  • No repeated experiment

  • Yet we say: “70% chance of rain”

➡ This is belief-based probability.


3. Prior Belief (Before Seeing Data)

A prior probability represents belief before observing data.

Example:

  • Belief that a coin is fair

  • Prior:

    P(Heads) = 0.5

In ML:

  • Prior belief about model parameters

P(θ)

4. Evidence (Observed Data)

Data provides evidence.

Example:

  • Coin tossed 10 times → 8 heads

  • Data challenges prior belief

In ML:

  • Dataset D


5. Likelihood (How Data Supports Beliefs)

The likelihood measures how likely the observed data is, given a hypothesis.

P(D | θ)

Example:

  • How likely are 8 heads if the coin bias is 0.5?

  • How likely if bias is 0.8?


6. Posterior Belief (Updated Belief)

Bayesian inference updates belief using Bayes’ theorem:

P(θ | D) = P(D | θ) · P(θ) / P(D)

Where:

  • P(θ) → Prior

  • P(D | θ) → Likelihood

  • P(θ | D) → Posterior (updated belief)

  • P(D) → Evidence (normalizing constant)

📌 Posterior combines prior belief + data.
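
To make the update concrete, here is a minimal sketch in Python applying Bayes' theorem to two competing hypotheses about a coin's bias. The candidate values (θ = 0.5 and θ = 0.8) and the 50/50 prior over them are assumptions chosen purely for illustration:

```python
# Illustrative sketch: Bayes' theorem over two candidate coin biases.
# The candidates (0.5, 0.8) and equal priors are assumptions for this demo.
from math import comb

def likelihood(theta, heads, tosses):
    """Binomial likelihood P(D | theta)."""
    return comb(tosses, heads) * theta**heads * (1 - theta)**(tosses - heads)

priors = {0.5: 0.5, 0.8: 0.5}   # P(theta): equal belief in each hypothesis
heads, tosses = 8, 10

# Numerator of Bayes' theorem: P(D | theta) * P(theta)
unnorm = {t: likelihood(t, heads, tosses) * p for t, p in priors.items()}
evidence = sum(unnorm.values())          # P(D), the normalizing constant
posterior = {t: u / evidence for t, u in unnorm.items()}

print(posterior)   # belief shifts strongly toward theta = 0.8
```

After seeing 8 heads, the posterior puts roughly 87% of the belief on θ = 0.8, even though the prior was an even split.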


7. Simple Example (Coin Toss)

Prior

  • Believe the coin is fair: prior belief centred at θ = 0.5

Data

  • 8 heads out of 10 tosses

Posterior

  • Belief shifts toward a biased coin

  • But the prior prevents extreme conclusions from small data

➡ This is rational belief updating.


8. Bayesian Paradigm in Machine Learning

Parameter Estimation

  • Treat parameters as random variables

θ ~ P(θ)

  • Goal:

P(θ | D)

Prediction

P(y* | x*, D) = ∫ P(y* | x*, θ) P(θ | D) dθ

📌 Accounts for uncertainty in parameters.
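
This predictive integral rarely has a closed form, so it is often approximated by averaging over posterior samples. A minimal Monte Carlo sketch, assuming the Beta(10, 4) coin posterior from the worked example in these notes:

```python
# Sketch: Monte Carlo approximation of the posterior predictive
#   P(y* | D) = integral of P(y* | theta) P(theta | D) dtheta,
# assuming a Beta(10, 4) posterior over a coin's bias theta.
import random

random.seed(0)  # reproducibility

def predictive_prob_heads(alpha, beta, n_samples=100_000):
    """Average P(heads | theta) over draws theta ~ P(theta | D)."""
    total = 0.0
    for _ in range(n_samples):
        theta = random.betavariate(alpha, beta)  # posterior sample
        total += theta                           # P(heads | theta) = theta
    return total / n_samples

print(predictive_prob_heads(10, 4))  # close to the posterior mean 10/14 ≈ 0.714
```

Averaging the per-sample predictions is exactly the integral above, estimated by simulation instead of calculus.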


9. Bayesian vs Frequentist (Quick Contrast)

Aspect             Frequentist    Bayesian
Parameters         Fixed          Random
Probability        Frequency      Belief
Uncertainty        From data      From belief + data
Prior knowledge    Not used       Explicitly used

10. Why Bayesian Thinking is Powerful

✔ Works with small data
✔ Incorporates prior knowledge
✔ Quantifies uncertainty
✔ Naturally handles learning over time


11. One-Line Intuition for Students

Bayesian probability measures how strongly we believe something is true and updates that belief when new evidence arrives.


12. Exam-Ready Definition

The Bayesian paradigm interprets probability as a degree of belief and uses Bayes’ theorem to update beliefs in the presence of new data.


 

Bayesian Paradigm: Worked Example

Coin Toss Experiment (Probability as Belief)


Problem Statement

We want to estimate the bias of a coin, i.e., the probability θ of getting Heads, using the Bayesian approach.


Step 1: Unknown Quantity (What Are We Learning?)

Let

θ = P(Heads)

In the Bayesian paradigm, θ is treated as a random variable, not a fixed unknown constant.


Step 2: Prior Belief

Before observing any data, suppose we believe the coin is fair.

We encode this belief using a prior distribution.

Choose a Prior

A common prior for probabilities is the Beta distribution:

θ ~ Beta(α, β)

Let:

α = 2, β = 2

This prior:

  • Is symmetric around 0.5

  • Reflects belief that the coin is roughly fair

📌 Prior mean:

E[θ] = α / (α + β) = 2/4 = 0.5
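
A quick numeric check of this prior, using the standard Beta-distribution formulas for the mean, α/(α+β), and variance, αβ/((α+β)²(α+β+1)):

```python
# Sketch: mean and variance of the Beta(2, 2) prior, computed from the
# standard Beta-distribution formulas.
def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    return (a * b) / ((a + b) ** 2 * (a + b + 1))

print(beta_mean(2, 2))  # 0.5  -> centred on a fair coin
print(beta_var(2, 2))   # 0.05 -> moderate uncertainty around 0.5
```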

Step 3: Observe Data (Evidence)

We toss the coin 10 times and observe:

  • Heads = 8

  • Tails = 2

This is our data D.


Step 4: Likelihood Function

The likelihood of observing 8 heads and 2 tails given θ is:

P(D | θ) = θ^8 (1 − θ)^2

This tells us how well each value of θ explains the data.
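
A two-line check makes this concrete, comparing the likelihood at the two candidate biases asked about earlier (0.5 and 0.8):

```python
# Sketch: how well two candidate biases explain 8 heads and 2 tails.
# (The binomial coefficient is omitted; it cancels when comparing values.)
def likelihood(theta, heads=8, tails=2):
    """P(D | theta) up to a constant factor."""
    return theta**heads * (1 - theta)**tails

print(likelihood(0.5))  # ≈ 0.00098
print(likelihood(0.8))  # ≈ 0.00671, about 7x larger
```

A bias of 0.8 explains this data roughly seven times better than a fair coin does, which is exactly the signal the posterior will weigh against the prior.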


Step 5: Bayes’ Theorem (Belief Update)

Bayes’ theorem:

P(θ | D) ∝ P(D | θ) · P(θ)

Substitute:

  • Prior: Beta(2,2)

  • Likelihood: θ^8 (1 − θ)^2


Step 6: Posterior Distribution

For a Beta prior and binomial likelihood, the posterior is also Beta:

θ | D ~ Beta(α + Heads, β + Tails)

So:

θ | D ~ Beta(2 + 8, 2 + 2) = Beta(10, 4)
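
Because the prior is conjugate, the update is just addition of counts. A minimal sketch:

```python
# Sketch: the Beta-Binomial conjugate update from this worked example.
def update_beta(alpha, beta, heads, tails):
    """Return posterior Beta parameters after observing the tosses."""
    return alpha + heads, beta + tails

a_post, b_post = update_beta(2, 2, 8, 2)
print(a_post, b_post)              # 10 4 -> Beta(10, 4)
print(a_post / (a_post + b_post))  # posterior mean 10/14 ≈ 0.714
```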

Step 7: Interpret the Posterior (Updated Belief)

Posterior Mean

E[θ | D] = 10 / (10 + 4) ≈ 0.714

📌 Our belief has shifted from 0.5 → 0.714, but not fully to 0.8, because the prior still influences the result.


Posterior Variance

  • Smaller than prior variance

  • Indicates increased confidence


Step 8: MAP Estimate (Most Probable Value)

The Maximum A Posteriori (MAP) estimate is:

θ_MAP = (α − 1) / (α + β − 2)

For Beta(10, 4):

θ_MAP = 9/12 = 0.75

Step 9: Bayesian Interpretation (Key Insight)

Stage          Belief About Coin
Before data    Coin is fair
After data     Coin is biased
Confidence     Moderate (only 10 tosses)

📌 Bayesian learning balances prior belief and observed evidence.


Step 10: Comparison with Frequentist Estimate

Frequentist MLE:

θ̂_MLE = 8/10 = 0.8

Bayesian estimate:

E[θ | D] ≈ 0.714

➡ Bayesian estimate is more conservative, especially with small data.
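
The contrast can be computed directly. A sketch gathering the three point estimates for this example:

```python
# Sketch: frequentist MLE vs Bayesian posterior mean and MAP for the
# coin example (Beta(2, 2) prior, 8 heads out of 10 tosses).
heads, tosses = 8, 10
alpha, beta = 2 + heads, 2 + (tosses - heads)   # posterior Beta(10, 4)

mle = heads / tosses                            # 8/10 = 0.8
post_mean = alpha / (alpha + beta)              # 10/14 ≈ 0.714
map_est = (alpha - 1) / (alpha + beta - 2)      # 9/12 = 0.75

print(mle, post_mean, map_est)  # the Bayesian estimates are pulled toward 0.5
```

Both Bayesian estimates sit between the prior belief (0.5) and the raw data frequency (0.8), showing the shrinkage the prior provides.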


Step 11: Why This Shows “Probability as Belief”

  • We were uncertain about θ

  • We expressed uncertainty using a probability distribution

  • We updated beliefs using Bayes’ rule

  • Probability represents belief, not frequency


Exam-Ready Conclusion

In the Bayesian paradigm, probability represents a degree of belief, which is updated using Bayes’ theorem when new data is observed, as illustrated by estimating a coin’s bias using a prior and posterior distribution.


One-Line Intuition for Students

Bayesian inference updates what we believe as evidence accumulates.

 
