Bayesian Paradigm: Probability as Belief


1. Two Views of Probability (Context)

Before Bayesian thinking, students usually see the frequentist view.

Frequentist View

  • Probability = long-run relative frequency

  • Example:
    “Probability of heads = 0.5” means in many coin tosses, half are heads.

Bayesian View

  • Probability is a degree of belief or certainty about an event.

  • Beliefs are updated when new evidence arrives.

📌 Key idea: Probability quantifies uncertainty, not just frequency.


2. Probability as Belief

In the Bayesian paradigm:

Probability represents how strongly we believe a statement is true, given the available information.

Example

  • “It will rain tomorrow”

  • No repeated experiment

  • Yet we say: “70% chance of rain”

➡ This is belief-based probability.


3. Prior Belief (Before Seeing Data)

A prior probability represents belief before observing data.

Example:

  • Belief that a coin is fair

  • Prior:

    P(Heads) = 0.5

In ML:

  • Prior belief about model parameters

P(θ)

4. Evidence (Observed Data)

Data provides evidence.

Example:

  • Coin tossed 10 times → 8 heads

  • Data challenges prior belief

In ML:

  • Dataset D


5. Likelihood (How Data Supports Beliefs)

The likelihood measures how likely the observed data is, given a hypothesis.

P(D | θ)

Example:

  • How likely are 8 heads if the coin bias is 0.5?

  • How likely if bias is 0.8?


6. Posterior Belief (Updated Belief)

Bayesian inference updates belief using Bayes’ theorem:

P(θ | D) = P(D | θ) · P(θ) / P(D)

Where:

  • P(θ) → Prior

  • P(D | θ) → Likelihood

  • P(θ | D) → Posterior (updated belief)

  • P(D) → Evidence (normalizing constant)

📌 Posterior combines prior belief + data.
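
To make the update concrete, here is a minimal sketch in Python applying Bayes' theorem to two competing hypotheses about a coin's bias. The candidate values (θ = 0.5 and θ = 0.8) and the 50/50 prior over them are assumptions chosen purely for illustration:

```python
# Illustrative sketch: Bayes' theorem over two candidate coin biases.
# The candidates (0.5, 0.8) and equal priors are assumptions for this demo.
from math import comb

def likelihood(theta, heads, tosses):
    """Binomial likelihood P(D | theta)."""
    return comb(tosses, heads) * theta**heads * (1 - theta)**(tosses - heads)

priors = {0.5: 0.5, 0.8: 0.5}   # P(theta): equal belief in each hypothesis
heads, tosses = 8, 10

# Numerator of Bayes' theorem: P(D | theta) * P(theta)
unnorm = {t: likelihood(t, heads, tosses) * p for t, p in priors.items()}
evidence = sum(unnorm.values())          # P(D), the normalizing constant
posterior = {t: u / evidence for t, u in unnorm.items()}

print(posterior)   # belief shifts strongly toward theta = 0.8
```

After seeing 8 heads, the posterior puts roughly 87% of the belief on θ = 0.8, even though the prior was an even split.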


7. Simple Example (Coin Toss)

Prior

  • Believe the coin is fair: prior belief centred at θ = 0.5

Data

  • 8 heads out of 10 tosses

Posterior

  • Belief shifts toward a biased coin

  • But the prior prevents extreme conclusions from small data

➡ This is rational belief updating.


8. Bayesian Paradigm in Machine Learning

Parameter Estimation

  • Treat parameters as random variables

θ ~ P(θ)

  • Goal:

P(θ | D)

Prediction

P(y* | x*, D) = ∫ P(y* | x*, θ) P(θ | D) dθ

📌 Accounts for uncertainty in parameters.
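
This predictive integral rarely has a closed form, so it is often approximated by averaging over posterior samples. A minimal Monte Carlo sketch, assuming the Beta(10, 4) coin posterior from the worked example in these notes:

```python
# Sketch: Monte Carlo approximation of the posterior predictive
#   P(y* | D) = integral of P(y* | theta) P(theta | D) dtheta,
# assuming a Beta(10, 4) posterior over a coin's bias theta.
import random

random.seed(0)  # reproducibility

def predictive_prob_heads(alpha, beta, n_samples=100_000):
    """Average P(heads | theta) over draws theta ~ P(theta | D)."""
    total = 0.0
    for _ in range(n_samples):
        theta = random.betavariate(alpha, beta)  # posterior sample
        total += theta                           # P(heads | theta) = theta
    return total / n_samples

print(predictive_prob_heads(10, 4))  # close to the posterior mean 10/14 ≈ 0.714
```

Averaging the per-sample predictions is exactly the integral above, estimated by simulation instead of calculus.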


9. Bayesian vs Frequentist (Quick Contrast)

Aspect             Frequentist    Bayesian
Parameters         Fixed          Random
Probability        Frequency      Belief
Uncertainty        From data      From belief + data
Prior knowledge    Not used       Explicitly used

10. Why Bayesian Thinking is Powerful

✔ Works with small data
✔ Incorporates prior knowledge
✔ Quantifies uncertainty
✔ Naturally handles learning over time


11. One-Line Intuition for Students

Bayesian probability measures how strongly we believe something is true and updates that belief when new evidence arrives.


12. Exam-Ready Definition

The Bayesian paradigm interprets probability as a degree of belief and uses Bayes’ theorem to update beliefs in the presence of new data.


 

Bayesian Paradigm: Worked Example

Coin Toss Experiment (Probability as Belief)


Problem Statement

We want to estimate the bias of a coin, i.e., the probability θ of getting Heads, using the Bayesian approach.


Step 1: Unknown Quantity (What Are We Learning?)

Let

θ = P(Heads)

In the Bayesian paradigm, θ is treated as a random variable, not a fixed unknown constant.


Step 2: Prior Belief

Before observing any data, suppose we believe the coin is fair.

We encode this belief using a prior distribution.

Choose a Prior

A common prior for probabilities is the Beta distribution:

θ ~ Beta(α, β)

Let:

α = 2, β = 2

This prior:

  • Is symmetric around 0.5

  • Reflects belief that the coin is roughly fair

📌 Prior mean:

E[θ] = α / (α + β) = 2/4 = 0.5
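
A quick numeric check of this prior, using the standard Beta-distribution formulas for the mean, α/(α+β), and variance, αβ/((α+β)²(α+β+1)):

```python
# Sketch: mean and variance of the Beta(2, 2) prior, computed from the
# standard Beta-distribution formulas.
def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    return (a * b) / ((a + b) ** 2 * (a + b + 1))

print(beta_mean(2, 2))  # 0.5  -> centred on a fair coin
print(beta_var(2, 2))   # 0.05 -> moderate uncertainty around 0.5
```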

Step 3: Observe Data (Evidence)

We toss the coin 10 times and observe:

  • Heads = 8

  • Tails = 2

This is our data D.


Step 4: Likelihood Function

The likelihood of observing 8 heads and 2 tails given θ is:

P(D | θ) = θ^8 (1 − θ)^2

This tells us how well each value of θ explains the data.
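
A two-line check makes this concrete, comparing the likelihood at the two candidate biases asked about earlier (0.5 and 0.8):

```python
# Sketch: how well two candidate biases explain 8 heads and 2 tails.
# (The binomial coefficient is omitted; it cancels when comparing values.)
def likelihood(theta, heads=8, tails=2):
    """P(D | theta) up to a constant factor."""
    return theta**heads * (1 - theta)**tails

print(likelihood(0.5))  # ≈ 0.00098
print(likelihood(0.8))  # ≈ 0.00671, about 7x larger
```

A bias of 0.8 explains this data roughly seven times better than a fair coin does, which is exactly the signal the posterior will weigh against the prior.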


Step 5: Bayes’ Theorem (Belief Update)

Bayes’ theorem:

P(θ | D) ∝ P(D | θ) · P(θ)

Substitute:

  • Prior: Beta(2,2)

  • Likelihood: θ^8 (1 − θ)^2


Step 6: Posterior Distribution

For a Beta prior and binomial likelihood, the posterior is also Beta:

θ | D ~ Beta(α + Heads, β + Tails)

So:

θ | D ~ Beta(2 + 8, 2 + 2) = Beta(10, 4)
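
Because the prior is conjugate, the update is just addition of counts. A minimal sketch:

```python
# Sketch: the Beta-Binomial conjugate update from this worked example.
def update_beta(alpha, beta, heads, tails):
    """Return posterior Beta parameters after observing the tosses."""
    return alpha + heads, beta + tails

a_post, b_post = update_beta(2, 2, 8, 2)
print(a_post, b_post)              # 10 4 -> Beta(10, 4)
print(a_post / (a_post + b_post))  # posterior mean 10/14 ≈ 0.714
```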

Step 7: Interpret the Posterior (Updated Belief)

Posterior Mean

E[θ | D] = 10 / (10 + 4) ≈ 0.714

📌 Our belief has shifted from 0.5 → 0.714, but not fully to 0.8, because the prior still influences the result.


Posterior Variance

  • Smaller than prior variance

  • Indicates increased confidence


Step 8: MAP Estimate (Most Probable Value)

The Maximum A Posteriori (MAP) estimate is:

θ_MAP = (α − 1) / (α + β − 2)

For Beta(10, 4):

θ_MAP = 9/12 = 0.75

Step 9: Bayesian Interpretation (Key Insight)

Stage          Belief About Coin
Before data    Coin is fair
After data     Coin is biased
Confidence     Moderate (only 10 tosses)

📌 Bayesian learning balances prior belief and observed evidence.


Step 10: Comparison with Frequentist Estimate

Frequentist MLE:

θ̂_MLE = 8/10 = 0.8

Bayesian estimate:

E[θ | D] ≈ 0.714

➡ Bayesian estimate is more conservative, especially with small data.
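
The contrast can be computed directly. A sketch gathering the three point estimates for this example:

```python
# Sketch: frequentist MLE vs Bayesian posterior mean and MAP for the
# coin example (Beta(2, 2) prior, 8 heads out of 10 tosses).
heads, tosses = 8, 10
alpha, beta = 2 + heads, 2 + (tosses - heads)   # posterior Beta(10, 4)

mle = heads / tosses                            # 8/10 = 0.8
post_mean = alpha / (alpha + beta)              # 10/14 ≈ 0.714
map_est = (alpha - 1) / (alpha + beta - 2)      # 9/12 = 0.75

print(mle, post_mean, map_est)  # the Bayesian estimates are pulled toward 0.5
```

Both Bayesian estimates sit between the prior belief (0.5) and the raw data frequency (0.8), showing the shrinkage the prior provides.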


Step 11: Why This Shows “Probability as Belief”

  • We were uncertain about θ

  • We expressed uncertainty using a probability distribution

  • We updated beliefs using Bayes’ rule

  • Probability represents belief, not frequency


Exam-Ready Conclusion

In the Bayesian paradigm, probability represents a degree of belief, which is updated using Bayes’ theorem when new data is observed, as illustrated by estimating a coin’s bias using a prior and posterior distribution.


One-Line Intuition for Students

Bayesian inference updates what we believe as evidence accumulates.

 
