Information Theory & Entropy

 


What is Information Theory?

Information theory studies:

  • how information is measured

  • how much uncertainty exists in data

  • how efficiently information can be encoded or transmitted

Introduced by Claude Shannon, it forms the basis of:

  • Data compression (ZIP, JPEG)

  • Machine learning loss functions

  • Communication systems

  • Cryptography


What is Entropy?

✅ Definition

Entropy measures the uncertainty or randomness in a probability distribution.

  • High entropy → very unpredictable outcome

  • Low entropy → predictable outcome

πŸ“ Formula

For probabilities P = \{p_1, p_2, \dots, p_n\}:

H(P) = -\sum_{i=1}^{n} p_i \log_2 p_i

Unit = bits (when the logarithm is taken base 2).


Intuition

Situation                        Entropy
Fair coin                        High uncertainty → High entropy
Biased coin (always heads)       No uncertainty → Entropy = 0
Uniform distribution             Maximum entropy
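
As a quick numerical check of the last row: a uniform distribution over n outcomes has entropy log2(n), the largest value any distribution over n outcomes can have. The short sketch below (assuming NumPy is available, as in the code later in this post) verifies this for a few values of n.

import numpy as np

# entropy of the uniform distribution over n outcomes equals log2(n), the maximum possible
for n in [2, 4, 8]:
    p = np.full(n, 1.0 / n)
    H = -np.sum(p * np.log2(p))
    print(n, "outcomes:", H, "bits; log2(n) =", np.log2(n))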

Example Calculations

🎲 Example 1 — Fair Coin

P(H) = 0.5,\; P(T) = 0.5
H = -[0.5\log_2 0.5 + 0.5\log_2 0.5] = 1 \text{ bit}

👉 Interpretation: encoding the outcome requires 1 bit.


🎲 Example 2 — Biased Coin

P(H) = 0.9,\; P(T) = 0.1
H = -(0.9\log_2 0.9 + 0.1\log_2 0.1) \approx 0.47 \text{ bits}

👉 Less uncertainty → lower entropy.


🎲 Example 3 — Deterministic Outcome

P(H) = 1,\; P(T) = 0
H = 0 (using the convention 0 \log_2 0 = 0)

👉 No uncertainty → zero information needed.


Significance of Entropy

📌 In Machine Learning

  • Measures impurity in decision trees

  • Used in information gain

  • Basis for cross-entropy loss
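
To connect the first two bullets to practice, here is a minimal sketch of computing information gain for a single decision-tree split. The class labels and the split below are made up purely for illustration, and entropy_of_labels is a hypothetical helper, not a library function.

import numpy as np
from collections import Counter

def entropy_of_labels(labels):
    # empirical entropy (in bits) of a list of class labels
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# hypothetical parent node and one candidate split (illustrative data only)
parent = ["yes"] * 6 + ["no"] * 4
left   = ["yes"] * 5 + ["no"] * 1
right  = ["yes"] * 1 + ["no"] * 3

n = len(parent)
children = (len(left) / n) * entropy_of_labels(left) + (len(right) / n) * entropy_of_labels(right)
print("Information gain:", entropy_of_labels(parent) - children)  # about 0.26 bits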

📌 In Communication

  • Minimum number of bits needed to encode a message

  • Guides optimal compression schemes
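
To make the "minimum number of bits" point concrete, here is a small sketch for a hypothetical 4-symbol source; the probabilities and codeword lengths are chosen only for illustration. For this particular source, an optimal prefix code exactly meets the entropy bound.

import numpy as np

# hypothetical 4-symbol source (illustrative probabilities only)
probs = np.array([0.5, 0.25, 0.125, 0.125])
H = -np.sum(probs * np.log2(probs))       # Shannon entropy: lower bound on average bits/symbol

# codeword lengths 1, 2, 3, 3 form a valid prefix code matched to these probabilities
lengths = np.array([1, 2, 3, 3])
avg_len = np.sum(probs * lengths)

print("Entropy:", H)                      # 1.75 bits/symbol
print("Average code length:", avg_len)    # 1.75 bits/symbol, meeting the entropy bound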

📌 In Probability & Statistics

  • Quantifies unpredictability

  • Helps compare distributions
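
As a small illustration of comparing distributions by their entropy, the snippet below contrasts a uniform distribution with a skewed one over four outcomes; the specific numbers are made up for illustration.

import numpy as np

def entropy_bits(p):
    # Shannon entropy in bits; assumes p sums to 1, ignores zero entries
    p = np.array(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

uniform = [0.25, 0.25, 0.25, 0.25]   # all outcomes equally likely
skewed  = [0.85, 0.05, 0.05, 0.05]   # one outcome dominates

print("Uniform entropy:", entropy_bits(uniform))  # 2.0 bits (the maximum for 4 outcomes)
print("Skewed entropy:", entropy_bits(skewed))    # about 0.85 bits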


Python Code to Compute Entropy

✅ Basic Function

import numpy as np

def entropy(prob):
    # Shannon entropy (in bits) of a discrete probability distribution
    prob = np.array(prob)
    prob = prob[prob > 0]  # remove zero probabilities (0 * log 0 is treated as 0)
    return -np.sum(prob * np.log2(prob))

# examples
P1 = [0.5, 0.5]
P2 = [0.9, 0.1]
P3 = [0.6, 0.3, 0.1]

print("Entropy of fair coin:", entropy(P1))
print("Entropy of biased coin:", entropy(P2))
print("Entropy of [0.6,0.3,0.1]:", entropy(P3))
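
👉 Running this prints approximately 1.0, 0.469, and 1.295 bits for P1, P2, and P3 respectively.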

Visualization Code

import numpy as np
import matplotlib.pyplot as plt

# binary entropy H(p) = -(p log2 p + (1-p) log2 (1-p)) as a function of P(Heads)
p = np.linspace(0.001, 0.999, 100)
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

plt.plot(p, H)
plt.xlabel("P(Heads)")
plt.ylabel("Entropy (bits)")
plt.title("Entropy of a Coin vs Probability")
plt.show()

👉 Shows that entropy is maximized at p = 0.5.




Summary

Entropy = amount of surprise in the outcome.

  • Predictable event → low entropy

  • Random event → high entropy

