📘 Cross-Entropy

What is Cross-Entropy?

Cross-entropy measures how well a predicted probability distribution matches the true distribution.

  • Small value → prediction is good

  • Large value → prediction is poor

👉 It is widely used as a loss function in machine learning (classification, neural networks, logistic regression).


Mathematical Definition

For a true distribution P and a predicted distribution Q:

H(P, Q) = -\sum_i P(i)\log_2 Q(i)

  • P(i) = true probability of outcome i

  • Q(i) = predicted probability of outcome i
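
👉 When P is one-hot (all probability on the true class c), the sum collapses to a single term:

H(P, Q) = -\log_2 Q(c)

This is the form used in the worked examples below.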


Intuition

Think of cross-entropy as:

“How surprised are we if predictions Q are used while reality is P?”

  • If the prediction assigns high probability to the correct class → low surprise → low loss

  • If the prediction assigns low probability to the correct class → high surprise → high loss (the sketch below quantifies this)
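
For a one-hot P, this “surprise” is just -\log_2 Q(\text{correct}). A minimal sketch of how it grows as the predicted probability shrinks (plain NumPy, nothing model-specific assumed):

import numpy as np

# surprise (in bits) attached to the probability given to the correct class
for q in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"Q(correct) = {q:.2f}  ->  surprise = {-np.log2(q):.2f} bits")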


Simple Coin Example

✅ True distribution (actual)

P(H) = 1, \quad P(T) = 0

(The coin actually came up Heads)


✅ Case 1 — Good prediction

Q(H) = 0.9, \quad Q(T) = 0.1

H(P, Q) = -(1 \cdot \log_2 0.9 + 0 \cdot \log_2 0.1)
= -\log_2(0.9) \approx 0.15

✔ Small cross-entropy → good prediction


❌ Case 2 — Bad prediction

Q(H) = 0.2, \quad Q(T) = 0.8

H(P, Q) = -\log_2(0.2) \approx 2.32

❌ Large cross-entropy → poor prediction


Classification Example (Machine Learning)

Suppose a model predicts probabilities for 3 classes:

| Class  | True P | Predicted Q |
| Cat    | 1      | 0.7         |
| Dog    | 0      | 0.2         |
| Rabbit | 0      | 0.1         |

H = -(1 \cdot \log_2 0.7) \approx 0.51

If the model instead predicted:

| Cat | 0.1 |

H = -\log_2(0.1) \approx 3.32

👉 Much worse prediction → larger loss.
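
Checking both three-class cases in code. Because P is one-hot, only the probability given to Cat affects the loss; the split of the bad prediction's remaining mass below is a made-up assumption:

import numpy as np

P = np.array([1, 0, 0])              # one-hot: the true class is Cat
Q_good = np.array([0.7, 0.2, 0.1])   # from the table above
Q_bad = np.array([0.1, 0.6, 0.3])    # hypothetical split; only Q[0] matters here

print(-np.sum(P * np.log2(Q_good)))  # ≈ 0.51
print(-np.sum(P * np.log2(Q_bad)))   # ≈ 3.32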




Python Code Example

import numpy as np

def cross_entropy(P, Q):
    P = np.array(P)
    Q = np.array(Q)
    # add a tiny epsilon to avoid log(0)
    return -np.sum(P * np.log2(Q + 1e-12))

# Example 1 (good prediction)
P = [1, 0]
Q1 = [0.9, 0.1]

# Example 2 (bad prediction)
Q2 = [0.2, 0.8]

print("Cross entropy (good):", cross_entropy(P, Q1))
print("Cross entropy (bad):", cross_entropy(P, Q2))
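
Running this prints roughly 0.152 for the good prediction and 2.322 for the bad one, matching the hand calculations above.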

Why Cross-Entropy is Used in ML


✔ Smooth and differentiable → good for gradient descent
✔ Penalizes confident wrong predictions heavily
✔ Works naturally with probability outputs (softmax/sigmoid; see the sketch below)
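
In practice a classifier outputs raw scores (logits) that softmax turns into a probability distribution before the loss is applied. A minimal sketch with made-up logits (note: the math above uses log base 2, so this loss is in bits; ML libraries typically use the natural log, which only rescales it):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])  # hypothetical raw model scores
Q = softmax(logits)                  # probabilities summing to 1
P = np.array([1, 0, 0])              # one-hot true label

loss = -np.sum(P * np.log2(Q + 1e-12))
print("Q:", Q, " loss:", loss)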

Summary

Cross-entropy = penalty for predicting the wrong probabilities.
