Page Rank Algorithm

 

Page Rank Algorithm

PageRank is an algorithm originally developed at Google to rank web pages in search results.

It measures the importance of a page based on the links pointing to it.


Core Idea (Simple Intuition)

A webpage is important if:

  • many pages link to it, and

  • those linking pages are themselves important.

👉 This creates a recursive importance score.


Everyday analogy

Think of academic papers:

  • A paper cited by many important papers is influential.

  • Not all citations count equally.

PageRank works the same way for web pages.


 Random Surfer Model (Most intuitive view)

Imagine a user:

  1. Starts on a random webpage

  2. Clicks a random link on the page

  3. Repeats forever

Sometimes the user:

  • gets bored and jumps to a random page instead

The PageRank of a page = probability the surfer is on that page after a long time.


 Mathematical Formulation

Suppose there are NN pages.

PageRank of page ii:

PR(i)=1dN+djMiPR(j)L(j)PR(i)=\frac{1-d}{N}+d\sum_{j\in M_i}\frac{PR(j)}{L(j)}

Meaning of symbols

SymbolMeaning
dd
        damping factor (usually 0.85)
MiM_i        pages linking to page ii
L(j)L(j)        number of outgoing links from page jj
NN        total number of pages

Interpretation

  • (1d)/N(1-d)/N → random jump probability

  • second term → importance passed through links


Matrix Form (Markov Chain View)

Let:

PR=dPTPR+1dN1\mathbf{PR}=dP^T\mathbf{PR}+\frac{1-d}{N}\mathbf{1}

where:

  • PP = transition probability matrix

  • PageRank = stationary distribution of a Markov chain

This is why PageRank connects to:

  • Markov chains

  • eigenvectors

  • steady-state probability


 Simple Example

Link structure

  • A → B, C

  • B → C

  • C → A

Initially:

PR(A)=PR(B)=PR(C)=1/3PR(A)=PR(B)=PR(C)=1/3

After iterations:

  • C receives links from A and B → score increases

  • A receives from C → moderate

  • B receives only from A → smaller

Eventually values stabilize.

This stable vector = PageRank.


Algorithm Steps

  1. Initialize ranks equally:

PRi=1/NPR_i=1/N
  1. Update ranks using formula

  2. Repeat until convergence

  3. Output final ranks

This is essentially power iteration.


 Why PageRank Works

Without PageRank:

  • pages with many links dominate (even spam)

With PageRank:

  • links from important pages count more

  • prevents simple manipulation

It measures quality of links, not just quantity.


Applications of PageRank

🌍 1. Web Search Ranking

Original use:

  • rank webpages by importance

  • combined with content relevance


📱 2. Social Network Analysis

Used to find:

  • influential users on Twitter/X

  • important Instagram accounts

  • key nodes in communication networks


📚 3. Citation Analysis

Used in research:

  • rank scientific papers

  • find influential authors

  • journal impact measurement


🧬 4. Biology & Medicine

Applied to:

  • protein interaction networks

  • gene importance ranking

  • disease propagation modeling


🛒 5. Recommendation Systems

Variants used for:

  • product recommendation

  • movie ranking

  • content importance scoring


🚦 6. Network Reliability & Traffic Flow

Helps identify:

  • critical routers in internet

  • important intersections in road networks


 Strengths

✔ Handles huge networks
✔ mathematically elegant (Markov chain steady state)
✔ resistant to simple manipulation
✔ scalable with iterative methods


🔟 Limitations

✖ ignores page content quality
✖ susceptible to link farms (partially mitigated)
✖ slow convergence for massive graphs
✖ modern search engines use many additional signals


 Summary 

PageRank is a link-analysis algorithm that assigns importance to nodes in a network using the stationary distribution of a Markov chain based on link structure.

PageRank models web navigation as a Markov process and assigns importance scores based on the steady-state visit probability, making it a powerful method for ranking nodes in large networks. 

✅ Python Code — Basic PageRank (Iterative Method)

This example computes PageRank for a small web graph.

📌 Graph structure

  • A → B, C

  • B → C

  • C → A

import numpy as np
import matplotlib.pyplot as plt

# ----- Transition matrix -----

# Order: A, B, C

P = np.array([
    [0,   0,   1],    # A receives from C
    [1/2, 0,   0],    # B receives from A
    [1/2, 1,   0]     # C receives from A and B
], dtype=float)

# ----- Parameters -----
N = P.shape[0]
d = 0.85                     # damping factor
iterations = 50

# ----- Initialize ranks equally -----
rank = np.ones(N) / N

# ----- Power iteration -----
for _ in range(iterations):
    rank = d * P @ rank + (1-d)/N

# ----- Print results -----
pages = ['A','B','C']
for p, r in zip(pages, rank):
    print(f"Page {p} rank: {r:.4f}")


pages = ['A','B','C']
plt.bar(pages, rank)
plt.title("PageRank Scores")
plt.ylabel("Rank value")
plt.show()

OUTPUT

Page A rank: 0.3878 Page B rank: 0.2148 Page C rank: 0.3974



Comments

Popular posts from this blog

Advanced Mathematics for Computer Science HNCST409 KTU BTech Honors 2024 Scheme

Convex and Non Convex Sets

Gradient-Based Methods for Optimization-Gradient Descent Algorithm