How is PageRank Calculated?

This is where it gets tricky. The PR of each page depends on the PR of the pages pointing to it. But we won't know what PR those pages have until the pages pointing to them have their PR calculated and so on… And when you consider that page links can form circles it seems impossible to do this calculation!

But actually it's not that bad. Remember this bit of the Google paper:

    PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.

What that means to us is that we can just go ahead and calculate a page's PR without knowing the final value of the PR of the other pages . That seems strange but, basically, each time we run the calculation we're getting a closer estimate of the final value. So all we need to do is remember the each value we calculate and repeat the calculations lots of times until the numbers stop changing much.

Lets take the simplest example network: two pages, each pointing to the other:

Each page has one outgoing link (the outgoing count is 1, i.e. C(A) = 1 and C(B) = 1).

Guess 1

We don't know what their PR should be to begin with, so let's take a guess at 1.0 and do some calculations:

d

= 0.85

PR(A)

= (1 – d) + d(PR(B)/1)

PR(B)

= (1 – d) + d(PR(A)/1)

i.e.

PR(A)

= 0.15 + 0.85 * 1
= 1

PR(B)

= 0.15 + 0.85 * 1
= 1

Hmm, the numbers aren't changing at all! So it looks like we started out with a lucky guess!!!

Guess 2

No, that's too easy, maybe I got it wrong (and it wouldn't be the first time). Ok, let's start the guess at 0 instead and re-calculate:

PR(A)

= 0.15 + 0.85 * 0
= 0.15

PR(B)

= 0.15 + 0.85 * 0.15
= 0.2775

NB. we've already calculated a “next best guess” at PR(A) so we use it here

And again:

PR(A)

= 0.15 + 0.85 * 0.2775
= 0.385875

PR(B)

= 0.15 + 0.85 * 0.385875
= 0.47799375

And again

PR(A)

= 0.15 + 0.85 * 0.47799375
= 0.5562946875

PR(B)

= 0.15 + 0.85 * 0.5562946875
= 0.622850484375

and so on. The numbers just keep going up. But will the numbers stop increasing when they get to 1.0? What if a calculation over-shoots and goes above 1.0?

Guess 3

Well let's see. Let's start the guess at 40 each and do a few cycles:

    PR(A) = 40
    PR(B) = 40

First calculation

PR(A)

= 0.15 + 0.85 * 40
= 34.25

PR(B)

= 0.15 + 0.85 * 0.385875
= 29.1775

And again

PR(A)

= 0.15 + 0.85 * 29.1775
= 24.950875

PR(B)

= 0.15 + 0.85 * 24.950875
= 21.35824375

Yup, those numbers are heading down alright! It sure looks the numbers will get to 1.0 and stop

  • Principle: it doesn't matter where you start your guess, once the PageRank calculations have settled down, the “ normalized probability distribution ” (the average PageRank for all pages) will be 1.0

Next...