Getting the answer quicker

How many times do we need to repeat the calculation for big networks? That's a difficult question; for a network as large as the World Wide Web it can be many millions of iterations! The choice of “damping factor” is quite subtle. If it's too high then it takes ages for the numbers to settle; if it's too low then you get repeated over-shoot, both above and below the average, and the numbers just swing about the average like a pendulum and never settle down.

Also choosing the order of calculations can help. The answer will always come out the same no matter which order you choose, but some orders will get you there quicker than others.

I'm sure there have been several Master's theses on how to make this calculation as efficient as possible but, in the examples below, I've used very simple code for clarity, and roughly 20 to 40 iterations were needed!
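To make the points about iteration counts and ordering concrete, here is a minimal Python sketch. It is not the program used for the examples below. It assumes d = 0.85, a made-up four-page network, and an arbitrary stopping rule (stop once no page changes by more than `tol` in a full pass); the name `solve_in_place` is purely illustrative. Each page's PR is updated in place, so the order in which pages are visited can change how many passes are needed, but not the values the network settles on.

```python
def solve_in_place(links, order, d=0.85, tol=1e-6, max_iter=1000):
    """Repeat the PR equation page by page, in the given order, until it settles.

    links maps each page to the list of pages it links to.
    Returns (pr, passes) where passes is the number of full sweeps used.
    """
    inbound = {p: [q for q in links if p in links[q]] for p in order}
    pr = {p: 1.0 for p in order}  # arbitrary starting guess
    for passes in range(1, max_iter + 1):
        biggest_change = 0.0
        for p in order:
            new = (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[p])
            biggest_change = max(biggest_change, abs(new - pr[p]))
            pr[p] = new  # later pages in this sweep already see the new value
        if biggest_change < tol:
            return pr, passes
    return pr, max_iter


# Hypothetical network, chosen only for illustration.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

pr_forward, n_forward = solve_in_place(links, ["A", "B", "C", "D"])
pr_reverse, n_reverse = solve_in_place(links, ["D", "C", "B", "A"])

print("passes needed:", n_forward, "vs", n_reverse)
print({p: round(v, 2) for p, v in pr_forward.items()})
```

Both orders settle on the same PR values to within the tolerance; only the number of passes can differ.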

Example 1

I'm not going to repeat the calculations here, but you can see them by running the program (yes, if you click the link the program really is re-run to do the calculations for you).

So the correct PR for the example is:

You can see it took about 20 iterations before the network began to settle on these values!

Look at Page D though - it has a PR of 0.15 even though no-one is voting for it (i.e. it has no incoming links)! Is this right?

The first part, or "term" to be technical, of the PR equation is doing this:

    PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

So, for Page D, no backlinks means the equation looks like this:

    PR(D) = (1-d) + d * (0)
          = 0.15

no matter what else is going on or how many times you do it.
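The same arithmetic as a tiny Python check, assuming d = 0.85 (the value implied by the 0.15 result). The variable names are illustrative only.

```python
d = 0.85                 # damping factor (assumed from the 0.15 result)
backlink_votes = []      # Page D has no incoming links, so there are no votes to add up

pr_d = (1 - d) + d * sum(backlink_votes)
print(round(pr_d, 2))    # 0.15
```

Because the list of votes is empty, nothing that happens elsewhere in the network can change this number.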

    Observation: every page has a PR of at least 0.15 to share out. But this may only be in theory - there are rumours that Google runs a post-spidering phase in which any pages that have no incoming links at all are completely deleted from the index...

Example 2

A simple hierarchy with some outgoing links

     

As you'd expect, the home page has the most PR – after all, it has the most incoming links! But what's happened to the average? It's only 0.378!!! That doesn't tie up with what I said earlier so something is wrong somewhere!

Well no, everything is fine. But take a look at the “external site” pages – what's happening to their PageRank? They're not passing it on, they're not voting for anyone, they're wasting their PR like so much pregnant chad!!! (NB: a more accurate description of this issue can be found in this thread.)

Example 3

Let's link those external sites back into our home page just so we can see what happens to the average…

That's better - it does work after all! And look at the PR of our home page! All those incoming links sure make a difference – we'll talk more about that later.
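The effect in Examples 2 and 3 can be reproduced with a short Python sketch. The layout below is hypothetical (the exact link structure of those examples is not spelled out in the text), d = 0.85 is assumed, and `pagerank` and `average` are illustrative helper names. The only difference between the two networks is whether the external pages link back to the home page.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=10_000):
    """Iterate PR = (1-d) + d * sum(PR(T)/C(T)) until the values settle."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)
    pr = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new = {p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[p])
               for p in pages}
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


def average(pr):
    return sum(pr.values()) / len(pr)


# Hypothetical layout: the external pages receive links but link to nobody.
leaky = {
    "Home": ["About", "Product", "Ext 1", "Ext 2"],
    "About": ["Home"],
    "Product": ["Home"],
    "Ext 1": [],
    "Ext 2": [],
}
# Same layout, but each external page now links back to Home.
closed = {**leaky, "Ext 1": ["Home"], "Ext 2": ["Home"]}

avg_leaky = average(pagerank(leaky))
avg_closed = average(pagerank(closed))
print("average with dead-end pages:", round(avg_leaky, 3))
print("average once they link back:", round(avg_closed, 3))
```

Pages with no outgoing links keep receiving votes but never pass any on, so the average drops below 1.0; once every page votes for someone, the average comes back to 1.0.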

Example 4

What happens to PR if we follow a suggestion about writing page reviews?

Example 5

A simple hierarchy

Our home page has 2 and a half times as much PR as the child pages! Excellent!

  • Observation : a hierarchy concentrates votes and PR into one page
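A minimal Python sketch of this hierarchy, assuming d = 0.85 and assuming the home page links to three child pages ("About", "Product" and "More") which each link straight back. The `pagerank` helper is illustrative, not the original program.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=10_000):
    """Iterate PR = (1-d) + d * sum(PR(T)/C(T)) until the values settle."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)
    pr = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new = {p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[p])
               for p in pages}
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


# Assumed structure: Home links down to each child, each child links back up.
hierarchy = {
    "Home": ["About", "Product", "More"],
    "About": ["Home"],
    "Product": ["Home"],
    "More": ["Home"],
}

pr = pagerank(hierarchy)
print(round(pr["Home"], 2))   # 1.92
print(round(pr["About"], 2))  # 0.69
```

The total is still 4.0 (an average of 1.0 per page), but the three children each hand their whole vote to the home page while the home page's vote is split three ways.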

Example 6

Looping

This is what we'd expect. All the pages have the same number of incoming links, all pages are of equal importance to each other, all pages get the same PR of 1.0 (i.e. the “average” probability).

Example 7

Extensive Interlinking – or Fully Meshed

Yes, the results are the same as the Looping example above and for the same reasons.
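Both layouts can be checked with a short Python sketch, assuming d = 0.85 and four pages. The `pagerank` helper and its `start` parameter are illustrative; the starting guess is deliberately set to 0.0 so that the values have to climb to 1.0 rather than beginning there.

```python
def pagerank(links, d=0.85, start=1.0, tol=1e-9, max_iter=10_000):
    """Iterate PR = (1-d) + d * sum(PR(T)/C(T)) until the values settle."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)
    pr = {p: start for p in pages}
    for _ in range(max_iter):
        new = {p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[p])
               for p in pages}
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


pages = ["Home", "About", "Product", "More"]

# Looping: each page links only to the next one, and the last links back to the first.
loop = {p: [pages[(i + 1) % len(pages)]] for i, p in enumerate(pages)}

# Fully meshed: every page links to every other page.
mesh = {p: [q for q in pages if q != p] for p in pages}

pr_loop = pagerank(loop, start=0.0)
pr_mesh = pagerank(mesh, start=0.0)
print({p: round(v, 2) for p, v in pr_loop.items()})
print({p: round(v, 2) for p, v in pr_mesh.items()})
```

In both cases every page receives exactly as much vote as it gives away, so they all end up on the average of 1.0.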

Example 8

Hierarchical – but with a link in and one out.

We'll assume there's an external site that has lots of pages and links with the result that one of the pages has the average PR of 1.0. We'll also assume the webmaster really likes us – there's just one link from that page and it's pointing at our home page.

In example 5 the home page only had a PR of 1.92 but now it is 3.31! Excellent! Not only has site A contributed 0.85 PR to us, but the raised PR in the “About”, “Product” and “More” pages has had a lovely “feedback” effect, pushing up the home page's PR even further!

  • Principle: a well structured site will amplify the effect of any contributed PR
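The 3.31 figure can be reproduced with a Python sketch under some stated assumptions: d = 0.85, the external "Site A" page is pinned at a PR of 1.0 and has a single link to our home page, and the one outgoing link sits on the "Product" page and points at an external "Site B". The text does not say which page carries the outgoing link; putting it on any one child page gives the same home page value. The `pagerank` helper and its `fixed` parameter are illustrative.

```python
def pagerank(links, fixed=None, d=0.85, tol=1e-9, max_iter=10_000):
    """Iterate the PR equation; pages listed in `fixed` keep the PR given there."""
    fixed = fixed or {}
    pages = sorted(set(links) | set(fixed) | {t for ts in links.values() for t in ts})
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)
    pr = {p: fixed.get(p, 1.0) for p in pages}
    for _ in range(max_iter):
        new = {p: fixed[p] if p in fixed
               else (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[p])
               for p in pages}
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


# Assumed structure for this example (see the lead-in).
links = {
    "Site A": ["Home"],              # the friendly webmaster's single link
    "Home": ["About", "Product", "More"],
    "About": ["Home"],
    "Product": ["Home", "Site B"],   # our one outgoing link
    "More": ["Home"],
}

pr = pagerank(links, fixed={"Site A": 1.0})
print(round(pr["Home"], 2))   # 3.31
print(round(pr["About"], 2))  # 1.09
```

Site A's single link hands over d × 1.0 = 0.85, and the rest of the gain comes from the child pages feeding their raised PR back up to the home page.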

Example 9

Looping – but with a link in and a link out

Well, the PR of our home page has gone up a little, but what's happened to the “More” page?

The vote of the “Product” page has been split evenly between it and the external site. We now value the external Site B equally with our “More” page. The “More” page is getting only half the vote it had before – this is good for Site B but very bad for us!

Example 10

Fully meshed – but with one vote in and one vote out

That's much better. The “More” page is still getting a smaller share of the vote than in example 7 of course, but now the “Product” page has kept three quarters of its vote within our site - unlike example 9 where it was giving away fully half of its vote to the external site!

Keeping just this small extra fraction of the vote within our site has had a very nice effect on the Home Page too – PR of 2.28 compared with just 1.66 in example 9.
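The looping and fully meshed layouts can be compared side by side with a Python sketch. The assumptions: d = 0.85, "Site A" is pinned at a PR of 1.0 and links only to our home page, the loop runs Home, About, Product, More and back to Home, and in both layouts the "Product" page carries the single link out to "Site B". The `pagerank` helper and its `fixed` parameter are illustrative.

```python
def pagerank(links, fixed=None, d=0.85, tol=1e-9, max_iter=10_000):
    """Iterate the PR equation; pages listed in `fixed` keep the PR given there."""
    fixed = fixed or {}
    pages = sorted(set(links) | set(fixed) | {t for ts in links.values() for t in ts})
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)
    pr = {p: fixed.get(p, 1.0) for p in pages}
    for _ in range(max_iter):
        new = {p: fixed[p] if p in fixed
               else (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[p])
               for p in pages}
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


site = ["Home", "About", "Product", "More"]

# Looping with a link in and a link out: Product splits its vote two ways.
looping = {
    "Site A": ["Home"],
    "Home": ["About"],
    "About": ["Product"],
    "Product": ["More", "Site B"],
    "More": ["Home"],
}

# Fully meshed with a link in and a link out: Product splits its vote four ways.
meshed = {p: [q for q in site if q != p] for p in site}
meshed["Product"].append("Site B")
meshed["Site A"] = ["Home"]

pr_loop = pagerank(looping, fixed={"Site A": 1.0})
pr_mesh = pagerank(meshed, fixed={"Site A": 1.0})
print("Home:", round(pr_loop["Home"], 2), "looping vs", round(pr_mesh["Home"], 2), "meshed")
print("More:", round(pr_loop["More"], 2), "looping vs", round(pr_mesh["More"], 2), "meshed")
```

With these assumptions the home page comes out at about 1.66 in the looping layout and about 2.28 in the meshed one, because the external site receives a half of the "Product" vote in the first case and only a quarter in the second.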

  • Observation: increasing the internal links in your site can minimise the damage to your PR when you give away votes by linking to external sites.
  • Principle:
    • If a particular page is highly important – use a hierarchical structure with the important page at the “top”.
    • Where a group of pages may contain outward links – increase the number of internal links to retain as much PR as possible.
    • Where a group of pages do not contain outward links – the number of internal links in the site has no effect on the site's average PR. You might as well use a link structure that gives the user the best navigational experience.

Site Maps

Site maps are useful in at least two ways:

  • If a user types in a bad URL most websites return a really unhelpful “404 – page not found” error page. This can be discouraging. Why not configure your server to return a page that shows an error has been made, but also gives the site map? This can help the user enormously.
  • Linking to a site map on each page increases the number of internal links in the site, spreading the PR out and protecting you against your vote “donations”.

Example 11

Let's try to fix our site to artificially concentrate the PR into the home page.

That looks good, most of the links seem to be pointing up to page A so we should get a nice PR.

Try to guess what the PR of A will be before you scroll down or run the code.

Oh dear, that didn't work at all well – it's much worse than just an ordinary hierarchy! What's going on is that pages C and D have such weak incoming links that they're no help to page A at all!

  • Principle : trying to abuse the PR calculation is harder than you think.

Example 12

A common web layout for long documentation is to split the document into many pages with a “Previous” and “Next” link on each plus a link back to the home page. The home page then only needs to point to the first page of the document.

In this simple example, where there's only one document, the first page of the document has a higher PR than the Home Page! This is because page B is getting all the vote from page A, but page A is only getting fractions of pages B, C and D.
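A Python sketch of this layout, assuming d = 0.85, a home page A and a three-page document B, C, D. The link lists follow the description above ("Previous", "Next" and a link home on each document page, with the home page pointing only at the first page); the `pagerank` helper is illustrative.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=10_000):
    """Iterate PR = (1-d) + d * sum(PR(T)/C(T)) until the values settle."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)
    pr = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new = {p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[p])
               for p in pages}
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


document = {
    "A": ["B"],            # home page points only at the first page
    "B": ["A", "C"],       # first page: home, next
    "C": ["A", "B", "D"],  # middle page: home, previous, next
    "D": ["A", "C"],       # last page: home, previous
}

pr = pagerank(document)
for page in "ABCD":
    print(page, round(pr[page], 2))
```

Page B receives the whole of A's vote plus a third of C's, while A only receives a half of B, a third of C and a half of D, which is why B ends up ahead.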

  • Principle : in order to give users of your site a good experience, you may have to take a hit against your PR. There's nothing you can do about this, and neither should you try to or worry about it! If your site is a pleasure to use, lots of other webmasters will link to it and you'll get back much more PR than you lost.

Can you also see the trend between this and the previous example? As you add more internal links to a site it gets closer to the Fully Meshed example where every page gets the average PR for the mesh.

  • Observation : as you add more internal links in your site, the PR will be spread out more evenly between the pages.

Example 13

Getting high PR the wrong way and the right way.

Just as an experiment, let's see if we can get 1,000 pages pointing to our home page, but only have one link leaving it…

Yup, those spam pages are pretty worthless but they sure add up!

  • Observation : it doesn't matter how many pages you have in your site, your average PR will always be 1.0 at best. But a hierarchical layout can strongly concentrate votes, and therefore the PR, into the home page!
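A Python sketch of the experiment, with one loud assumption about the layout: the text only says that 1,000 pages point at the home page and that the home page has a single outgoing link, so here that single link goes to a hypothetical "Page B", which in turn links to every spam page. d = 0.85 is assumed and the `pagerank` helper is illustrative.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=10_000):
    """Iterate PR = (1-d) + d * sum(PR(T)/C(T)) until the values settle."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)
    pr = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new = {p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[p])
               for p in pages}
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


spam_pages = [f"Spam {i}" for i in range(1000)]

# Assumed layout: Home -> Page B -> every spam page -> Home.
links = {
    "Home": ["Page B"],
    "Page B": spam_pages,
    **{page: ["Home"] for page in spam_pages},
}

pr = pagerank(links)
site_average = sum(pr.values()) / len(pr)
print("Home:", round(pr["Home"], 1))
print("one spam page:", round(pr["Spam 0"], 2))
print("site average:", round(site_average, 3))
```

Each spam page is worth well under 1.0 on its own, and the site average stays at 1.0, yet a thousand small votes all aimed at one page pile up into a very large home page PR.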

This is a technique used by some disreputable sites (mostly adult content sites). But I can't advise this - if Google's robots decide you're doing this there's a good chance you'll be banned from Google! Disaster!
