Google Has the Most Dead Pages

Google is the leader in dead and old pages, according to a study cited by Google Operating System.
Ziv Bar-Yossef, from Google, wrote a paper about sampling random pages from a search engine's index using queries. He explains some of the technical details in this video, including why sampling random pages is useful: it lets you compare search engines and estimate how much of an index is spam, how much is fresh, and so on.
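
To make the idea concrete, here is a minimal sketch of how a random sample turns into an estimate such as "what share of the index is dead pages". It is not the method from the paper, which has to deal with the much harder problem of drawing the sample through queries. It simply assumes a uniform random sample is already in hand. The function name `estimate_fraction`, the `is_dead` flag and the sample data are all made up for illustration.

```python
import math


def estimate_fraction(sample, predicate, z=1.96):
    """Estimate the share of items in `sample` that satisfy `predicate`.

    Returns (estimate, low, high), where low/high are a normal-approximation
    confidence interval (z=1.96 corresponds to roughly 95%).
    Assumes `sample` was drawn uniformly at random from the index.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    hits = sum(1 for item in sample if predicate(item))
    p = hits / n
    # Standard error of a sample proportion.
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


if __name__ == "__main__":
    # Synthetic data, not real measurements: 1,000 sampled pages,
    # 20 of which are flagged as dead.
    pages = [{"url": f"page-{i}", "is_dead": i < 20} for i in range(1000)]
    estimate, low, high = estimate_fraction(pages, lambda page: page["is_dead"])
    print(f"dead pages: {estimate:.1%} (interval {low:.1%} to {high:.1%})")
```

The interval shrinks as the sample grows, which is why the quality of the sample matters so much when comparing search engines on small percentages.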

He applied the results from his paper to compare Google, Yahoo and MSN Search. Here are three charts that compare the index size, the number of dead pages in each search engine and the freshness of the results. The charts are only estimates, and they have a bias of around 10%. As you can see, Google doesn't do very well.
Ziv Bar-Yossef found that about 2% of Google's pages were dead pages compared to less than 1% for MSN and Yahoo. Sometimes dead pages can contain useful information -- such as the cached pages of a site that has been removed -- so it isn't always a bad thing to store dead pages. (via Search Engine Watch)

Posted on September 25, 2006




