Google Has the Most Dead Pages
Google is the leader in dead and old pages, according to a study cited by Google Operating System.
Ziv Bar-Yossef, from Google, wrote a paper about sampling random pages from a search engine's index using queries. He explains some of the technical details in this video, including why sampling random pages is useful: it lets you compare search engines and estimate the amount of spam, the share of fresh results, and so on.
Ziv Bar-Yossef found that about 2% of Google's pages were dead, compared to less than 1% for MSN and Yahoo. Dead pages can sometimes contain useful information, such as the cached pages of a site that has been removed, so storing them isn't always a bad thing. (via Search Engine Watch)
He applied the results from his paper to compare Google, Yahoo and MSN Search. Here are three charts that compare the index size, the number of dead pages in each search engine, and the freshness of the results. The charts are only estimates, with a bias of around 10%. As you can see, Google doesn't do very well.
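To make the idea of estimating a dead-page rate from a sample a little more concrete, here is a minimal Python sketch. It is not Bar-Yossef's method: it assumes you already have a uniform random sample of pages, each checked and flagged as dead or alive, and it ignores the roughly 10% bias mentioned above. The function name `estimate_dead_fraction` and the sample counts are hypothetical, chosen only for illustration.

```python
import math


def estimate_dead_fraction(is_dead_flags):
    """Estimate the share of dead pages from a random sample.

    is_dead_flags: list of booleans, True if the sampled page was dead.
    Returns (fraction, standard_error) using the usual binomial formula.
    """
    n = len(is_dead_flags)
    if n == 0:
        raise ValueError("sample is empty")
    # Point estimate: dead pages divided by sample size.
    fraction = sum(1 for flag in is_dead_flags if flag) / n
    # Standard error of a sample proportion.
    std_err = math.sqrt(fraction * (1 - fraction) / n)
    return fraction, std_err


# Hypothetical sample: 1,000 pages, 20 of which turned out to be dead.
sample = [True] * 20 + [False] * 980
fraction, std_err = estimate_dead_fraction(sample)

# 1.96 standard errors gives an approximate 95% interval.
print(f"dead fraction: {fraction:.3f} +/- {1.96 * std_err:.3f}")
```

With a small sample the interval is wide, which is one reason a difference like 2% versus less than 1% should be read as an estimate, not an exact count.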
Posted on September 25, 2006


