Sunday, March 6, 2011

How would someone download a website from Google Cache?

A friend accidentally deleted his forum database. That wouldn't normally be a huge issue, except that he neglected to perform backups. Two years of content is just plain gone. Obviously, he's learned his lesson.

The good news, however, is that Google keeps cached copies, even if individual site owners are idiots. The bad news is that traditional crawling robots would choke on the Google Cache version of the website.

Is there an existing tool that would help trawl the Google Cache? If not, how would I go about rolling my own?

From Stack Overflow
  • This article might be of some help to your friend: http://www.smartmoneydaily.com/business/how-the-google-cache-can-save-your-a.aspx

    d8uv : For those too lazy to read the article, it essentially says to spider this search URL: http://www.google.com/search?q=inurl:stupidwebsite.org/forum/&filter=0 That's what I'm gonna do, unless someone else has bright ideas.
  • You may want to consider crawling the archive.org cache as well. If the site is in there, it's generally better structured.
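
As a rough sketch of the two suggestions above, the snippet below only builds the URLs a home-rolled spider would start from; it does not fetch anything. The function names are made up for illustration, the `start` paging parameter and the page count are assumptions not mentioned in the thread, and the archive.org lookup assumes the Wayback Machine availability endpoint is the right entry point for this site.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "http://www.google.com/search"
# Assumption: the Wayback Machine availability endpoint is used for the lookup.
WAYBACK_AVAILABLE = "http://archive.org/wayback/available"


def search_page_urls(site_path, pages=3, per_page=10):
    """Build result-page URLs for the inurl: query quoted in the comment.

    `pages` and `per_page` are hypothetical knobs; `start` is assumed to
    select the offset of the first result on each page.
    """
    urls = []
    for page in range(pages):
        params = {
            "q": "inurl:" + site_path,
            "filter": "0",  # same filter=0 as in the quoted URL
            "start": str(page * per_page),
        }
        # Keep ':' and '/' readable so the URL matches the quoted form.
        urls.append(GOOGLE_SEARCH + "?" + urlencode(params, safe=":/"))
    return urls


def wayback_lookup_url(page_url):
    """Build a lookup URL asking archive.org for its closest snapshot."""
    return WAYBACK_AVAILABLE + "?" + urlencode({"url": page_url})


if __name__ == "__main__":
    for url in search_page_urls("stupidwebsite.org/forum/"):
        print(url)
    print(wayback_lookup_url("stupidwebsite.org/forum/"))
```

A real spider would still need to download each result page, pull out the cached-copy links, and pause between requests so it doesn't get blocked; those parts are left out here on purpose.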
