Retrieving an entire website using Google Cache?
There is a site with thousands of pages that I want to retrieve from Google Cache. Is there any way to get it back quickly using Google Cache or some other web crawler/archiver?
You can see what Google (still) knows about a website by using a `site:` search.
You might also check out the Internet Archive.
(In either case, you’d probably want to do some heavy-duty automating to fetch thousands of pages.)
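For the Internet Archive side, that automation can be sketched with the Wayback Machine's CDX API, which lists captured URLs for a domain, and the `web.archive.org/web/<timestamp>/<url>` snapshot scheme. A minimal Python sketch (the `collapse`/`fl` parameters and the error handling you'd want in practice are left simple here):

```python
import json
import urllib.parse
import urllib.request

def cdx_query_url(domain, limit=1000):
    """Build a Wayback Machine CDX API query that lists captures for a domain."""
    params = urllib.parse.urlencode({
        "url": f"{domain}/*",      # every captured URL under the domain
        "output": "json",
        "fl": "original,timestamp",
        "collapse": "urlkey",      # one row per distinct URL
        "limit": str(limit),
    })
    return f"https://web.archive.org/cdx/search/cdx?{params}"

def snapshot_url(timestamp, original):
    """URL of the archived copy of a single capture."""
    return f"https://web.archive.org/web/{timestamp}/{original}"

def list_snapshots(domain):
    """Fetch the CDX listing and return snapshot URLs (makes a network request)."""
    with urllib.request.urlopen(cdx_query_url(domain)) as resp:
        rows = json.load(resp)
    # The first row is a header; the rest are [original, timestamp] pairs.
    return [snapshot_url(ts, orig) for orig, ts in rows[1:]]
```

You would then download each snapshot URL in a loop (with a polite delay between requests) to rebuild the site's HTML.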
I created a free service for recovering websites that can retrieve most pages from the search engines' caches.
The output of the service is a zipped file containing your HTML from the search engines' caches. It is still in beta, so it needs a lot of tweaks and bug fixes, but hopefully it can help you or other people who experience the same problem.
If you are the actual owner of the website, I would strongly recommend against relying on a "site:[domain]" Google search, as it does not show all the URLs but only an approximation. It is also difficult to review all the links, since the results are not in any sort of spreadsheet.
I would instead suggest that you use Google Analytics, following these steps:
- Log into Google Analytics and choose the date range that you would like to view (try to make it as large as possible to capture all the URLs... even old URLs that no longer work, if your site has had an update and a change of URL structure)
- Go to "Behavior"->"Site Content"->"All Pages"
- Choose the number of rows you want included in the file (I normally go for 1000, but you can choose more or less)
- Lastly, download all those URLs to a file by clicking "Export"
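One wrinkle with the export above: the "All Pages" report gives you page paths (e.g. `/about/`), not full URLs. A small sketch for turning an exported CSV into fetchable URLs; the `Page` column name is an assumption, so adjust it to match your actual export:

```python
import csv
import io

def paths_to_urls(csv_text, domain, path_column="Page"):
    """Turn a Google Analytics "All Pages" CSV export into full URLs.

    path_column is assumed to be "Page"; change it if your export differs.
    Rows whose path doesn't start with "/" (totals, blanks) are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        f"https://{domain}{row[path_column]}"
        for row in reader
        if row.get(path_column, "").startswith("/")
    ]
```

The resulting list can then be fed to whatever cache or archive fetcher you use to pull down each page.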