How To Access Pages Missing From The Internet
Using the Wayback Machine to Find Lost Pages
404 pages have gotten more creative over the years:
404 pages from GitHub (left) and HopperMagic (right) (Source)
However, that does not make them any less annoying, especially when you are searching for critical data. Pages can disappear for many reasons: someone forgot to pay the hosting fees, a government deemed the information subversive, an individual tried to scrub records from the web, or a mundane infrastructure problem took the site down. The average life of a webpage has been variously reported as 44, 75, and 100 days. Whatever the exact number, one thing is clear: the Internet is leaky, and content is not guaranteed to stay around forever.
To look for a saved copy of a missing page, go to http://web.archive.org/ and search for the URL:
Saved versions of a previously inaccessible website (here)
This shows you all the versions of the webpage that have been saved over time. If you want to see previous versions of any website, paste in the URL and time travel.
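The same lookup can be scripted. The sketch below assumes the Internet Archive's public availability endpoint at `https://archive.org/wayback/available`, which returns JSON describing the closest saved snapshot for a URL. The function names (`build_availability_query`, `closest_snapshot`, `find_snapshot`) are made up for this example, and the response fields it reads (`archived_snapshots`, `closest`, `available`, `url`) are my understanding of that API rather than anything guaranteed here.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived copy's URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(url: str, timestamp: Optional[str] = None,
                  timeout: float = 10.0) -> Optional[str]:
    """Ask the Wayback Machine for the saved copy closest to timestamp."""
    query = build_availability_query(url, timestamp)
    with urlopen(query, timeout=timeout) as reply:
        return closest_snapshot(json.load(reply))


if __name__ == "__main__":
    # Requires network access; prints an archived URL or None.
    print(find_snapshot("towardsdatascience.com", "20180101"))
```

Keeping the URL building and the response parsing separate from the network call makes the pieces easy to check without hitting the Internet Archive's servers.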
Saved versions of towardsdatascience.com
The Wayback Machine is a digital archive of the World Wide Web and other information on the Internet. It is developed and maintained by the Internet Archive, a non-profit with the modest goal of “archiving the entire Internet and providing universal access to all knowledge”. The Internet Archive maintains a library of digital content that is free for anyone to access. To date, it contains:
- 330 billion web pages
- 20 million books and texts
- 4.5 million audio recordings (including 180,000 live concerts)
- 4 million videos (including 1.6 million Television News programs)
- 3 million images
- 200,000 software programs
All of this information, which was growing by 15 TB per day as of 2016, lives on physical infrastructure at the Internet Archive. Currently, there are over 20,000 disks of up to 8 TB each trying to archive the entirety of human knowledge.
This may seem a little overwhelming, but on a practical level, you can use the Internet Archive’s Wayback Machine to access missing web pages that have been saved. If you’re worried about a page being taken down (maybe because it’s controversial), you can also save it yourself through the Chrome extension or the Internet Archive website. Other tools, such as Chrome’s cached pages, may also let you see old versions of web pages. As with all human infrastructure, the Internet and its digital tools crumble over time. Nonetheless, the Internet Archive and the Wayback Machine give you one way to fight the decay.
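Saving a page can also be scripted. This is a minimal sketch, assuming the "Save Page Now" feature is reachable by requesting `https://web.archive.org/save/` followed by the page's address. The helper names (`build_save_url`, `request_capture`) and the `User-Agent` string are hypothetical, and the service may rate-limit or refuse automated requests.

```python
from urllib.request import Request, urlopen

# Assumed prefix for the Internet Archive's "Save Page Now" feature.
SAVE_PREFIX = "https://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Return the address that asks the Wayback Machine to capture page_url."""
    return SAVE_PREFIX + page_url


def request_capture(page_url: str, timeout: float = 60.0) -> int:
    """Request a capture and return the HTTP status code of the reply."""
    request = Request(
        build_save_url(page_url),
        headers={"User-Agent": "wayback-save-sketch"},  # hypothetical identifier
    )
    with urlopen(request, timeout=timeout) as reply:
        return reply.status


if __name__ == "__main__":
    # Requires network access; a capture can take a while to complete.
    print(request_capture("http://example.com/"))
```

If the request succeeds, the page should then show up in the list of saved versions on the Wayback Machine.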
As always I welcome feedback and constructive criticism. You can find me on Twitter @koehrsen_will.