Musings on the value of a snapshot

I love the Internet Archive. It has become a good solution to a vast and difficult problem. I use the Wayback Machine nearly every day, but I don’t think any of us who built the early web saw it coming. It is obvious in hindsight, but so much of those early days felt like play rather than work. Who had time to archive any of that, or even thought it would be useful? Now, as I begin to look back over my career and the crazy temporality that we have all happily accepted for the last thirty years, I am so thankful that the Internet Archive exists at all. The archive is buggy and broken and probably shouldn’t exist. But in a world of vaporware and empty promises, where marketing has always spoken louder than engineering, a little context goes a very long way. The web snapshots that the Wayback Machine provides, as bad as they are, are truly invaluable.

Long before Alexa was assisting us by piping up for no reason, she was a web crawler. In the mid-to-late ’90s, searching for anything was a nearly impossible task. Searching the web as a whole just couldn’t be done. Before Google became a verb, most of the search engines were nothing more than link pages curated by actual humans. Yahoo, for example, was a directory, like the Yellow Pages. So much of the joy of discovery at that time was simply finding all the new things for the first time. The crawlers and spiders didn’t care about opinions and flame wars; they were robots doing a job quietly in the background. They dispassionately walked the web, following each and every link to its conclusion. They didn’t care what the contents of the page said; they were there for the links. Because of that, the web crawlers inadvertently became the documentarians of the early web.

I remember the first time I saw Googlebot, Google’s web crawler, sucking up all the pages that I was working on in obscurity… or so I thought. It was quickly caching pages (and, more importantly, server-side renders). So quickly, in fact, that I felt obliged to post ‘under construction’ pages. I’m sorry. Googlebot was following links and saving everything that it came across. Such a beautifully simple idea, and it literally changed everything. But it certainly never occurred to me, when I was buying 200-megabyte hard drives for $1000, to take a snapshot of any of that data. It was well into the noughties before I had anything like complete daily or weekly backups. The only snapshots I was taking at the time involved a disposable camera, and most of those snapshots have been thrown away.

Snapshots are always frustrating and incomplete, but they are better than nothing because our memories are terrible. In fact, our memories are generally so bad that we can’t even document what we have lost. What snapshots have we thrown away? With the passage of time, the incompleteness of a snapshot fades and the record itself becomes the center of its value. Each little detail, important only to itself, adds up to the greater meaning. I suspect that, in the coming years, some bright spark will connect the Internet Archive to a deep-learning model and create a fully functional emulation of the World Wide Web as it was on any given day in history. A complete simulation with all its bugs and problems rendered, just as you liked it, with the abstraction layers faithfully recreated. Historians and researchers will be able to lose themselves, as we did, browsing endless websites for the first time, again. The AI might even be able to simulate the epic flame wars of forums past, as non-player characters endlessly recreate famous shouting matches for all to watch, in seeming real time, all thanks to the Internet Archive. The value of data is at the point of use, and we will never be able to figure out everything we have lost over the years. But I wanted to take a moment and thank Brewster Kahle for not throwing out his snapshots way back when.

Donate to the Internet Archive

Wayback to way forward Internet Archive at 25
R. Josh Quarles