The National Library of Australia has just launched its Australian Web Archive – a massive, freely accessible collection of content that provides a historical record of the development of world wide web content in Australia over more than two decades.
The new archive is a momentous achievement. Containing annual captures of all accessible pages on .au domains and dating back to 1996, it dwarfs even the the Library’s own PANDORA Web archive – a curated collection of Australian web content deemed to be of national significance by the librarians.
That the Library has managed to construct this national web archive, and to conduct annual captures of the .au domain since 2005, is all the more remarkable given the chronic underfunding of its activities by successive federal governments. But with the increased use of social media and apps, the next step is to archive platforms such as Facebook and Twitter – unfortunately, this will be an even bigger challenge.
Why archive the web?
More and more material of cultural, social, political, and ultimately historical significance originates online. This has created significant problems for national libraries and archives used to dealing with books, newspapers, letters, and other hard copy materials. Because of constant changes in formats and systems, digital materials are perhaps even more difficult to preserve and keep accessible than
print or photos.
Imagine the digital heritage we would lose if such content was not being archived, though. The web is a communally authored space that contains everything from official government announcements through mainstream news reporting to personal homepages. It provides a record of cultural activity and public debate that is far more comprehensive than the written materials that survive from earlier periods in human history.
Already, much of the early web has been lost, however. This is also because of the constant turnover in popular platforms. Early community sites such as GeoCities and MySpace have all but disappeared, and so has the content hosted on them – even in spite of user efforts to preserve at least some of the material from deletion.
The Australian Web Archive is a critical initiative to save what can still be saved, with a strong focus on Australian content. For future historians, it is a treasure trove: how did the general public respond to events of national significance, from John Howard’s election loss through Cyclone Yasi to cricket’s ball-tampering scandal? How did our views, attitudes, and beliefs change in more subtle and gradual ways over time, in relation to everything from anthropogenic climate change to My Kitchen Rules?
The real problems are only just beginning
In spite of its achievements, the Library can’t rest on its laurels. Because of the malleability of digital data, online content is also more changeable than print. Websites can change every day, every hour, every minute, and keeping track of such fast-paced content is only getting more difficult.
A crucial problem here is the increasing “platformisation” of the Internet. We no longer simply use “the web”: we’re on Facebook, Instagram, Twitter, WhatsApp, and any number of other platforms, and we’re more likely to access them through mobile apps than desktop browsers.
Such platforms and their content are themselves of immense historical significance: hyperactive Twitter is a first draft of the present where we are constantly commenting on breaking news stories; all-conquering Facebook and its profiles, pages, and groups are a near-comprehensive representation of our collective interests and obsessions; audiovisual Instagram provides a constant stream of photos and videos of contemporary life.
A comprehensive archive of their contents would provide the material for a generation of historians and media, communication, and cultural studies researchers – but the technical, ethical, and political obstacles are immense, and growing.
Archiving the platformised internet
First, the platforms themselves usually don’t provide comprehensive access to their content. But even where they have, archiving institutions have struggled to cope with the overwhelming volume and unfamiliar forms of this fire hose of information – as the US Library of Congress found to its detriment when it accepted the “gift” of full and continuing access to Twitter.
Second, users of these platforms have not usually provided explicit permission for their content to be archived and made available by national institutions, except in click-through agreements that none of them have actually read. National libraries have substantial experience in handling sensitive content, and are perhaps better placed than most other institutions to develop ethically sound access mechanisms, but few have chosen to tackle this thorny issue so far.
Finally, the growing backlash against social media platforms’ commodification of user data in the wake of the Cambridge Analytica scandal has led many platform providers to take an increasingly defensive stance, and reduce data access even for legitimate purposes, including research.
Facebook’s just-announced “pivot” towards an emphasis on private over public communication must be seen in this context. It is less a genuine attempt to protect users from unwanted intrusions than an intervention designed to reduce pressure from regulators and policy-makers. Facebook’s core business model remains commercialising its users’ data, after all.
In combination, such developments present critical new challenges for the National Library of Australia and other institutions charged with documenting our continuing national narrative.
The Australian Web Archive is a remarkable achievement, and a powerful indicator of the world-leading expertise the Library has managed to assemble in-house. What is needed now is the political will and budget support to enable the Library to maintain this initiative, and to extend it to the new spaces where that narrative now continues to be told.
Image: The National Library of Australia recently launched the Australian Web Archive – a historical record of Australian web content (Shutterstock)