See part 1.

As I said, the concept of the “Decentralized Web” wasn’t made very clear at the Summit. Everyone seemed to know what everyone else was talking about.

Mitchell Baker (one of the few women and a powerhouse in her own right) explained it like this:

  • Agency. The user agent can choose how to interpret content provided by a service offering.
  • Immediate. Safe, instant access to content accessible via a universal address (similar to http) without an install.
  • Open. Anyone can publish content without permission or barriers and provide access as they see fit.
  • Universal. Content runs on any device or any platform.

The above description is what I tried to keep in my head throughout the Summit. The questions it brought up were:

  1. Is this a new concept or am I just learning about the movement?
  2. Are these concepts coming around again after 20 years? They seem like familiar principles from when I first got on the web.

Vint Cerf @vintcerf

Vint Cerf came next. He is a VP and Chief Internet Evangelist at Google. He didn’t talk much about Google and mostly skipped the gratuitous mentions that many speakers make. Instead he talked about the Digital Dark Ages: the period we are in now, where content we think of as permanent blinks on and off. Files simply disappear and links die.

He is advocating for a Self Archiving Web.

What we have learned from the Internet is that the following is important:

  • Collaboration and Cooperation
  • Open design and an evolution process
  • Anyone can join by following the protocols
  • Multiple business models are acceptable
  • Modular, layered evolution
    • who really cares about how packets are being carried as long as they are delivered?
  • E pluribus unum

Thinking Out Loud

Some of the things that Cerf mentioned were compression schemes and what they can and can’t do, tarballs and related formats, the importance of storing and being able to recover particular objects, and the storage of software with versioning. Imagine a world where you have a lot of digital stuff, but you can’t access it because you don’t have the software.

Internet Archive

The Internet Archive is starting to ‘archive’ the web. Even they admit their efforts are inadequate. They take snapshots of sites, which are time-indexed copies. Hyperlinks don’t work. They will have to be rewritten so that they resolve to other archived (snapshot) “InArchive” pages.
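To make the idea of links that resolve to other archived snapshots a bit more concrete, here is a minimal Python sketch. It is my own illustration, not how the Internet Archive does it: the snapshot index, the `archive.example` prefix and the function names are all made up, and the regular expression only handles simple double-quoted `href` attributes.

```python
import re
from bisect import bisect_right

# Hypothetical index: URL -> sorted list of snapshot timestamps (YYYYMMDD).
SNAPSHOTS = {
    "http://example.org/a": ["20100101", "20140615", "20160608"],
}

# Placeholder prefix for archived copies; not a real service.
ARCHIVE_PREFIX = "https://archive.example/snap"


def snapshot_as_of(url, when, index=SNAPSHOTS):
    """Return the latest snapshot timestamp at or before `when`, or None."""
    stamps = index.get(url, [])
    pos = bisect_right(stamps, when)
    return stamps[pos - 1] if pos else None


def rewrite_links(html, when, index=SNAPSHOTS):
    """Point each href at an archived copy as of `when`, if one exists."""
    def repl(match):
        url = match.group(2)
        stamp = snapshot_as_of(url, when, index)
        if stamp is None:
            return match.group(0)  # nothing archived: leave the link alone
        return f"{match.group(1)}{ARCHIVE_PREFIX}/{stamp}/{url}{match.group(3)}"

    return re.sub(r'(href=")([^"]+)(")', repl, html)
```

A page snapshotted in early 2015 would then link to the 2014 copy of `http://example.org/a`, and links with no archived copy stay untouched.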

A complete archiving effort for the web has to:

  • be self-contained
  • include continuous crawling
  • determine when to create a new instance
    • Is there a role for RSS??
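One simple way to decide when to create a new instance is to compare a hash of the page content against the last stored copy. The sketch below is only my assumption of how that could look; `SnapshotStore` and `observe` are hypothetical names, and a real crawler would probably want to ignore trivial changes such as ads or timestamps.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Hash the raw page bytes so two fetches can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


class SnapshotStore:
    """Hypothetical store that records a new instance only when content changes."""

    def __init__(self):
        self.instances = {}  # url -> list of (timestamp, digest)

    def observe(self, url, timestamp, content):
        """Record a crawl result; return True if a new instance was created."""
        digest = fingerprint(content)
        history = self.instances.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # unchanged since the last instance
        history.append((timestamp, digest))
        return True
```

A feed such as RSS could be what triggers `observe` in the first place, so the crawler only looks at pages that announce a change.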

Then he had a random question: Does a set of all sets include itself?

=>This question alone made the whole conference worth it. This type of question would never come up in the corporate world. If you think about it, it will start to blow your mind.

Environmental Thoughts

The web can hardly contain itself. Is there room to replicate it? (LOCKSS)

Hyperlinks deteriorate with time

  • Perhaps we need a permanent link – something like digital object identifiers

HTMLx rendering challenge: backward compatibility

What about permissions, access control and copyrights?

  • these cause problems for archiving over decades or centuries

Some Basic Thoughts

  • Automatic, cooperative replication of created web pages could be part of the publication process.
  • Is there a role for something like the Google Docs property of replication and real-time synchronization?
  • Could there be a reference space (like a reference room in a library) held in common by cooperating web archives?

More Thinking Out Loud

  • Is there a role for publication/subscription mechanisms for cooperative and other entities?
  • A lot of metadata is needed to replicate some kind of Time Machine feature
  • A library of rendering/interpretation software is needed to correctly render archival material on the web
    • there is a whole subtext of permission to use here
    • how do you use software over time (decades, centuries) when companies go out of business, go bankrupt (and the judge won’t give you access because it’s an asset)
  • Guaranteeing backward compatibility is a good goal, but not practical over hundreds of years.

Surfing the Self Archiving Web

  • Multiple, alternative resolution targets (shouldn’t matter which is chosen if the system works right)
  • Note: static media such as newspapers, magazines and books are snapshots of a ‘work’ in time. We have editions. People can see the work as it existed at that time, and something similar might be valuable for the web and digital objects.
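The point about multiple, alternative resolution targets can be sketched as a resolver that asks each cooperating archive in turn and takes the first answer. This is just an illustration under my own assumptions: the `resolve` function, the archive names and the identifier format are hypothetical.

```python
def resolve(identifier, archives):
    """Try each (name, lookup) pair in order and return the first hit.

    `lookup` is any callable that returns the content for an identifier
    or raises LookupError (KeyError included) when it has no copy.
    """
    for name, lookup in archives:
        try:
            return name, lookup(identifier)
        except LookupError:
            continue  # this archive has no copy; try the next one
    raise LookupError(f"no archive could resolve {identifier!r}")


# Two made-up archives backed by plain dictionaries.
archive_a = {}
archive_b = {"id:page-1": b"<html>archived page</html>"}

archives = [
    ("archive-a", archive_a.__getitem__),
    ("archive-b", archive_b.__getitem__),
]
```

If the system works right, the caller should not care whether `archive-a` or `archive-b` answered, only that the same object came back.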

Desirable Properties

  • Auto-archiving upon publication?
  • could auto-archiving be a service for which you sign up? How would it be funded?
  • Registration of rendering engines (and permissioning system?)
  • Auto-malware filters
  • Fidelity levels
    • everything works
    • surface display only (no links)
    • other? e.g. can see there is a video, but don’t have software or rights to render it so user can only see part of the object
  • Once archived, is a page an indelible and unalterable instance? Could it be useful in a court of law?
    • could different levels of fidelity have different uses, e.g. an official version?
  • Is there an ‘official records’ side effect of making Self Archiving web work?
  • Can such a system work for encrypted content?
  • Can parts of the archive be access controlled?
    • e.g. release after 25 years?
    • what about a separate archive of encryption keys?
    • could metadata be added to trigger access?
  • Is there a role for containers?
    • all information needed to do rendering would be held in one container
      • Google is doing this for the Android Framework
  • Apps?
    • What is their role in the structure above?
    • How would they be shown/displayed in 25 years?
    • Would an index of information for apps help render them?
    • archive of apps
    • digital dioramas
  • Digital vellum could be an ad model + a subscription model + a service model

We discussed the right to be forgotten in a digital archiving space.

  • In order to wipe something out, you have to remember it, because you have to know it in order to get rid of it (another mind blower!)

Listen/watch the video of the whole conference for more detail and more information.

Resources:

Cory Doctorow’s presentation
Tim Berners-Lee’s Solid project
Twitter hashtag #DWebSummit