Tuesday, December 11, 2007

Robert Kahn talk at U of T

Robert Kahn is talking today at U of T on "Managing Digital Objects on the Net" as part of the Distinguished Lecture series. He is the co-inventor of TCP/IP and did pioneering work on open-architecture networking. His bio is below:

Dr. Kahn has had a distinguished career; in 1972 he demonstrated ARPANET,
for which he was a principal architect. After becoming Director of DARPA's
Information Processing Techniques Office, he started the United States'
billion-dollar Strategic Computing Program, the largest computer research
and development program ever undertaken by the federal government.
In 2004 he shared ACM's Turing Award with Vint Cerf for their design
of the TCP/IP protocol, which is the basis of the Internet. Among his many
other honours are the National Medal of Technology, presented by President
Clinton, and the Presidential Medal of Freedom, from President Bush.
Dr. Kahn is now President of the Corporation for National Research Initiatives,
a not-for-profit organization which performs research on strategic development
of network-based information technologies.

Bob is talking about how he is now turning his attention to managing information and digital objects. The particular topics he is working on are archiving and the Digital Object Architecture. Managing information on the net is all about trust: will people trust the information? We are moving beyond a world of static information to one of dynamic information. He helped to design the internet architecture to be open and evolving. Any object or person can be represented in digital form and have a presence on the net; this is similar to the vision of HP Labs' CoolTown project. Until now, the internet has been about packet communication and moving bits. Now that this has been achieved, the next step is information management.

His motivation for the Digital Object Architecture is not just to be open, but also to make information accessible over very long periods of time, similar to accessing old books and articles in a library. A digital object is structured data that is interpretable in a machine-independent fashion, just like a packet on the network or a file on your computer. What are the technical components? Digital objects need unique identifiers, a resolution system, repositories and registries. Does that sound familiar? It's just like web services or objects in object-oriented programming: the same type of architecture. There needs to be a data structure for a digital object; just as a literary work starts as an idea and has to be set down in a fixed form, a digital object needs a defined format. Network resources can be identified and then managed, just as MAC addresses identify hardware. This is all déjà vu from object-oriented architectures, so nothing here is really new. But there is no universal protocol for managing digital objects, since everyone is doing their own thing. It's like contacts: people use the vCard format, a Palm format, an Apple contacts format, or some XML format, and there is no unified structure.
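
To make those components concrete, here is a minimal Python sketch of a digital object as a data structure, assuming a simple model of my own rather than anything shown in the talk: a unique identifier, a type label, named data elements and metadata, serialized to JSON so that any machine can interpret it. The class `DigitalObject`, its field names and the handle-style identifier `12345/article-1` are all hypothetical; the actual Digital Object Architecture defines its own data model.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical digital object: identifier, type, data elements and metadata."""

    identifier: str  # unique, persistent name in "prefix/suffix" style (hypothetical)
    object_type: str  # tells a client how to interpret the data elements
    elements: dict[str, bytes] = field(default_factory=dict)
    metadata: dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize to text any machine can parse; binary data is base64-encoded."""
        return json.dumps(
            {
                "id": self.identifier,
                "type": self.object_type,
                "elements": {k: base64.b64encode(v).decode("ascii") for k, v in self.elements.items()},
                "metadata": self.metadata,
            },
            sort_keys=True,
        )

    @classmethod
    def from_json(cls, text: str) -> "DigitalObject":
        """Rebuild an object from the output of to_json()."""
        data = json.loads(text)
        return cls(
            identifier=data["id"],
            object_type=data["type"],
            elements={k: base64.b64decode(v) for k, v in data["elements"].items()},
            metadata=data["metadata"],
        )


# Usage: a hypothetical article stored as a digital object and round-tripped.
article = DigitalObject(
    identifier="12345/article-1",
    object_type="article",
    elements={"text": b"Managing Digital Objects on the Net"},
    metadata={"creator": "example author"},
)
restored = DigitalObject.from_json(article.to_json())
print(restored == article)  # True
```

The point of the round trip is the "fixed form" idea: the object's meaning survives being written out and read back on a different machine, as long as both sides agree on the format.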

The repository is where digital objects are indexed, and they can be accessed directly through the Digital Object Protocol. The idea is to stop having to navigate the underlying infrastructure to find objects and instead access them directly. He gave the example of trying to find an e-mail from a particular person at a particular point in time: you have to go through files and folders, and possibly different operating systems and laptops if you have more than one. A Digital Object API provides access to digital objects through different client interfaces such as FTP, HTTP, IMAP and SMTP, so that digital objects can be reached through traditional means on the net. You would also want a digital object client that interfaces directly with the digital objects.
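
A rough sketch of the repository idea, assuming a simple in-memory store: objects are deposited and retrieved by identifier, found by metadata instead of by folder or machine (the e-mail example), and also exposed through a URL-style path to mimic one of the traditional client interfaces. `Repository`, `StoredObject`, `http_style_get` and the `/objects/` path are hypothetical names of mine, not the Digital Object Protocol or its API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Hypothetical stored object: identifier, raw data and searchable metadata."""

    identifier: str
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class Repository:
    """Hypothetical repository: objects are addressed by identifier,
    not by file path, folder, disk or machine."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def deposit(self, obj: StoredObject) -> None:
        if obj.identifier in self._objects:
            raise ValueError(f"identifier already in use: {obj.identifier}")
        self._objects[obj.identifier] = obj

    def retrieve(self, identifier: str) -> StoredObject:
        return self._objects[identifier]  # raises KeyError for unknown identifiers

    def search(self, **criteria: str) -> list[str]:
        """Return sorted identifiers of objects whose metadata matches every key=value."""
        return sorted(
            oid
            for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


def http_style_get(repo: Repository, path: str) -> bytes:
    """Hypothetical adapter: serve the same repository through a path such as
    '/objects/<identifier>', standing in for a traditional client interface."""
    prefix = "/objects/"
    if not path.startswith(prefix):
        raise ValueError(f"unsupported path: {path}")
    return repo.retrieve(path[len(prefix):]).data


# Usage: find an e-mail by sender and date, then fetch another one via the URL-style interface.
repo = Repository()
repo.deposit(StoredObject("12345/mail-1", b"Lunch on Friday?", {"sender": "alice", "date": "2007-12-11"}))
repo.deposit(StoredObject("12345/mail-2", b"Slides attached", {"sender": "bob", "date": "2007-12-11"}))
print(repo.search(sender="alice", date="2007-12-11"))  # ['12345/mail-1']
print(http_style_get(repo, "/objects/12345/mail-2"))  # b'Slides attached'
```

Both access paths end at the same `retrieve` call, which is the design point: the client interface can vary while the object and its identifier stay the same.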

A handle is an identifier for, or pointer to, a digital object, and the Digital Object Architecture has been implemented in the Handle system. The Handle system works today, and many library and cataloguing systems use it to manage their information; the DOI system is one example. The Handle system software is written in Java. A client that wants to resolve a handle can go through a proxy server (hdl.handle.net), so the system uses the existing internet architecture without any change to DNS.
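
Handles have the form prefix/suffix, where the prefix is a naming authority and the suffix is a local name that may itself contain slashes (DOIs are handles whose prefix starts with `10.`). The sketch below only builds the web URL for resolving a handle through the hdl.handle.net proxy; it does not contact the network. The function names and example handles are hypothetical, and the https scheme is an assumption about how the proxy is reached today.

```python
from urllib.parse import quote

# Public web proxy for the Handle system mentioned in the talk (https scheme assumed).
HANDLE_PROXY = "https://hdl.handle.net"


def split_handle(handle: str) -> tuple[str, str]:
    """Split a handle into its naming-authority prefix and local suffix at the first '/'."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"expected a handle of the form prefix/suffix, got {handle!r}")
    return prefix, suffix


def proxy_url(handle: str, proxy: str = HANDLE_PROXY) -> str:
    """Return a web URL asking the proxy to resolve the handle.

    '/' is kept as is; characters with special meaning in URLs, such as
    spaces, '?' and '#', are percent-encoded so the proxy sees the whole handle.
    """
    split_handle(handle)  # reject strings that are not prefix/suffix
    return f"{proxy}/{quote(handle, safe='/')}"


# Usage with hypothetical handles.
print(split_handle("12345/reports/2007"))  # ('12345', 'reports/2007')
print(proxy_url("12345/report 1?draft"))  # https://hdl.handle.net/12345/report%201%3Fdraft
```

Because the proxy is an ordinary web server with an ordinary DNS name, any browser can follow such a URL; the handle-to-location resolution happens inside the Handle system, not in DNS.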

The Digital Object Architecture can also be used to manage items of value such as digital cash. MetaObjects in the architecture are like generalized folders, and metadata registries are like web service registries (such as UDDI). CORDRA is a federated collection of metadata registries, much like in the Jini architecture. The demo showed a version of the Digital Object Architecture for Adobe Acrobat PDF files. Bob mentioned that digital object identifiers are growing at a rate of 4 to 5 million per year and are being used by customers today.
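
The federated registry idea can be sketched as a query that fans out to several independent registries and merges what comes back. This is a toy model under my own assumptions (keyword search, one record kept per identifier, earlier registries taking precedence), not CORDRA's actual design or interface; `MetadataRecord`, `Registry` and `FederatedRegistry` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical metadata record describing one digital object."""

    identifier: str  # handle of the described object
    title: str
    keywords: frozenset[str]


class Registry:
    """Hypothetical metadata registry operated by a single organization."""

    def __init__(self, name: str, records: list[MetadataRecord]) -> None:
        self.name = name
        self._records = list(records)

    def query(self, keyword: str) -> list[MetadataRecord]:
        """Return this registry's records tagged with the keyword."""
        return [r for r in self._records if keyword in r.keywords]


class FederatedRegistry:
    """Hypothetical federation: forwards a query to every member registry
    and merges the answers, keeping one record per identifier."""

    def __init__(self, members: list[Registry]) -> None:
        self.members = members

    def query(self, keyword: str) -> list[MetadataRecord]:
        merged: dict[str, MetadataRecord] = {}
        for registry in self.members:
            for record in registry.query(keyword):
                merged.setdefault(record.identifier, record)  # earlier registries take precedence
        return sorted(merged.values(), key=lambda r: r.identifier)


# Usage: two hypothetical registries that both describe object 12345/lesson-2.
library = Registry("library", [
    MetadataRecord("12345/lesson-1", "Intro to TCP/IP", frozenset({"networking"})),
    MetadataRecord("12345/lesson-2", "Packet switching", frozenset({"networking", "history"})),
])
museum = Registry("museum", [
    MetadataRecord("12345/lesson-2", "Packet switching (copy)", frozenset({"networking"})),
    MetadataRecord("67890/exhibit-7", "ARPANET map", frozenset({"history"})),
])
federation = FederatedRegistry([library, museum])
print([r.identifier for r in federation.query("networking")])  # ['12345/lesson-1', '12345/lesson-2']
```

Each member keeps control of its own records, and the federation only adds a merge step, which is roughly what makes federation attractive when many organizations maintain their own catalogues.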

One of the intellectual questions Bob is addressing is what information companies want to share and make public, a very interesting and complex question. There are many applications of digital objects: network storage and archiving, identity management, PKI, authentication of information, personal locator information, digital cash, publications, cataloguing, and even social networks (in my opinion).

One of the premises of the talk is that the Digital Object Architecture uses the existing internet architecture. My question to Bob was whether it will work with new internet architectures, such as the Internet 2.0 being proposed by MIT and other research institutions, or the Interplanetary Internet from NASA and Vint Cerf. His answer was that the Digital Object Architecture is independent of the network, so it does not matter if the internet radically changes. However, I'm a little skeptical: what happens if everything we assume about the internet changes? What happens if there is no DNS resolution system? Would the architecture still work?

It was a very interesting talk, and he was great at answering questions: he repeated each question to the audience before answering it. That is something I should do in my talks as well.
