The Web's Weaver Looks Forward

Tim Berners-Lee, chief architect of the World Wide Web, explains his vision of the next stitch in the process: The Semantic Web

Tim Berners-Lee is lucky: He has been able to watch his creation transform society and business. Yet although the World Wide Web will be only 12 years old this December -- six months after its chief architect turns 47 -- Berners-Lee already envisions a much richer Web. He calls it the Semantic Web.

This next-generation Web will be imbued with language-like abilities that enable computers to understand meanings and relationships so digital systems can better serve businesses (see BW, 3/4/02, "The Next Web"). Berners-Lee talked with BusinessWeek Senior Writer Otis Port in his digs at the Massachusetts Institute of Technology, home of the World Wide Web Consortium (W3C), which he has headed since 1994. Following are edited excerpts of their conversation:

Q: How did the concept of the Semantic Web evolve from the Web?


Well, it didn't -- it goes all the way back to my original idea for the Web. Look at my original 1989 proposal for a system for managing information about all the physics going on at CERN [European Organization for Nuclear Research], and you'll see that the arrows linking various elements have labels on them, explaining what they signify.

We also talked about semantics at the First World Wide Web Conference, in 1994. Then, we had just hypertext links between documents -- documents that are kind of flat, boring, because only people can get the meaning out of them. These documents are about real things: home pages for people, title deeds for houses, and so on. Corresponding to the links between the documents, there's also a real relationship between real things, like the ownership of a house by a person. The Semantic Web allows the meaning of the links to be expressed so a machine can process it.

With the Semantic Web, we're moving from where XML [Extensible Markup Language] is -- just talking about the structure of documents and data -- to the Resource Description Framework [RDF], a convention for using XML to express the relationships between real things.
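The idea of labeled links between real things can be sketched as plain data. What follows is a minimal illustration in ordinary Python, not an RDF library: each statement is a subject-predicate-object "triple," the same shape RDF uses. The example.org URIs and the predicate names owns, homePage, and titleDeed are hypothetical, chosen to echo the home-page and title-deed example above.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs; example.org is reserved for documentation examples.
EX = "http://example.org/"

graph = {
    # Real-world relationship: a person owns a house.
    Triple(EX + "people/alice", EX + "terms/owns", EX + "houses/12-elm-st"),
    # Links from real things to the documents about them.
    Triple(EX + "people/alice", EX + "terms/homePage", EX + "pages/alice.html"),
    Triple(EX + "houses/12-elm-st", EX + "terms/titleDeed", EX + "pages/deed-4711.html"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )
```

Here match(graph, p=EX + "terms/owns") returns the single ownership statement. Because the link carries its meaning in the predicate, a program can ask for "everything Alice owns" without reading any document. A real system would use a standard RDF serialization and shared vocabularies rather than these invented names.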

Q: And that gets us what?


When you know what the meaning is, you can do new things. You can say in a query to the computer, "Show me the chain of command between me and the head of the organization." Or you can ask, "What are all the things that such-and-such depends on?" A person can find this information, but only by painstakingly searching the Web. With the Semantic Web, the computer will quickly come back with a definitive list.
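Both example questions reduce to following labeled links until there are none left to follow. The sketch below assumes hypothetical reportsTo and dependsOn facts held in ordinary Python dictionaries; it illustrates the kind of traversal a Semantic Web query engine might perform, not any particular engine's API.

```python
from collections import deque

# Hypothetical "reportsTo" facts: employee -> direct manager.
REPORTS_TO = {
    "dana": "carlos",
    "carlos": "bea",
    "bea": "ceo",
}

# Hypothetical "dependsOn" facts: thing -> things it directly depends on.
DEPENDS_ON = {
    "payroll-app": ["database", "auth-service"],
    "auth-service": ["database", "directory"],
    "database": ["storage"],
}


def chain_of_command(person, reports_to, top):
    """Follow reportsTo links from person up to top and return the path."""
    chain = [person]
    seen = {person}
    while chain[-1] != top:
        boss = reports_to.get(chain[-1])
        # Stop on a missing link or a cycle instead of looping forever.
        if boss is None or boss in seen:
            raise ValueError(f"no chain from {person!r} to {top!r}")
        chain.append(boss)
        seen.add(boss)
    return chain


def all_dependencies(thing, depends_on):
    """Everything thing depends on, directly or indirectly (breadth-first)."""
    found = set()
    queue = deque([thing])
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, []):
            if dep not in found:
                found.add(dep)
                queue.append(dep)
    found.discard(thing)
    return found
```

For instance, chain_of_command("dana", REPORTS_TO, "ceo") walks dana, carlos, bea, ceo, and all_dependencies("payroll-app", DEPENDS_ON) picks up storage and directory even though neither is listed as a direct dependency.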

Some very solid core technology is emerging. The DAML [DARPA Agent Markup Language] folks have a repository of lots of different ontology elements. There is a DAML services effort, which is built on top of RDF and uses ontologies for exchanges of information with well-defined meaning. There were some comments at the last Web Services Workshop to the effect that, actually, Web services should really be described using Semantic Web technology.

So we've got this interesting situation where academic researchers are out there proving the feasibility of things, and industry is mainly sitting back and waiting to see if this is all going to make sense. And as various pieces do clearly make sense, they're being brought into the consortium so we can end up with interoperable standards.

Q: Where do things stand now?


The lower layers of the Semantic Web, like XML, are already pretty much standardized. The middle layers, like RDF and Web ontology, are in the process of being standardized. And the upper layers are still in research, the upper layers being things like a universal logic language that can basically represent any logical statement.

There are a lot of rule-based languages in use now, usually with limited power. For most problems, that's what you want. You don't want a language so powerful that you can ask questions that can't be answered quickly. You don't want to give somebody the ability to ask whether Fermat's Last Theorem is true, a question that baffled mathematicians for more than three centuries. The computer would go away and think about it -- probably forever.

That's why there's a limit to what you can ask with SQL [structured query language], for example. The machine does those operations and squirts you back the results. The advantage of limiting the power is that you get answers quickly enough so you can run multiple iterations and optimize things.
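A rough illustration of why limiting a language's power keeps answers fast: the Datalog-style rule below (our choice of example, not a W3C language) only combines constants that already appear in the facts. With n constants it can derive at most n² facts, so repeated application must reach a fixed point and stop. A question like Fermat's Last Theorem ranges over infinitely many integers and offers no such bound.

```python
def transitive_closure(pairs):
    """Apply the rule  related(X, Z) :- related(X, Y), related(Y, Z)
    until a round adds nothing new. No rule invents new constants, so
    every fact is drawn from a finite set and the loop must terminate."""
    facts = set(pairs)
    rounds = 0
    while True:
        rounds += 1
        new = {(x, z) for (x, y1) in facts for (y2, z) in facts if y1 == y2}
        if new <= facts:  # fixed point: nothing new was derived
            return facts, rounds
        facts |= new
```

Starting from a-b, b-c, and c-d, the loop derives a-c, b-d, and then a-d, and stops after three rounds with six facts. Restrictions of this kind are what make it practical to run queries many times over, as described above.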

The Semantic Web will have a number of query languages with different levels of power. But to unify them all, you need a logical language that is very expressive -- one where, in general, you can't expect to get an answer in finite time.

Q: But you just said...


Despite what I said earlier, there is a use for a powerful universal logic language. Many systems will answer queries by using various forms of rules having different capabilities. One may be able to find an answer that another could not. In these cases, the logic language will be a way for one machine to explain to the other how it got there.
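One way to picture one machine explaining its answer to another: the first machine sends the chain of rule applications it used, and the second only has to check each step, which is far easier than finding the answer in the first place. The rule name partOf-transitive, the fact format, and the check_proof function are hypothetical; they are not drawn from any Semantic Web specification.

```python
def part_of_transitive(p1, p2):
    """Hypothetical rule: if A is part of B and B is part of C,
    then A is part of C. Returns None when the rule does not apply."""
    (rel1, a, b), (rel2, b2, c) = p1, p2
    if rel1 == rel2 == "partOf" and b == b2:
        return ("partOf", a, c)
    return None


RULES = {"partOf-transitive": part_of_transitive}


def check_proof(given, steps):
    """Verify a proof received from another machine.
    given: facts the checker already accepts.
    steps: list of (conclusion, rule_name, (premise1, premise2)).
    Every premise must be known before a step may use it."""
    known = set(given)
    for conclusion, rule_name, premises in steps:
        rule = RULES.get(rule_name)
        if rule is None or len(premises) != 2:
            return False
        if not all(p in known for p in premises):
            return False
        if rule(*premises) != conclusion:
            return False
        known.add(conclusion)
    return True


given = {("partOf", "piston", "engine"), ("partOf", "engine", "car")}
proof = [(("partOf", "piston", "car"), "partOf-transitive",
          (("partOf", "piston", "engine"), ("partOf", "engine", "car")))]
```

The checker never needs the sender's search strategy, only the rules both sides agree on, so check_proof(given, proof) succeeds while a step claiming the car is part of the piston is rejected.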

Couple that with digital signatures, and you could construct a complete Web of trust. You can represent all the actual traffic in society and business, and build secure systems. Because the Semantic Web languages are quite expressive, they won't force you or your company to work in a particular way just because that's the way the software was written.
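A minimal sketch of attaching trust to statements: a statement is accepted only if its tag verifies under the key of a publisher the reader already trusts. A real Web of trust would use public-key digital signatures, so anyone could verify a statement without holding a secret. Python's standard library has no public-key signature module, so this sketch swaps in HMAC-SHA256, a shared-key message authentication code. The publisher name, key, and tab-separated serialization are hypothetical.

```python
import hashlib
import hmac

# Hypothetical publisher and shared secret. A real Web of trust would use
# public-key signatures instead of a shared secret like this.
TRUSTED_KEYS = {"registry.example.org": b"hypothetical-shared-secret"}


def serialize(statement):
    """Byte form of a (subject, predicate, object) statement.
    Assumes no part contains a tab character."""
    return "\t".join(statement).encode("utf-8")


def sign(statement, key):
    """HMAC-SHA256 tag over the statement (stand-in for a signature)."""
    return hmac.new(key, serialize(statement), hashlib.sha256).hexdigest()


def accept(statement, publisher, tag, trusted_keys=TRUSTED_KEYS):
    """Accept a statement only if a trusted publisher's key verifies it."""
    key = trusted_keys.get(publisher)
    if key is None:
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(sign(statement, key), tag)
```

A tampered statement, or one attributed to a publisher outside the trusted set, is simply not admitted into what the reader treats as known.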

Q: Talk about how people will benefit in their daily lives.


Suppose you're on the Web and find a conference you want to go to. The Web page has the date and the time, it's got the people involved, and the price. It may even have the latitude and longitude -- via a link to one of the map sites.

So what do you do? You cut and paste the title of the conference into your agenda book, hoping you don't accidentally get the wrong week. You fill out a registration form and e-mail it back to the organizers. You write down the latitude and longitude on a piece of paper, take it to your car's navigation system, and key it in. At the level of data, there is no interoperability.

You would prefer, when you see the notice of an interesting conference, to just say: "O.K., I want to go," and everything will get taken care of, automatically. All the entries will pop into your agenda, your address book, your GPS. Then you'll get pinged by your agenda when it's time to leave, because it knows from your GPS how long it will take to drive there. And, in addition, it will block out the driving time on your schedule and alert you if you try to make a conflicting appointment.
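Once the conference page publishes its date and time as data rather than as prose, the agenda behavior described above comes down to a few lines. In this sketch the Event class, the dates, and the 45-minute drive time are hypothetical; a real system would get the drive time from a routing service rather than as a fixed input.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    title: str
    start: datetime
    end: datetime


def plan_trip(conference, drive_time):
    """Agenda entries for attending: a driving slot that ends when the
    conference starts, followed by the conference itself."""
    drive = Event("Drive to " + conference.title,
                  conference.start - drive_time, conference.start)
    return [drive, conference]


def conflicts(agenda, proposed):
    """Agenda entries whose time ranges overlap the proposed appointment."""
    return [e for e in agenda
            if e.start < proposed.end and proposed.start < e.end]


# Hypothetical data, as it might be read from a conference page.
conference = Event("Example Conference",
                   datetime(2002, 6, 10, 9, 0), datetime(2002, 6, 10, 17, 0))
agenda = plan_trip(conference, timedelta(minutes=45))
leave_at = agenda[0].start  # when to ping the user: 08:15
```

The start of the blocked-out driving slot is the moment to remind the user to leave, and conflicts() flags any new appointment that overlaps either the drive or the conference itself.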

Q: Is this starting to happen?


Sure. For example, there was a great meeting at the W3C Technical Plenary in February 2001. Something like 24 people in the RDF Interest Group met in person for the first time, and there was tremendous excitement. "Oh? You're exporting calendar information? I'm importing calendar information. Let's get together." As it turned out, five distinct groups got together.

Also, a few companies are pushing these ideas. But we're still at the stage where most of the enthusiasm is from individuals who have realized what's going on and find their way to W3C. Then some of them begin dragging other parts of their company into this new phase.

Take Adobe Systems. It's building interesting products with Semantic Web hooks. That's because, basically, one person "got it." And because he got it, Adobe's software metadata is being reorganized around RDF. They're using Web ontology-level power for managing documents. Now, the information in PDF files can be understood by other software even if the software doesn't know what a PDF document is or how to display it.
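The point about PDF files is that the metadata is self-describing: a program can pull it out without parsing the rest of the file. The sketch below assumes a simplified RDF/XML block using the Dublin Core vocabulary, embedded in a made-up file. Adobe's actual metadata packets (XMP) carry more wrapping and structure than shown here, so treat this as an illustration of the idea only.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_dc(blob):
    """Return Dublin Core properties from the first rdf:RDF block found in
    an arbitrary byte string, ignoring everything else in the file.
    Assumes the namespaces are declared on the rdf:RDF element itself."""
    m = re.search(rb"<rdf:RDF\b.*?</rdf:RDF>", blob, re.DOTALL)
    if m is None:
        return {}
    root = ET.fromstring(m.group(0))
    props = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for child in desc:
            if child.tag.startswith(f"{{{DC_NS}}}"):
                # "{http://purl.org/dc/elements/1.1/}title" -> "title"
                props[child.tag.split("}", 1)[1]] = child.text
    return props


# Hypothetical file: opaque content plus an embedded RDF description.
blob = (
    b"%PDF-1.4\n% page content this code never interprets\n"
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    b'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<rdf:Description rdf:about="">'
    b"<dc:title>Quarterly Report</dc:title>"
    b"<dc:creator>Example Corp</dc:creator>"
    b"</rdf:Description></rdf:RDF>\n%%EOF\n"
)
```

extract_dc(blob) recovers the title and creator without knowing anything about how to display the surrounding document.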

Nokia was also involved very early in RDF. You can't store data for just one output device, because devices come with a huge variety of screens -- from small, low-resolution cell phones to the new, fancy high-resolution PDAs. How do you store information so it can be repurposed for all these channels? You store it in semantic form. Then, when data arrives at a device, it arranges itself to fit the display it finds -- or it can be converted to voice output for a cell-phone user who is driving.
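Storing meaning rather than layout lets each device decide how to present the same record. In this sketch the flight-status record, its field names, and the three renderers are all hypothetical: the record is written once, and each device applies its own presentation.

```python
# Hypothetical semantic record: what the data means, not how it looks.
flight = {
    "type": "FlightStatus",
    "flight": "XY 123",
    "status": "delayed",
    "newDeparture": "14:35",
    "gate": "B7",
}


def render_phone(item, width=20):
    """Tiny screen: only the key fields, cut to the display width."""
    line = f"{item['flight']} {item['status'].upper()} {item['newDeparture']}"
    return line[:width]


def render_pda(item):
    """Larger screen: one labeled line per field."""
    return "\n".join(f"{k}: {v}" for k, v in item.items() if k != "type")


def render_voice(item):
    """Spoken output for a driver: full sentences, nothing to read."""
    return (f"Flight {item['flight']} is {item['status']}. "
            f"It now departs at {item['newDeparture']} from gate {item['gate']}.")


RENDERERS = {"phone": render_phone, "pda": render_pda, "voice": render_voice}


def present(item, device):
    """Pick the presentation that suits the device the data arrived at."""
    return RENDERERS[device](item)
```

present(flight, "phone") yields the 20-character line "XY 123 DELAYED 14:35", while present(flight, "voice") yields a sentence suitable for speech; adding a new device means adding a renderer, not restructuring the stored data.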

RDF will become more and more ubiquitous as companies grasp its potential. Financial software providers should look at RDF, because then the data can be reused in many valuable ways. You want to connect the data to budgeting, to inventory and stock control, to manufacturing, to financial reporting? With RDF, you write a few rules, and sharing the data is a breeze. It works as a hub, connecting your applications as spokes.
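The hub-and-spoke picture can be sketched with one shared set of triples and a few small rules, each feeding a different application. The predicates (type, item, quantity, unitCost, department) and the two rules are hypothetical; in RDF they would be URIs from an agreed vocabulary.

```python
from collections import defaultdict

# Hypothetical shared facts in the hub: (subject, predicate, object).
HUB = [
    ("po-1", "type", "Purchase"), ("po-1", "item", "widget"),
    ("po-1", "quantity", 100), ("po-1", "unitCost", 2.5),
    ("po-1", "department", "manufacturing"),
    ("so-1", "type", "Sale"), ("so-1", "item", "widget"),
    ("so-1", "quantity", 30),
]


def records(hub):
    """Group triples by subject, e.g. {"po-1": {"type": "Purchase", ...}}."""
    grouped = defaultdict(dict)
    for s, p, o in hub:
        grouped[s][p] = o
    return grouped


# Each "spoke" is a small rule reading the same hub data.
def budget_spend(hub):
    """Budgeting view: total purchase spend per department."""
    spend = defaultdict(float)
    for r in records(hub).values():
        if r.get("type") == "Purchase":
            spend[r["department"]] += r["quantity"] * r["unitCost"]
    return dict(spend)


def stock_levels(hub):
    """Inventory view: units bought minus units sold, per item."""
    stock = defaultdict(int)
    for r in records(hub).values():
        if r.get("type") == "Purchase":
            stock[r["item"]] += r["quantity"]
        elif r.get("type") == "Sale":
            stock[r["item"]] -= r["quantity"]
    return dict(stock)
```

The same two records feed both views: budget_spend(HUB) reports 250.0 spent by manufacturing, and stock_levels(HUB) reports 70 widgets on hand. A financial-reporting spoke would be one more rule over the same hub rather than another point-to-point data conversion.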

Q: In your book, Weaving the Web, you said cultural diversity is important because the interplay of many different viewpoints is more likely to lead to optimum solutions to global problems, such as climate warming. But with the Semantic Web, isn't there a danger that all the sharing will tend to erode many differences -- and actually, might that not make for a better world?


Yes, you need some homogeneity, at least enough to stop people from shooting each other. But you also need diversity to provide the intellectual richness that is important to fostering new ideas and resolving problems.

Society is stable when organized fractally. A fractal society is one where you get the optimal blend of homogeneity and diversity: the proper balance between global, national, local, and family issues. Human beings have evolved with an awareness of all these scales, so we behave in a way that tends to create a fractal society -- if individuals in fact spend time worrying about issues at all these levels.

Conflicts arise when we become so preoccupied with one or two levels that we don't think about the others. The Web should help remedy that situation [by encouraging people] to function properly as a fractal society.
