Taming the World Wide Web

A rising tide of companies are tapping Semantic Web technologies to unearth hard-to-find connections between disparate pieces of online data

When Eli Lilly scientists try to develop a new drug, they face a Herculean task. They must sift through vast quantities of information such as data from lab experiments, results from past clinical trials, and gene research, much of it stored in disparate, unconnected databases and software programs. Then they've got to find relationships among those pieces of data. The enormity of the challenge helps explain why it takes an average of 15 years and $1.2 billion to get a new drug to market.

Eli Lilly (LLY) has vowed to bring down those costs. "We have set the goal of reducing our average cost of R&D per new drug by fully one-third, about $400 million, over the next five years," Lilly Chairman and Chief Executive Officer Sidney Taurel told the American Chamber of Commerce in Japan last August.

As part of its cost-cutting campaign, the drugmaker is experimenting with new technologies designed to make it easier for scientists to unearth and correlate scattered, unrelated morsels of online data. Outfitted with this set of tools, researchers can make smarter decisions earlier in the research phase—where scientists screen thousands of chemical compounds to see which ones best treat symptoms of a given disease. If all goes according to plan, the company will get new pharmaceuticals to patients sooner, and at less cost.

Found in Space

Those tools are the stuff of the Semantic Web, a method of tagging online information so it can be better understood in relation to other data—even if it's tucked away in some faraway corporate database or software program. Today's prominent search tools are adept at quickly identifying and serving up reams of online information, though not at showing how it all fits together. "When you get down to it, you have to know whatever keyword the person used, or you're never going to find it," says Dave McComb, president of consulting firm Semantic Arts.
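To make the tagging idea concrete, here is a minimal Python sketch. It is an illustration, not any vendor's product. It assumes facts are stored as subject-predicate-object "triples," the basic data model of the W3C's RDF standard. The identifiers (ex:compound42, ex:proteinX and so on) are made up and stand in for records held in three separate databases. Once the triples are pooled, a simple graph search can surface a connection that no single database shows on its own.

```python
from collections import deque

# Hypothetical records from three unconnected sources, each written as
# (subject, predicate, object) triples. Identifiers are illustrative only.
lab_db = [("ex:compound42", "ex:inhibits", "ex:proteinX")]
trials_db = [
    ("ex:trial7", "ex:tested", "ex:compound42"),
    ("ex:trial7", "ex:outcome", "reduced symptoms"),
]
gene_db = [("ex:proteinX", "ex:encodedBy", "ex:geneY")]


def merge(*sources):
    """Pool several triple lists into one graph; duplicates collapse."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def connect(graph, start, goal):
    """Breadth-first search treating each triple as an undirected link.
    Returns the chain of nodes from start to goal, or None if unlinked."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for s, _, o in graph:
            if node in (s, o):
                other = o if s == node else s
                if other not in previous:
                    previous[other] = node
                    queue.append(other)
    return None


graph = merge(lab_db, trials_db, gene_db)
print(connect(graph, "ex:trial7", "ex:geneY"))
```

Running it prints the chain from the trial record through the compound and the protein to the gene. If only the trials database is searched, connect returns None. The link appears only once the sources are pooled and share identifiers.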

Researchers in a growing number of industries are sampling Semantic Web know-how. Citigroup (C) is evaluating the tools to help traders, bankers, and analysts better mine the wealth of financial data available on the Web. Kodak (EK) is investigating whether the technologies can help consumers more easily sort digital photo collections. NASA is testing ways to correlate scientific data and maps so scientists can run planetary exploration simulations more efficiently.

The Semantic Web is in many ways in its infancy, but its potential to transform how businesses and individuals correlate information is huge, analysts say. The market for the broader family of products and services that encompasses the Semantic Web could surge to more than $50 billion in 2010 from $2.2 billion in 2006, according to a 2006 report by Mills Davis at consulting firm Project10X.

Data Worth a Thousand Pictures

While other analysts say it will take longer for the market to reach $50 billion, most agree that the impact of the Semantic Web will be wide-ranging. The Project10X study found that semantic tools are being developed by more than 190 companies, including Adobe (ADBE), AT&T (T), Google (GOOG), Hewlett-Packard (HPQ), Oracle (ORCL), and Sony (SNE).

Among the enthusiasts is Patrick Cosgrove, director of Kodak's Photographic Sciences & Technology Center, who is, not surprisingly, also a photo aficionado. He boasts more than 50,000 digital snapshots in his personal collection. Each year he creates a calendar for his family that requires him to wade through the year's photos, looking for the right image for each month. It's a laborious task, but he and his colleagues aim to make it easier.

One project involves taking data captured when a digital photo is taken, such as date, time, and even GPS coordinates, and using it to help consumers find specific images: say, a photo of Mom at last year's Memorial Day picnic at the beach. Right now, much of that detail, such as GPS coordinates, is expressed as raw data. But Semantic Web technologies could help Kodak translate that information into something more useful, such as the place a particular set of GPS coordinates corresponds to, whether it's Yellowstone National Park or Grandma's house up the street.
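The following Python sketch illustrates the idea under clearly stated assumptions. It is not Kodak's system. The place identifiers are hypothetical, and the bounding boxes are rough illustrative rectangles rather than real boundaries; a production system would draw on proper geographic data. Raw capture metadata is turned into triples, and a simple rule adds an ex:takenAt link whenever a photo's coordinates fall inside a known place. A search by place name can then find the picture.

```python
from datetime import datetime

# Hypothetical place resources: a human-readable label plus a rough
# bounding box (min_lat, max_lat, min_lon, max_lon). Illustrative only.
PLACES = {
    "ex:Yellowstone": ("Yellowstone National Park", (44.1, 45.1, -111.2, -109.8)),
    "ex:GrandmasHouse": ("Grandma's house", (40.7000, 40.7010, -74.0010, -74.0000)),
}


def photo_triples(photo_id, taken, lat, lon):
    """Turn raw capture metadata into triples, adding an ex:takenAt link
    when the coordinates fall inside a known place's bounding box."""
    triples = [
        (photo_id, "ex:takenOn", taken.date().isoformat()),
        (photo_id, "ex:latitude", lat),
        (photo_id, "ex:longitude", lon),
    ]
    for place, (_, (lat_min, lat_max, lon_min, lon_max)) in PLACES.items():
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            triples.append((photo_id, "ex:takenAt", place))
    return triples


def photos_at(triples, place_label):
    """Find photo IDs whose ex:takenAt place carries the given label."""
    wanted = {p for p, (label, _) in PLACES.items() if label == place_label}
    return sorted(s for s, p, o in triples if p == "ex:takenAt" and o in wanted)


all_triples = (
    photo_triples("ex:photo1", datetime(2006, 5, 29, 13, 5), 44.46, -110.83)
    + photo_triples("ex:photo2", datetime(2006, 12, 25, 9, 0), 40.7005, -74.0005)
)
print(photos_at(all_triples, "Grandma's house"))
```

Searching for "Grandma's house" returns only the second photo, even though neither photo's raw metadata mentions a place name.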

Ready for the Crowd

Semantic Web technology also has cosmic implications. NASA's Jet Propulsion Laboratory, for instance, has experimented with semantic tools from Siderean Software designed to help engineers and scientists better navigate aerospace research and scientific data. Engineers and scientists say the software helps them find relevant information, especially when it is distributed across multiple repositories or data resources.

On Apr. 2, ZoomInfo launched a new business search engine it says is the first market-ready semantic search engine. The engine automatically crawls publicly available business information—from corporate Web sites to press releases and electronic news services to SEC filings—adding semantic tags and organizing information so that it can be easily found later.

This type of information can be especially helpful to recruiters who may be looking for midlevel managers at specific companies. "It gives me more people in a targeted company to pursue," says Alan Bogard, manager of recruitment for the Full Spectrum Lending Div. at Countrywide Financial (CFC), which uses ZoomInfo.

The Way of the Web

As useful as the tool may be, the information it yields isn't always accurate, Bogard says. But the occasional inaccuracy only highlights one of the strengths of the Semantic Web, says ZoomInfo CEO Jonathan Stern.

The software automatically collects and correlates information and, unlike other business directories, doesn't require a human editor. As a result, ZoomInfo can process more data faster. "Our crawlers make mistakes, but the power of having thousands of crawlers overwhelms mistakes by giving fresher data," Stern says.

Another, more fundamental, challenge is describing exactly what the Semantic Web is. No one knows that better than Tim Berners-Lee, inventor of the World Wide Web and director of the World Wide Web Consortium.

"It was really hard explaining the Web before people just got used to it, because they didn't even have words like click and jump and page," Berners-Lee says. "People just couldn't imagine it, and it's the same here."

Early in the development of the Web, Berners-Lee realized the potential for specifying relationships between strands of Web data. But it wasn't until 2004 that the World Wide Web Consortium released standards for the effort Berners-Lee had dubbed the Semantic Web.

What's in a Name?

Today, Berners-Lee concedes that the Semantic Web may not have been the best name because it is used by different groups to mean different things. Then again, there was also backlash when he coined the phrase World Wide Web, a name that has stuck (see BusinessWeek.com, 10/22/04, "The Web's Father Expects a Grandchild").

Whatever its name, many in the Web community say the Semantic Web shouldn't be called Web 3.0, as though it is somehow a progression from the sites and tools known collectively as Web 2.0. Work on the Semantic Web was well under way years before Tim O'Reilly, founder and CEO of O'Reilly Media, coined the Web 2.0 moniker. Indeed, the Web is evolving gradually, and it's a mistake to assign it release numbers like those attached to software, critics say.

Describing the Semantic Web as Web 3.0 is especially irksome to those who consider the Web 2.0 designation more marketing ploy than useful description of a generation of Web sites. Others feel that Web 3.0 might be a useful way of talking about the future Web, but there's no agreement on what Web 3.0 will mean. Will it refer to the third decade of the Web, post-2010, or will it signal a time when systems are so intelligent that the Web will be qualitatively different?

I Want My Data Back

No one is saying there's no connection between Web 2.0 and Semantic Web technologies. On many of the user-generated sites grouped under Web 2.0, users tag their own data, be it photos, bookmarks, videos, or other content. "Web 2.0 is the messy way that the Semantic Web is actually happening," says O'Reilly.

Berners-Lee and others see a time in the future when the two efforts will interconnect. Eventually, Semantic Web technologies could help people unlock the value of much of the information they've contributed to the user-generated Web. "People are going to ask what happened to the data they put into these Web sites, and they will want it back," says Berners-Lee.

In the meantime, there are likely to be plenty of uses for Semantic Web technologies, especially for corporations struggling to get at information buried in disconnected data storehouses. "When you apply these concepts inside the enterprise, you realize you have a tremendous amount of information that you didn't know you had," says Eric Miller, president of consulting firm Zepheira.

Increased Database Flexibility

Yahoo! (YHOO) hopes to translate the technology into better products for customers. Over time, Yahoo's divisions have developed independently, resulting in information being stored in places that are disconnected from the rest of the company. "We want to be able to connect information in different properties," says Dave Beckett, engineer technical lead for Yahoo! Media Group.

So the group created a new system that uses some Semantic Web capabilities, making it easier to share and reuse content among various properties such as news, sports, or finance. The company is hoping the investment will eventually translate into a better overall experience, making it easier for visitors to discover related content.

Often, information systems simply aren't flexible enough to handle change within a company. A database can become obsolete in as little as 18 months because it can't accommodate new kinds of information. Semantic Web technologies, by contrast, let companies specify new relationships among their data while keeping older databases in place.

Tracking Financial Trends

Before NASA held the contest that gave the Mars rover, Spirit, its name, the vehicle was known as Mars Exploration Rover A. It continued to be called that in various NASA systems even after the name change. But thanks to Semantic Web technologies, NASA could simply note the relationship between the old and new names, and scientists could find relevant information about Spirit whichever name they used, says Semantic Arts' McComb.
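The rover example maps onto owl:sameAs, the property the W3C's Web Ontology Language (OWL) defines for stating that two identifiers refer to the same thing. The Python sketch below is a hand-rolled illustration, not NASA's software. The ex: identifiers and the way the facts are filed are hypothetical. It uses a small union-find structure to treat every name joined by owl:sameAs as one entity, so a lookup under either name returns the same facts. The legacy records are left untouched; the only new data is a single triple linking the two names.

```python
# Hypothetical legacy facts, filed under two different names for one rover.
SAME_AS = "owl:sameAs"
legacy_facts = [
    ("ex:MarsExplorationRoverA", "ex:landingSite", "Gusev Crater"),
    ("ex:Spirit", "ex:operatedBy", "NASA JPL"),
]

# The only new data: one triple recording that the two names match.
alias_link = [("ex:MarsExplorationRoverA", SAME_AS, "ex:Spirit")]


def build_aliases(triples):
    """Group names joined by owl:sameAs into equivalence classes with a
    small union-find; returns a function mapping a name to its group root."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)
    return find


def facts_about(triples, name):
    """Return (predicate, object) pairs stated about any alias of name."""
    find = build_aliases(triples)
    root = find(name)
    return sorted((p, o) for s, p, o in triples
                  if p != SAME_AS and find(s) == root)


everything = legacy_facts + alias_link
print(facts_about(everything, "ex:Spirit"))
print(facts_about(everything, "ex:MarsExplorationRoverA"))
```

Both lookups return the landing site and the operator. Without the alias triple, a search for ex:Spirit finds only the operator. A full RDF toolkit with an OWL reasoner could derive the same equivalence instead of a hand-written union-find.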

Companies can also use Semantic Web technologies to better respond to rapid change. Financial analysts and traders at Citigroup are inundated with information about fast-moving financial markets. So the banking powerhouse is looking at ways Semantic Web technologies can help extract valuable information, uncover trends, and generally help financial analysts, bankers, and traders make better decisions, says Rachel Yager, vice-president and program director of the semantic initiative at Citigroup.

In some cases, making heads or tails of a new situation can be a matter of life and death. The University of Texas Health Science Center at Houston worked with Oracle and consulting firm TopQuadrant to create a system that relies on Semantic Web technologies for large-scale public health surveillance. While the original goal of the system was to detect possible bio-terrorism, it was put to the test after Hurricane Katrina to track health-related information on evacuees who were moved into Houston's Astrodome.

Security Risks

Scientists used the system to identify outbreaks of illness before they could reach the city's population. "If there was a severe infectious disease in the shelter, it would spread via volunteers," says Parsa Mirhaji, director of The Center for Biosecurity & Public Health Informatics Research at The University of Texas Health Science Center at Houston.

In eight hours, Mirhaji's team was able to bring up a completely new system, collect information from new sources, and classify patient information. The team quickly detected outbreaks of gastrointestinal and respiratory illnesses. "The greatest benefit was being able to understand the fast pace of events and where things were heading before it became a real issue," Mirhaji says.

Of course, making it easier to comb through online data carries security implications. Semantic Web tools could make it easier for prying eyes to get at personal information.

"There can be a great loss of privacy if you don't do it right," says Dan Gruhl, a researcher at IBM Almaden Research Center, which has a whole group that focuses on privacy. A British company called Garlik is using Semantic Web technologies to help individuals monitor their personal information online and guard against identity theft.

While companies will surely encounter questions about security down the line, the main issue now for many companies is making sure Semantic Web tools work as intended. Says Patrick Hartman, a team leader in Eli Lilly's information technology department: "There aren't a lot of people we can turn to with experience. We're figuring this out on our own."
