The Ultimate Indexing Job

The Internet will put more data on tap than any other technology in history. That's the promise--and the problem. As anyone who has used a search tool such as Yahoo! or Alta Vista knows, weeding out all the irrelevant "hits" can try one's patience. And it's getting worse. If businesspeople can't find critical information quickly, the Net's potential could be crippled.

To avoid that, researchers at the National Center for Supercomputing Applications (NCSA) in Champaign, Ill., are developing an indexing scheme similar to the one used by librarians. It's part of the federal government's Digital Library Initiative--and it's turning into a far bigger job than expected.

The NCSA team, headed by researcher Bruce Schatz, tested the approach with 10 million abstracts from the engineering library at the University of Illinois at Urbana-Champaign. The plan was to have computers analyze the abstracts' content and sort them into 1,000 subject areas. But creating indexes for even this comparatively small sample quickly overwhelmed the team's workstations. So the job was transferred to one of NCSA's supercomputers, which cranked away for four days. Schatz says indexing may well turn out to be the toughest problem NCSA has ever undertaken.
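The article doesn't spell out NCSA's algorithm, but the basic idea--weighting the terms in each abstract and grouping documents whose term profiles are similar--can be illustrated with a minimal sketch. The TF-IDF weighting, the cosine-similarity measure, the greedy single-pass grouping, and the toy abstracts below are all illustrative assumptions, not NCSA's actual method:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into TF-IDF weight vectors (dicts).

    Terms common across the whole collection get low weight; terms
    distinctive to a document get high weight.
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (count / len(doc)) * math.log(n / df[t])
               for t, count in tf.items()}
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, threshold=0.1):
    """Greedy single-pass grouping: put each document in the most
    similar existing cluster, or start a new one if nothing is close.
    (A stand-in for whatever NCSA actually used.)
    """
    clusters = []  # each: {"centroid": summed vector, "members": [indices]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": dict(vec), "members": [i]})
        else:
            best["members"].append(i)
            for t, w in vec.items():       # fold the doc into the centroid
                best["centroid"][t] = best["centroid"].get(t, 0.0) + w
    return [c["members"] for c in clusters]

# Hypothetical mini-corpus: two bridge abstracts, two circuit abstracts.
abstracts = [
    ["steel", "bridge", "load"],
    ["bridge", "load", "stress"],
    ["circuit", "voltage", "current"],
    ["voltage", "current", "resistor"],
]
groups = cluster(tfidf_vectors(abstracts))
print(groups)  # the two subject areas fall out as [[0, 1], [2, 3]]
```

Even this toy version hints at the scaling problem: the pairwise similarity work grows with both the number of documents and the number of clusters, which is why 10 million abstracts swamped the team's workstations.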