By Stephen H. Wildstrom
Popular wisdom holds that you can find anything on the Web. And if you're looking for information on products, transportation schedules, or tourist attractions, it's probably true. But there is a vast body of knowledge hidden either in the so-called deep Web that search engines can't reach or in those archaic but wonderful repositories called books.
Two factors combine to make so much valuable and authoritative information inaccessible. The bulk of human knowledge represented by printed material -- especially the portion that is more than 25 years old -- does not exist in digital form. In addition, most books and other printed matter published in the last century are still under copyright, and rights owners want to know they'll be compensated for the use of their material.
PEEK AT PAID CONTENT. Yahoo! (YHOO) and Google (GOOG) are leading the way in efforts to open this world of print and proprietary material to browsing. Yahoo's latest move, Yahoo Search Subscriptions (search.yahoo.com/subscriptions), provides easy access from a search screen to an assortment of publications and other materials available only to subscribers.
For example, a Yahoo search of the Web for "Intel chipsets" returned over 2 million hits. A subscriptions-only search returned just 33, mostly from the archives of the Institute of Electrical and Electronics Engineers (IEEE) and Forrester Research.
Anyone can use Search Subscriptions. But in general you'll get to see only abstracts of the documents you find, unless you have a subscription to the publication or database. So as it stands, the new tool, which Yahoo describes as a trial, mainly provides a simple way to do a search that's restricted to paid services.
"AUTHORITATIVE" SOURCES. I think Yahoo Search Subscriptions needs to offer a much broader menu of services. Currently there are only seven: the online versions of Consumer Reports, the Financial Times, The New England Journal of Medicine, and The Wall Street Journal -- plus TheStreet.com and the Forrester and IEEE publications.
In addition, Yahoo should let searchers buy individual articles without subscribing. This requires solving the vexing problem of handling small online purchases efficiently. You can't sell something for 50 cents, say, if it costs 25 cents to process the transaction.
Google has taken a somewhat different approach to searching "authoritative" sources. Google Scholar (scholar.google.com) limits search to sources such as refereed journals and professional-society archives.
But again, much of the material turned up by the search will be accessible only to subscribers. Scholar -- along with the similar Scirus.com, operated by the technical publisher Reed Elsevier -- is most useful to those affiliated with universities or other institutions that provide blanket access to these materials.
PUBLISHERS UP IN ARMS. Probably the most intriguing project is Google Print (print.google.com), an attempt to scan the contents of the world's books. One part, developed with publishers, lets people search the contents of current books -- an effort similar to Amazon.com's Search Inside. The more ambitious piece, an outgrowth of the National Science Foundation's digital-libraries initiative, aims to put leading research collections online.
This project has a long way to go, not least because publishers are already up in arms over copyright (see BW Online, 6/22/05, "A New Page in Google's Books Fight"). So far, relatively few books have been digitized. Among those are many copyrighted works that are in libraries but out of print. Google lets you search the contents of these works but only serves up snippets of text surrounding the search terms.
Even if I end up having to go to a university library to see the whole book, this still strikes me as a powerful tool that I would have died for back in my student days. As useful as the Web is, Google Print shows how much is missing. It's good to see that missing material gradually coming within clicking distance.
Wildstrom is Technology & You columnist for BusinessWeek. You can contact him at firstname.lastname@example.org