The race is on. Superfast computers are essential to high-level scientific research. Can the U.S. recapture the lead from Japan?

Just over two years ago, Tokyo forced Washington to eat crow. A supercomputer funded by the Japanese government trounced America's mightiest computer. In fact, the Japanese machine, called Earth Simulator, packed more number-crunching speed than the top 20 U.S. supercomputers combined.

For years, some U.S. supercomputing gurus had been warning that Washington's support of high-performance computing was too narrowly focused on the needs of the Pentagon's nuclear-weapons programs. Even acknowledging the U.S. strength in software, they warned that scientific research was being hobbled because U.S. supers were not designed to solve the really tough issues facing civilian scientists and engineers. Earth Simulator, built by Japan's NEC Corp. (NIPNY), was proof positive of just how far behind the U.S. had fallen in scientific supercomputing.

"I was stunned," admits Raymond L. Orbach, director of the Energy Dept.'s Office of Science, a major sponsor of academic research. What surprised Orbach and other scientists wasn't the raw speed of Earth Simulator. The Japanese had been describing their goals since 1997. The amazing thing was how superefficient it was, especially when running simulations of climate trends and fusion-energy generators.

On May 12, Energy Secretary Spencer Abraham laid out a comeback plan. Tennessee's Oak Ridge National Laboratory will establish a new supercomputing center for open science and engineering. It will consist of two or three monster computers from Cray Inc. (CRAY) that will dwarf even the Earth Simulator. The Japanese system theoretically can whiz through an incredible 41 trillion calculations every second. In supercomputer jargon, that's 41 teraflops -- "tera" for trillion, "flops" for floating-point operations per second.
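The jargon boils down to a simple unit conversion. A minimal sketch (the prefix table is standard SI; the function name is ours):

```python
# SI prefixes used in supercomputer speed ratings.
UNITS = {"mega": 1e6, "giga": 1e9, "tera": 1e12, "peta": 1e15}

def to_flops(value: float, prefix: str) -> float:
    """Convert a prefixed rating to raw floating-point operations per second."""
    return value * UNITS[prefix]

# Earth Simulator's theoretical peak: 41 teraflops.
print(f"{to_flops(41, 'tera'):.1e} operations per second")  # 4.1e+13
```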

Thomas Zacharia, head of computing at Oak Ridge, says his Cray hardware should come close to 40 teraflops by the end of this year, jump to 100 teraflops in early 2006, and hit 350 teraflops by late 2007. The result, he says, will be a "major new tool for competitiveness." And unlike Earth Simulator, the Oak Ridge supers will be available to remote users through a high-speed network. "We're already seeing strong interest from the aerospace, auto, and chemical industries," says Zacharia.

Academic scientists who model the birth of stars and the origin of life may have the greatest hunger for supercomputing power. But supercomputers are used in a wide swath of industries, including finance, insurance, semiconductors, and telecommunications. Indeed, roughly half of the world's top 500 supers are owned by corporations.

While the machines used by business today don't have the muscle to tackle the "grand challenge" problems in science, such as predicting climate change, they have become essential in developing better products and speeding them to market. Procter & Gamble Co. (PG) even engineers the superabsorbent materials in its baby diapers with a supercomputer. Now, IBM (IBM) and other suppliers are evolving designs that promise a new class of ultrafast supers and innovative software development tools.

The U.S. may need the extra brawn. The power of Japan's Earth Simulator "will contribute to fundamental changes in every field," says Tetsuya Sato, director of the Earth Simulator Center (ESC). The Center is now nailing down a collaboration with Japan's auto makers to harness the super for automotive engineering and simulated crash testing, says Sato. Even before Earth Simulator, there were signs that NEC's smaller supers, which weren't available in the U.S. until recently, were delivering potent competitive advantages. For example, many Japanese cars are noticeably quieter than American models. One reason: Japanese carmakers routinely run very sophisticated aerodynamic simulations that show designers where to make subtle refinements in body shape to reduce wind noise.


Earth Simulator isn't the only threat. In computational biology -- using software to tackle problems ranging from medical diagnosis to drug discovery -- the U.S. has an even bigger handicap. In 2001, Japan's Institute of Physical & Chemical Research, known as Riken, built a special-purpose computer for such notoriously difficult jobs as simulating the function of proteins. Called the Molecular Dynamics Machine, it has a speed of 78 teraflops -- nearly twice as fast as Earth Simulator.

Why didn't Riken's computer ring alarm bells in Washington in 2001? Because special-purpose computers aren't highly regarded by most analysts -- they lack the versatility of general-purpose machines. Maybe the experts should take another look. In early 2006, Riken expects to unveil a far faster racer. Its cruising speed will be 1,000 teraflops, or 1 petaflops. While its main task will be elucidating the complex structure of proteins, Makoto Taiji, chief of Riken's biocomputing team, believes it will also shine in nanotech: designing materials atom-by-atom. All told, Horst D. Simon, director of the National Energy Research Scientific Computing Center in Berkeley, Calif., figures many American researchers labor under a supercomputing handicap of 10 to 100 times.

There are two basic approaches to supercomputer design. NEC's supers use a so-called vector architecture, meaning they have custom silicon processors for brains (box, page 52). These chips are specifically designed for the heavy-duty math in science and engineering. In contrast, virtually all U.S. supers do their thinking with ordinary microprocessors -- the chips found in PCs and video games. Until Earth Simulator came along, the U.S. was smug about this approach. Because commercial off-the-shelf (COTS) chips are produced in huge volumes, they're much less expensive than NEC's chips. So when more speed is needed, IBM, Hewlett-Packard (HPQ), or Dell (DELL) can just "scale up," lashing together 100 or 1,000 more chips -- the "scalar" approach.
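The scale-up arithmetic behind the COTS approach is simple: aggregate peak speed is just chip count times per-chip rating. A sketch, using hypothetical numbers rather than any vendor's figures:

```python
def cluster_peak_tflops(n_chips: int, gflops_per_chip: float) -> float:
    """Peak speed of a COTS cluster: chip count times per-chip rating
    (in gigaflops), converted to teraflops."""
    return n_chips * gflops_per_chip / 1000.0

# Hypothetical cluster: 1,000 commodity chips rated at 5 gigaflops each.
print(cluster_peak_tflops(1000, 5.0))  # 5.0 teraflops peak

# Need more speed? Lash together another 1,000 chips.
print(cluster_peak_tflops(2000, 5.0))  # 10.0 teraflops peak
```

As the next paragraph notes, though, peak ratings computed this way say little about how fast the machine runs real scientific code.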

However, the peak-speed ratings of COTS clusters can be deceptive. When running the complex software used to tackle really difficult problems in physics, chemistry, and automotive crash simulation, COTS systems rarely eke out even 10% of their peak speed over extended periods. NEC's machines chug along at 30% to 60%.

This dramatic difference in so-called sustained performance means that Earth Simulator isn't just twice as speedy as the current U.S. champ -- the ASCI Q from Hewlett-Packard Co. at Los Alamos National Lab, which has a peak speed of 20.5 teraflops. Factor in efficiency, and the Japanese computer can be 6 to 20 times faster, with one test indicating a 36-fold gain. And Earth Simulator pulls this off with fewer processors -- 5,120 vs. the 8,192 chips in ASCI Q.

For most problems in business and nuclear-weapons work, the COTS approach works great. But in cutting-edge research, how fast a machine actually runs scientific code can be the pacing factor in how long a task takes to finish. That matters because it can determine where new insights first trigger breakthroughs and inventions.

Plowing new ground in science can involve programs that take days or even years just to run -- not counting the time to analyze results and revise the software for the next iteration. Before 1990, scientists trying to understand how proteins function would feed assumptions to a supercomputer, then wait a year or so to see the results. In the late 1990s, teraflops supers slashed each run to a day or two.

With Riken's 78-teraflops super, it's only three or four hours, says researcher Taiji. On Riken's upcoming 1-petaflops brute, it may be just three minutes. A scientist who can screen a different protein model for its pharmaceutical potential every few minutes clearly stands a much better chance of finding the key to a new drug than a researcher whose computer takes half a day or more to spit out each answer.


When it comes to demand for faster computers, there's no end in sight. Probing nature is like peeling an onion: Each successive layer poses harder questions. Today, supers have virtually become the table stakes for scientific discovery. Further progress in science doesn't rest only on the traditional interplay between theory and experiment, says Orbach. Simulations will be the "third leg of the stool," he says. "I'm a theoretical physicist, and there are some problems for which there aren't any theories. You can only understand that science through simulations."

COTS technology was adequate for the "grand-challenge problems that confronted science and engineering in 1995," says Suresh Shukla, manager of high-performance computing at Boeing Co. (BA). But today's toughest problems, he says, "are not going to get solved efficiently on COTS machines."

However, vector machines such as NEC's are expensive, with prices typically in the tens of millions of dollars. For the budget-minded, COTS is the ticket. Last fall, a cluster nicknamed Big Mac was unwrapped at Virginia Polytechnic Institute & State University. It consists of 1,100 Macintosh G5 computers from Apple Computer Inc. Theoretically, it can wolf down 17.6 trillion calculations a second. That's about 43% of Earth Simulator's capacity -- for 1.3% of its $400 million cost, a paltry $5.2 million.
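The price/performance arithmetic, worked through from the article's raw figures (note the ratios here compare theoretical peak speeds, not sustained ones):

```python
big_mac_tflops, big_mac_cost = 17.6, 5.2       # peak teraflops, $ millions
earth_sim_tflops, earth_sim_cost = 41.0, 400.0

speed_ratio = big_mac_tflops / earth_sim_tflops  # ~0.43 of the speed
cost_ratio = big_mac_cost / earth_sim_cost       # 0.013 of the cost
value_gain = speed_ratio / cost_ratio            # ~33x better peak $/flops
print(f"{speed_ratio:.0%} of the speed at {cost_ratio:.1%} of the cost "
      f"-- roughly {value_gain:.0f}x the peak price/performance")
```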

Clusters deliver more than low prices. What they do for "time to insight," the research counterpart of industry's time to market, is incalculable. With their own little super, engineering and research teams can bid adieu to monthlong delays while waiting for access to a jumbo cruncher at a National Science Foundation supercomputing center. So questions can get answered sooner, even on slower machines.

"There's another time-to-insight metric that needs to be considered: the time it takes to build an application," says William R. Pulleyblank, director of exploratory computer systems at IBM's Research Div. Writing software usually takes a lot longer than running it. If software development is accelerated because researchers can work with the same tools they use to write programs for PCs and workstations, it can offset even major differences in run times.

In terms of the acceptance of COTS technology, the record speaks for itself, says David W. Turek, IBM's vice-president for deep computing. Since 1990, COTS clusters have been shoving Cray-type machines off the Top500 Supercomputer Sites list, compiled by Jack J. Dongarra, a senior scientist at Oak Ridge. By 1994, clusters accounted for over half of the world's speed demons. Last year, 95% were COTS systems.

"Look," says Turek, "computer architectures are driven fundamentally by economics. If I were to come out tomorrow with a really fast system, but you had to spend $500 million to buy it, there'd be only one or two customers. That's not a business."

Turek has a point, concedes Cray CEO James E. Rottsolk. "This market is not large enough to support the research and development expense that IBM or HP would have to spend" to sustain a separate family of vector chips. Worldwide sales of top-end supers have hovered around $1 billion a year for more than a decade, says International Data Corp.

The Energy Dept. division that supports classified research found COTS irresistible. In 1995, it launched the ASCI program to build a series of progressively faster clusters and leap from gigaflops to teraflops. The program's crowning achievement will be a 100-teraflops behemoth called ASCI Purple. IBM expects it to come online next year at Lawrence Livermore National Lab.

Livermore researchers are itching to see what this Purple number-eater can do. "For the first time," says Mark K. Seager, assistant head of terascale computing, "we should be able to do a 'button-to-boom' simulation of a nuclear explosion." It may take several months to simulate the trillions of interactions that happen in just a few billionths of a second. But that's a breeze compared with running such a simulation on the Cray 1 that Livermore had in the late 1970s. With that super, recalls Seager, "we figured it would take 60,000 years."

Purple's reign could be brief. IBM and Livermore are also building a still beefier machine, Blue Gene/L. When this experimental system springs to life next year, it may spit out 360 teraflops. But Pulleyblank warns that this is by no means certain, because IBM does not have a computer big enough to verify the design of a system as complex as Blue Gene, with 131,072 processors.

If all comes together as hoped, Blue Gene may herald the start of a transition to the next generation of supercomputers -- systems that combine the benefits of both vector and COTS designs. Blue Gene's chips are turbocharged with a special "pipeline" that pretends to be a vector processor.

Cray has developed a similar architecture, dubbed Red Storm, in conjunction with Sandia National Labs. The first such cluster -- a $90 million, 40-teraflops system -- will be switched on at Sandia this summer. And Oak Ridge will deploy a 20-teraflops Red Storm cluster, with the first half due by late this year.

There is universal agreement that vector processing alone won't suffice for the petaflops computers needed to solve the next layer of grand-challenge problems. The software for attacking these issues is mushrooming in size -- and contains increasing amounts of the scalar code that COTS chips handle so effectively. Having both Cray's latest top-line machine, a Cray X1, and a Red Storm system under the same roof at Oak Ridge should provide valuable guidelines, Zacharia believes.

Further out, the Pentagon's Defense Advanced Research Projects Agency (DARPA) is sponsoring a contest to develop blueprints for petaflops systems with balanced vector/scalar performance to improve time to insight. Three companies are in the running: Sun Microsystems (SUNW), IBM, and Cray. They'll deliver digital models of their designs by 2006. Then DARPA will hold a "bake-off" to eliminate at least one. The remaining one or two will get funds to construct a working prototype by 2009.

Does Japan have anything like the DARPA contest under way? "Unfortunately, no," says Earth Simulator's Sato. He has ideas for an optimized architecture that blends vector and scalar features, he adds. "But we haven't got any funds for it." That doesn't mean the U.S. can relax. NEC has a new family of supers up its sleeve that could be announced within the next year. But details are still "a corporate secret," says NEC Vice-President Tadashi Watanabe.

But barring a major surprise from NEC, the U.S. seems sure to regain the lead in supercomputers -- thanks in large measure to Earth Simulator. The wake-up call it delivered helped galvanize a fresh approach to supercomputing. The results promise untold rewards to both industry and science.

By Otis Port in New York, with Hiroko Tashiro in Tokyo
