Beyond The Genome: Biotech's Next Holy Grail

Now, companies are racing to decipher the human protein set

This summer, researchers will complete one of the greatest biological endeavors since Charles Darwin made his voyage of discovery on the HMS Beagle. Scientists will announce the completion of a rough draft of the human genome--a first attempt at decoding the entire set of about 100,000 human genes. Champagne corks will pop, backs will be slapped, and biologists will mark the start of a revolutionary era in biology.

After the first round of self-congratulation, however, researchers will find that their task has only just begun. The decoding of the human genome, as impressive as it is, means little until researchers find out what all those genes do. Sequencing the genome is just the first step in the long march toward an understanding of how humans grow and develop and how they get sick. "Genes are powerful, but at the end you want to get at the proteins" that carry out bodily functions, says Dr. Randall W. Scott, chief scientific officer at Incyte Genomics Corp.

At least seven biotech companies are racing to develop new tools to mine the proteome--the complete set of human proteins--for clues to human disease. The tools include powerful instruments that can sort and sequence thousands of proteins simultaneously. Even gene wizard J. Craig Venter--whose company, Celera Genomics Group, helped galvanize the human genome project--is jumping on the proteomics bandwagon. "We are going to build the Celera proteomics facility. It's a major, major effort," says Venter.

Academic research centers and pharmaceutical companies are using proteomics to identify diagnostic indicators for a range of diseases, from breast cancer to heart disease to Alzheimer's. The National Cancer Institute and the Food & Drug Administration are funding a multimillion-dollar "Tissue Proteomics Initiative" to identify proteins linked to early stages of colon, breast, and other cancers. Meanwhile, drug companies, including Bayer, Merck, and Pfizer, are starting collaborations on proteomic projects to complement their other research efforts.

Most genes serve as blueprints for the production of proteins, the workhorses of living cells. To understand disease, doctors need to survey the activities and functions of these proteins. Thus, genomics--the study of the genome--will soon give way to a far more challenging discipline: proteomics, or the study of proteins. And the human genome project may soon be replaced by a human proteome project, which would enable scientists to answer many fundamental biological questions.

Many critical functions of the cell are accomplished by a complex cascade of events in which one protein acts on another, which acts on another, and so on. One example is the chain of changes that occurs when a neurotransmitter is dispatched from one brain cell to another. Understanding these processes could help researchers develop better drugs for cancer and diabetes. It should also help identify the patients most likely to suffer side effects.

Even with the new genomics technologies, it takes 10 or 15 years to get a drug out of the lab--to evaluate it in animal and human trials and get it onto the market. Proteomics promises to shorten that to perhaps a few years by helping researchers identify safer, more effective candidate medicines earlier in the drug-development process. "It will go so far beyond the genome project, we'll be delighted," predicts Arnold Oliphant, vice-president for functional genomics at Salt Lake City biotech company Myriad Genetics Inc.

This eagerness to understand the proteome isn't new. In 1980, scientists proposed a project called the Human Protein Index, which Congress seriously considered funding. But before the program got off the ground, the political tide turned in favor of a human genome project. The prevailing attitude was, it's so much easier to work with DNA, we shouldn't waste time fooling with proteins.

Compared with proteins, DNA molecules are simple. A DNA molecule is a long, spiraling ladder--the famous double helix--composed of just four basic constituents. Proteins, on the other hand, are highly complicated beasts that fold up into intricate and often unpredictable shapes. They are built from 20 building blocks, called amino acids, each of which has its own unique chemical properties. That alone makes sequencing proteins a much harder task. In addition, the set of proteins a cell uses is constantly changing. Some proteins are broken down and their components recycled in minutes, while others can persist in the cell for hours or even days. Their surfaces are constantly modified, too--a sugar molecule might be added to one, a phosphate molecule tacked on to another. These additions can activate or deactivate proteins. Scientists estimate that the 100,000 genes in the human genome can generate perhaps as many as a million different proteins.

The completion of the human genome project will speed proteomics research. "Without genomics, proteomics is so much harder," says Keith Williams, chief executive at Proteome Systems Ltd., a proteomics company based in Sydney, Australia. But even armed with critical genetic data, researchers need to develop instruments powerful and sensitive enough to detect and characterize extremely small quantities of a protein--as little as a billionth of a gram. That's because many of the most important proteins are present in only fleeting amounts. The mighty gene sequencers used by Celera, developed by PE Biosystems in the mid-1990s, gave researchers the capacity to tackle all 100,000 genes. Those instruments are one of the reasons the sequence of the human genome will be finished five years ahead of schedule.

With potentially a million proteins to unravel, even more powerful technologies are needed for proteomics. Those instruments are coming. Already, Oxford GlycoSciences in Oxford, England, Large Scale Biology in Rockville, Md., and Williams' Proteome Systems have developed sophisticated, reliable methods of miniaturizing the sheets of jello-like polymer that are used to separate individual proteins from the many thousands contained in a serum or urine sample. They have even found ways to determine the order of a protein's 20 different building blocks rapidly and accurately.

CHURNING OUT DATA. The current separation and sequencing tools may not be as fast or as automated as researchers would like, but they are churning out more information about novel proteins than the companies' computer systems can handle. That has intrigued pharmaceutical companies, which are always hungry for new drug targets. Large Scale Biology has signed contracts with 24 pharmaceutical and biotech companies, while competitor Oxford GlycoSciences has done at least six deals. Proteome Systems has orchestrated its own share of contracts, including one with Indianapolis-based Dow AgroSciences to list and classify all of the proteins made by various crop plants.

But cataloging proteins is only one way to harness the power of proteomics. Pharmaceutical companies really want to understand the protein changes that occur when a normal cell becomes diseased. Pfizer Inc. has teamed with Oxford GlycoSciences PLC to use proteomics to unearth biological markers that define the various stages of Alzheimer's disease. In practice, this entails analyzing hundreds of samples of spinal fluid taken from patients with mild, moderate, and severe forms of Alzheimer's and comparing them with samples from healthy people (diagram).

The goal is to find key differences among the samples and use that knowledge to chart the progression of the disease at the cellular level. Right now, to diagnose Alzheimer's conclusively, doctors must conduct a lengthy series of memory tests. But if doctors could use proteomics to find a unique set of proteins that distinguishes the early stages of the illness from later ones, or even from no disease at all, then they would have a rigorous and objective means of diagnosis. For instance, patients with proteins A and B might have a case of Alzheimer's with only mild memory loss. Patients with proteins X and Z, however, might have such a severe form of the disease that they need special treatment. Knowing that information would allow physicians to correlate subjective symptoms with real biological changes, says proteomics proponent Dr. B. Michael Silber, the director of pharmacogenomics and clinical biochemical measurements at Pfizer.
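The comparison Silber describes can be pictured as simple set arithmetic. The Python sketch below is purely illustrative: the sample data, the protein names (P1, A, B, X, Z and so on) and the strict rule that a marker must turn up in every sample of one group and in no sample of any other group are all assumptions, not Pfizer's or Oxford GlycoSciences' actual method. A real study would have to cope with noisy measurements across hundreds of samples.

```python
from typing import Dict, List, Set

# Hypothetical data: each set lists the proteins detected in one spinal-fluid sample.
SAMPLES: Dict[str, List[Set[str]]] = {
    "healthy": [{"P1", "P2"}, {"P1", "P3"}],
    "mild": [{"P1", "A", "B"}, {"P2", "A", "B", "C"}],
    "severe": [{"P1", "X", "Z"}, {"X", "Z", "P3"}],
}


def stage_markers(groups: Dict[str, List[Set[str]]]) -> Dict[str, Set[str]]:
    """For each group, keep proteins seen in every one of its samples
    and in no sample from any other group (a deliberately strict rule)."""
    markers: Dict[str, Set[str]] = {}
    for name, samples in groups.items():
        shared = set.intersection(*samples)  # present in all samples of this group
        elsewhere: Set[str] = set()
        for other, other_samples in groups.items():
            if other != name:
                elsewhere = elsewhere.union(*other_samples)
        markers[name] = shared - elsewhere
    return markers


def classify(patient: Set[str], markers: Dict[str, Set[str]]) -> List[str]:
    """Return every group whose (non-empty) marker set the patient carries in full."""
    return [name for name, m in markers.items() if m and m <= patient]


markers = stage_markers(SAMPLES)
print(markers["mild"], markers["severe"])
print(classify({"P1", "A", "B"}, markers))
```

With this toy data the mild group yields markers A and B, the severe group X and Z, and the healthy group none, so a new patient carrying A and B is flagged as mild.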

Cancer is another area where researchers believe proteomics will yield big payoffs. In 1998, Dr. Emanuel F. Petricoin, a senior fellow at the FDA, and Dr. Lance A. Liotta of the NCI sponsored a program designed to identify protein markers that are specifically linked to early onset of ovarian, prostate, and other cancers. The hope is that these early markers will lead to better diagnostic tests that can identify lesions at the pre-malignant stage, when they are just beginning to grow and before they spread throughout the body. So far, the initiative has uncovered nearly three dozen new protein markers that are consistently seen in pre-malignant cells, including several new markers--in addition to the already widely used PSA--for one of the most common cancers, prostate cancer.

Proteomics will also give medical researchers important information about the potential side effects of new and existing drugs. "Right now, one of the main [problems is] that many drugs fail at late stages of clinical development because of unforeseen toxicities," says Petricoin. He believes that proteomics could be used to eliminate potentially life-threatening compounds before pharmaceutical companies invest tens of millions of dollars bringing them to market. Scientists might, for instance, test how different experimental drugs affect the proteins of a given tissue--say, the liver or the kidney. The drugs that produce the fewest changes are likely to be safest, says Petricoin.
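Petricoin's rule of thumb (fewest changes, likely safest) can be turned into a simple score. In this hypothetical Python sketch, each drug's effect is a table of protein levels in treated tissue, compared with an untreated baseline; a protein counts as changed if its level rises or falls at least two-fold, or if it appears or disappears. The drug names, protein levels and the two-fold cutoff are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical protein levels (arbitrary units) in untreated liver tissue.
BASELINE: Dict[str, float] = {"P1": 10.0, "P2": 5.0, "P3": 8.0}

# Hypothetical levels in the same tissue after exposure to each experimental drug.
TREATED: Dict[str, Dict[str, float]] = {
    "drug_a": {"P1": 11.0, "P2": 5.5, "P3": 7.0},
    "drug_b": {"P1": 30.0, "P2": 1.0, "P3": 8.0, "P4": 2.0},
    "drug_c": {"P1": 25.0, "P2": 5.0, "P3": 8.0},
}

FOLD = 2.0  # assumed cutoff: a two-fold rise or fall counts as a change


def count_changes(baseline: Dict[str, float], treated: Dict[str, float], fold: float = FOLD) -> int:
    """Count proteins whose level shifts at least `fold`-fold, appears, or vanishes."""
    changed = 0
    for protein in baseline.keys() | treated.keys():
        before = baseline.get(protein, 0.0)
        after = treated.get(protein, 0.0)
        if before == 0.0 or after == 0.0:
            if before != after:  # protein appeared or disappeared
                changed += 1
        elif after / before >= fold or before / after >= fold:
            changed += 1
    return changed


def rank_by_changes(baseline: Dict[str, float], treated: Dict[str, Dict[str, float]]) -> List[Tuple[str, int]]:
    """Order drugs from fewest to most protein changes."""
    scores = [(drug, count_changes(baseline, levels)) for drug, levels in treated.items()]
    return sorted(scores, key=lambda item: item[1])


print(rank_by_changes(BASELINE, TREATED))
# [('drug_a', 0), ('drug_c', 1), ('drug_b', 3)]
```

With these made-up numbers drug_a changes no proteins, drug_c changes one and drug_b changes three, so drug_a ranks first. A fuller analysis would presumably also weigh which proteins change, not just how many.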

MANAGEABLE CHUNKS. Because it is difficult to coordinate a massive effort to physically isolate and measure all of the proteins in a cell or tissue, some companies, including CuraGen, Myriad Genetics, and Hybrigenics, have opted for a simpler approach. They are fishing out specific groups of new proteins related to, say, insulin levels, and then determining the proteins' functions. That effort can have practical and beneficial outcomes. In a study with a pharmaceutical company, Myriad scientists took 10 proteins known to be involved in a particular disease and used them to uncover 200 additional proteins--and one of them looked like a promising drug target. In just a few months of work, Myriad scientists were able to provide the pharmaceutical company not only with critical biological information but also with valuable clues about a potential blockbuster medicine. "That is phenomenal speed to identify a new drug target," says Oliphant.
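The article does not say how Myriad used its 10 known proteins to find 200 more. One common way to fish for related proteins, assumed here purely for illustration, is to look up which proteins physically interact with the known ones and then repeat the search outward. The interaction map, the protein names and the hop limit in this Python sketch are all hypothetical.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical interaction map: protein -> proteins it is known to bind.
INTERACTIONS: Dict[str, Set[str]] = {
    "S1": {"N1", "N2"},
    "S2": {"N2", "N3"},
    "N3": {"N4"},
}


def expand_from_seeds(interactions: Dict[str, Set[str]], seeds: Set[str], max_hops: int = 1) -> Set[str]:
    """Breadth-first search outward from the seed proteins; return the
    new proteins found within max_hops steps (the seeds themselves excluded)."""
    found = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        protein, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not search beyond the hop limit
        for partner in interactions.get(protein, set()):
            if partner not in found:
                found.add(partner)
                frontier.append((partner, hops + 1))
    return found - seeds


print(sorted(expand_from_seeds(INTERACTIONS, {"S1", "S2"})))              # ['N1', 'N2', 'N3']
print(sorted(expand_from_seeds(INTERACTIONS, {"S1", "S2"}, max_hops=2)))  # ['N1', 'N2', 'N3', 'N4']
```

Each extra hop widens the net, which is how a handful of starting proteins can lead to many more; deciding which of those deserve attention as drug targets is a separate step.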

Researchers at biotech companies and universities have clearly made a good start at unfolding the secrets hidden in the proteome. But there are still many hurdles to clear. Denis F. Hochstrasser, a Swiss chemist who helped found proteomics company GeneBio, cautions that the proteome is so difficult to analyze "that no one technology is sensitive enough [yet] to find the needle in the haystack. It's going to require many years and many new innovations" to crack the protein code (sidebar).

It's also going to take massive amounts of computing power and storage capacity. "If you look at the complex network of [protein information] we will build on top of the [genome], it will be in the petabyte range," says Celera's Venter. (A petabyte is a billion megabytes.)

These are technology problems that can and will be solved. Proteomics may be in its infancy, but already, a new post-genomic age has dawned.