A Better Web Through Higher Math

Concepts from graph theory may hold the key for everything from delivering Internet content more quickly to tracing hack attacks

By Stephen H. Wildstrom

The surest way to annoy a research mathematician is to ask what his or her research is good for. Mathematicians take pride in advancing knowledge for its own sake. Their disdain was summarized by the British number theorist G.H. Hardy, who wrote: "A science is said to be useful if its development tends to accentuate the existing inequalities in the distribution of wealth, or more directly promotes the destruction of human life."

Yet the lure of analyzing, deconstructing, and perhaps improving something as practical as the Internet is proving to be irresistible to a number of mathematicians and theoretical computer scientists. The recent Joint Mathematics Meeting in San Diego included three sessions on "probabilistic methods in combinatorics and the Internet," at which 21 papers were presented, with titles ranging from On Playing Golf With Two Balls (a discussion of a technique that could lead to more efficient delivery of Internet content) to Approximating the Domatic Number (don't ask me to explain).

Deconstructing and quantifying the Internet isn't new as a mathematical challenge. In the mid-1990s, two Stanford graduate students, Sergey Brin and Lawrence Page, thought that a mathematical study of the relationships among Web sites could produce much more efficient searches than the brute-force indexing techniques then being used. Their studies led to the creation of Google, by far the most effective commercial search tool available.


  About the same time, Massachusetts Institute of Technology mathematician Tom Leighton and his graduate student, Daniel Lewin (who died Sept. 11 aboard one of the planes that hit the World Trade Center), were tinkering with mathematical methods to improve the delivery of Web content by automatically sharing the load among multiple servers scattered around the Net. Their work became Akamai Technologies, which serves the content of many of the Web's busiest sites.

Perhaps inspired by these successes, or perhaps just for the sheer fun of it, interest in the field seems to be rising. The most common approach is to consider the World Wide Web as a graph. In mathematical terms, a graph has nothing to do with the familiar stock market fever chart or graphical representations of data. Instead, it is simply a collection of points, called vertices, and lines joining them, called edges.

This turns out to be an ideal way to model the Web: Each page is considered a vertex, each hyperlink an edge. It works just as well for the physical Internet: In this case, every router is a vertex and each communications channel an edge.


  Over the past two hundred years, a body of knowledge called graph theory has developed to analyze the characteristics and behavior of graphs, including huge ones with millions of edges and vertices. These techniques are particularly useful in working with what are called sparse graphs, that is, those where only a small percentage of the possible edges actually exist. Again, this is a good description of both the Web and the Internet.

Explorations of the Web as a graph include efforts to simulate the growth of the network and to understand how communities of interest on the Web originate, grow, and decay. The Clever Project at IBM's Almaden Research Center in San Jose, Calif., seeks to build better search tools by using the techniques of graph theory to identify the most authoritative sites on a given subject.

Graph theory, however, is not the only area of mathematics being brought to bear to improve the Internet. In a paper presented in San Diego, University of Massachusetts-Amherst computer scientist Micah Adler turned his attention to denial-of-service attacks that have effectively shut down popular Web sites by overwhelming them with garbage traffic.


  One of the characteristics of the Internet is that it is "stateless": The routers that zip information along sometimes convoluted paths from source to destination maintain no records of their actions. This makes it effectively impossible to trace any given packet of data, or even a flood of packets, back to the source.

In a paper entitled Tradeoffs in Probabilistic Packet Marking for IP Traceback, Adler showed that adding a tiny amount of information to the header, which provides the destination address and other data for each packet, would also provide the information needed to trace the path of the packet in a way that no hacker could forge.

In theory, a single bit, randomly set to one or zero with a specific probability of which routers would be traversed along the way, could provide all the information needed to trace a message back to the source with a high degree of probability. In reality, this is impractical because of the vast number of packets that would have to be collected and analyzed. Just a few bits of information, however, would make the job possible.


 One of the beauties of research in theoretical mathematics is that even the researchers generally have no idea of what will turn out to be practical. More than 150 years after the 21-year-old Evariste Galois died in a duel, the theory that bears his name proved critical to the development of error-correcting codes that make compact disks reliable and practical. Hardy believed that the real glory in number theory was its lack of practical applications. He died in 1947, before his chosen field became the basis of modern cryptography.

So before you dismiss a paper with a title such as A Phase Transition in the Optimum Partitioning Problem as irrelevant gobbledygook, consider that the work was done at Microsoft Research -- and could lead to the software revolutions of tomorrow.

D. Jacob Wildstrom, an MIT math student specializing in graph theory, provided technical assistance for this article

Wildstrom is Technology & You columnist for BusinessWeek. Follow his Flash Product Reviews, only on BusinessWeek Online

Edited by Douglas Harbrecht

Before it's here, it's on the Bloomberg Terminal.