Ever since the 1960s, when Intel co-founder Gordon Moore first suggested that computing power would double roughly every two years, the major semiconductor companies have been making sure he stays right. For chipmakers like Intel, Motorola and AMD, which pump faster and faster processors into our PCs each year, Moore's Law is the pride of the industry.
Unfortunately for them, the rest of us probably stopped caring a couple of years ago. Any CPU we buy today is likely to give us more than enough speed and power for the things we do most: e-mail, games, maybe a little light typing. Though the latest designs may indeed be innovative, for the consumer market, the thrill is gone.
Not so for the high-end computing-applications market, however. While modern commodity PC hardware may be good enough to power even high-traffic Web sites like Google, the really serious number crunching still calls for big iron. According to IBM, sales for its mainframe products have remained relatively strong despite the current economic recession.
But sometimes even mainframe power isn't enough. After all, Moore's Law can be read as a limitation as much as a prophecy: computing power grows only as fast as chip-manufacturing processes let it. But suppose you needed more, and not just a little bit more. Suppose you needed a thousand times more computing power than any machine ever produced could provide. What would you do then?
When the World Collides
That's exactly the question CERN, the European Organization for Nuclear Research, found itself asking last September. Over the years, the scientists at CERN's Geneva headquarters have dreamed up some of the most complex scientific-computing applications in the world. Trying to show off Intel's latest Pentium CPUs to one of these guys won't elicit much more than a chuckle.
The latest, and probably the greatest, project under way in Geneva is the construction of the Large Hadron Collider, touted by CERN scientists as the most powerful machine of its kind ever built, and they ought to know. Previously, the title of World's Greatest Atom Smasher was held by CERN's Large Electron-Positron Collider, which was dismantled to make way for the LHC.
The problem with scientific instruments of the magnitude and precision of CERN's particle accelerators, however, is that they produce a lot of analytical data — and I mean a lot. Once it becomes operational in 2006, the LHC will begin generating more than 10 million gigabytes of data each year. To give you an idea of that magnitude, that's equal to about 20 million CD-ROMs, or enough to hold every word in the Library of Congress 10,000 times over.
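To put a rough figure on those comparisons, here's a quick back-of-the-envelope check in Python; the half-gigabyte CD-ROM capacity is an assumed round number for the sake of the arithmetic, not a CERN figure.

# Rough scale check for the LHC's projected data volume.
ANNUAL_DATA_GB = 10_000_000   # "more than 10 million gigabytes" = ~10 petabytes per year
CD_CAPACITY_GB = 0.5          # assumption: a CD-ROM holds roughly half a gigabyte

cds_per_year = ANNUAL_DATA_GB / CD_CAPACITY_GB
print(f"Roughly {cds_per_year:,.0f} CD-ROMs' worth of data every year")
# Prints: Roughly 20,000,000 CD-ROMs' worth of data every year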
In the sciences, there's no such thing as too much hard data. But faced with datasets of such immense size, how on earth could CERN manage it all? By its own estimates, it couldn't. Processing the LHC's 10 petabytes of data per year was beyond any computer CERN scientists had at their disposal. In fact, such a feat was well beyond the capabilities of any supercomputer ever built, or any that could be.
The raw data gathered by the LHC wouldn't be much use without computers to interpret the results of its particle collisions. That's why last year CERN adopted a new motto: "A thousand times more computing power by 2006." But to get there, research labs around the world would have to put their heads together — literally.
Grid Iron
The solution CERN proposed was what became known as the European DataGrid, an ambitious project based on an emerging distributed-processing technology known as grid computing. Instead of relying on mainframe makers like IBM to produce ever more powerful boxes, grid computing lets scientists achieve the same effect by combining the computational power of separate machines — in this case, several thousand machines.
Grid computing takes its name from the electrical power grid, which was designed with both collaboration and flexibility in mind. Utilities built the grid so that a number of generating facilities can pool their resources to supply electricity to a large area, like a city. Should any of those generators go offline, or should demand suddenly spike, utilities can "borrow" power from other, more remote facilities to make up the difference.
Grid computing works in much the same way. At the local level, scientists can link their machines into "fabrics" of interconnected computers. When one machine becomes bogged down with particularly thorny calculations, it can request additional processing power from the others. In a global grid environment like the DataGrid, however, these fabrics can in turn be connected to form a larger fabric of labs and data centers that can span continents. Potentially, all these systems can then operate on a single problem like a vast, virtual supercomputer.
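To make the idea concrete, here is a minimal, purely illustrative sketch in Python of how a "fabric" might hand work to whichever machine is least busy; the names and the scheduling rule are inventions for the example, not anything drawn from the actual DataGrid software.

# Toy model of a grid "fabric": jobs go to the least-loaded node in the pool.
# All names and numbers here are illustrative, not real DataGrid components.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    capacity: int                      # how many work units the node can hold
    queue: list = field(default_factory=list)

    def load(self) -> float:
        """Fraction of this node's capacity currently in use."""
        return len(self.queue) / self.capacity

class Fabric:
    """A pool of nodes, possibly spread across several labs or continents."""
    def __init__(self, nodes):
        self.nodes = nodes

    def submit(self, job: str) -> str:
        # Stand-in for real grid scheduling: pick the least-loaded node.
        target = min(self.nodes, key=lambda node: node.load())
        target.queue.append(job)
        return target.name

# Two local pools joined into one wider grid, as described above:
# a fabric of fabrics is still just a bigger pool of machines.
geneva = [Node("geneva-01", capacity=4), Node("geneva-02", capacity=4)]
remote = [Node("remote-01", capacity=8)]
grid = Fabric(geneva + remote)

for i in range(6):
    job = f"collision-{i}"
    print(f"{job} runs on {grid.submit(job)}")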
In the years since the first grid-computing experiments were run in 1995, a number of prominent projects have sprung up at academic institutions around the world, from Asia to the United States. The European DataGrid will have to draw on the experience and expertise of all of these pioneering researchers if it is to reach its goal of full operation by 2006.
Heavy Mental
CERN's research labs are perhaps the ideal home for a project as complex and challenging as the DataGrid. For years, CERN scientists have wrangled not only with thorny problems of particle physics but also with the computational problems that come along with them.
For example, in 1990 a CERN researcher named Tim Berners-Lee began work on a way to connect far-flung research labs into a single, global information resource. At the time, labs around the world were already producing volumes of data documenting their research, but there was no easy way for scientists at one facility to reference the work of those at another.
Berners-Lee reasoned that to make those volumes of accumulated information accessible, he would need a means of linking documents together. He built on the "hypertext" concept coined by Theodor Nelson, and the system he devised for linking hypertext documents across the Internet became known as the World Wide Web. A few years later, Marc Andreessen would help create Mosaic, a graphical browser for viewing Web documents, and go on to co-found Netscape; the rest is history.
Today, CERN is rightfully proud of Berners-Lee's achievement. Moreover, it hopes that the world's scientific and business communities will recognize its foresight in sponsoring the research that led to the Web as evidence of the value of its ongoing computing projects. So far, so good: The DataGrid effort was officially launched in January 2001 with a sponsorship grant from the European Union to the tune of 9.8 million euros, and further contributions are forthcoming from CERN's various member nations.
Business Brains
Such generous support should probably be expected. After all, the hope is that the DataGrid will benefit a broad range of scientific research, including not just nuclear physics but biology and Earth observation as well. What's more, European countries are betting that grid technologies like those developed for the CERN project could find uses in the private sector, too.
Though it seems unlikely that any business will generate as much raw data as CERN's Large Hadron Collider, the sciences aren't the only area where there's demand for high-performance computing. Financial-services companies, insurance agencies and engineering firms could all potentially take advantage of grid-computing power for statistical-modeling and data-mining applications.
So forget Moore's Law. The problem isn't that computers are already as powerful as you'll ever need; it's that you've been thinking too small. Tomorrow's supercomputers may be so large, in fact, that they span the whole world. And to see them in action, you need look no further than 2006.