HP Builds Supercomputer from Off-the-Shelf Parts
October 2001
It's not quite a do-it-yourself project, but a team of scientists
from HP Labs Grenoble and a French national laboratory recently
built one of the world's fastest supercomputers using nothing but
hardware you might find inside any typical big business.
The I-Cluster is the first TOP500-class supercomputer ever constructed
from unmodified, mainstream PCs. The TOP500 list consists of the
500 most powerful commercially available computer systems.
Using the standard Linpack
benchmark, the Linux-powered cluster of 225 PCs has
been ranked the 385th-fastest supercomputer in the world and
the 15th-fastest in France -- much to the surprise of many
experts.
Harnessing idle computing power
"People couldn't believe we could reach such a performance level
with such a basic hardware platform," said Bruno Richard, project
manager in HP Labs Grenoble. "Ultimately the TOP500 benchmark proved
them wrong -- mainstream clusters can provide high performance."
The HP Labs scientists and a team from France's National
Institute for Research in Computer Science (INRIA) began work on the
project in September 2000. The goal: to develop tools that
transparently harness idle computing power to use for
compute-intensive services.
Standard, unmodified PCs
They started in September 2000 with 100 of HP's simplified e-PCs,
using them to model an enterprise network and evaluate the capabilities
of such hardware for compute-intensive services. Each of these PCs was
what you would then have expected to find on a typical user's desk:
a 733 MHz Pentium III, 256 MB of RAM and a 15 GB hard drive. The
machines were connected by switched 100 Mbps Ethernet.
"These really are standard machines. We didn't even open the
box," Richard said.
This sets the I-Cluster apart from other clusters, which are
composed of heavily modified parts.
Putting together the I-Cluster
Researchers found the HP e-PC ideal for such a large cluster because
it can be easily managed and serviced. The small size of e-PCs makes
it easy to integrate them into racks, and their low power consumption
leads to lower air conditioning requirements and to a low noise
level.
The scientists chose the Linux OS because it is open, which makes
it easier to analyze and to adapt to the specific requirements of the
project.
Because the first evaluations showed the cluster to be very scalable
(performance grew linearly with the number of machines), HP Labs Grenoble
bought 125 more machines in February 2001 to bring the cluster
to 225 nodes.
Team developed tools
The team faced a number of challenges in scaling their project
to 225 machines. (INRIA's previous cluster was 12 PCs.) Among other
things, they had to figure out how to quickly deploy the
operating system on all 225 machines. To do so, they developed tools
capable of installing software, broadcasting files and launching
processes on all 225 machines in about the same time it would take
to perform these operations on a single PC.
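
The article does not say how the tools achieve this. One well-known way to
get a near-constant broadcast time is to arrange the machines in a chain and
pipeline the file in small chunks, so that every node forwards one chunk
while receiving the next. The sketch below is a back-of-the-envelope timing
model of that idea, not a description of the team's actual software. The
function names, the 1 GB image size and the 1 MB chunk size are illustrative
assumptions, and the model assumes full-duplex switched links that let each
node send and receive at the full 100 Mbps.

```python
import math


def sequential_copy_time(file_mb, nodes, link_mbps):
    """Seconds needed if one server sends the whole file to each node in turn."""
    single_copy = file_mb * 8 / link_mbps  # megabits / (megabits per second)
    return nodes * single_copy


def pipelined_chain_time(file_mb, nodes, link_mbps, chunk_mb=1.0):
    """Seconds needed if the nodes form a chain and forward chunks as they arrive.

    The first chunk needs `nodes` hops to reach the last machine; after that,
    one further chunk completes every chunk_time.
    """
    chunk_time = chunk_mb * 8 / link_mbps
    chunks = math.ceil(file_mb / chunk_mb)
    return (chunks + nodes - 1) * chunk_time


if __name__ == "__main__":
    image_mb = 1000   # hypothetical system image size
    link_mbps = 100   # switched 100 Mbps Ethernet, as in the article
    print("one PC:        ", sequential_copy_time(image_mb, 1, link_mbps), "s")
    print("225 sequential:", sequential_copy_time(image_mb, 225, link_mbps), "s")
    print("225 pipelined: ", pipelined_chain_time(image_mb, 225, link_mbps), "s")
```

Under these assumed numbers, the pipelined chain reaches all 225 machines in
roughly 98 seconds, against 80 seconds for a single PC and about five hours
for one-by-one copying. The extra cost of the chain depends on the number of
nodes times the chunk time, not on the number of nodes times the file size.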
The researchers plan to release their software, which they call
Ka Clustering Tools,
as open source.
In addition to heavy computing activity, supercomputing involves
a lot of inter-node communication (in this case, communication among
the PCs). Researchers found that a key factor in reaching peak performance
was to identify the sources of network latency and reduce them.
Practical applications
About 60 research teams worldwide are working on the I-Cluster, with
half running typical supercomputing applications and the other half
doing evaluations of new technology related to large PC-based clusters.
Current research focuses on more practical applications of the
I-Cluster demonstrations. In particular, the research team imagines
"clouds" of devices that would discover network resources, and use
them for compute-intensive services while the devices are idle.
Researchers are now working to provide a correct model of the machines
and of the network topology and load. Moving from the current
I-Cluster platform to a "cloud" of devices will also require
the technology to adapt to real-world issues such as user prevalence,
and to scale to thousands of devices or more.