The European Organization for Nuclear Research (CERN) is preparing to switch on the world's largest particle accelerator, the Large Hadron Collider (LHC), in August. A super-high-bandwidth network will carry data from the collider through the Large Hadron Collider Computing Grid (LCG) to 500 institutions around the world. About 5,000 scientists will receive about 15 petabytes (15 million gigabytes) of data every year for at least ten years.
The accelerator's job will be to smash protons, a type of subatomic particle, into each other. The particles will travel at about 99 percent of the speed of light, and the amount of energy and particles sprayed into the detectors will be huge. Through the LCG, researchers will draw on the distributed processing power of almost 100,000 CPUs, allowing them to analyze data from the detectors and look for clues about the universe's fundamental nature.
Andrew Sansum, tier one manager at RAL (Rutherford Appleton Laboratory, near Oxford, England), said that the lab's 10-gigabit connection to CERN can carry 1,250 megabytes per second upstream and downstream. That is about 1,000 times faster than a typical home broadband connection's download speed.
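The figures quoted so far can be checked with a little unit arithmetic. The short Python sketch below is only an illustration: it assumes decimal prefixes (1 gigabit = 1,000 megabits, 1 petabyte = 1 billion megabytes), takes the 10Mbps home broadband speed from Sansum's own comparison, and spreads the 15 petabytes evenly over a 365-day year, which real traffic would not do. The function names are made up for this example.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in the article.
# Assumptions: decimal SI prefixes, 8 bits per byte, data spread evenly over a year.

LINK_GBPS = 10      # RAL-to-CERN link, gigabits per second (from the article)
HOME_MBPS = 10      # home broadband, megabits per second (from the article)
YEARLY_PB = 15      # LHC data volume, petabytes per year (from the article)


def gbps_to_megabytes_per_s(gbps):
    """Convert gigabits per second to megabytes per second."""
    return gbps * 1000 / 8


def speed_ratio(link_gbps, home_mbps):
    """How many times faster the link is than a home connection."""
    return link_gbps * 1000 / home_mbps


def average_megabytes_per_s(petabytes_per_year):
    """Average rate needed to move a yearly volume, in megabytes per second."""
    seconds_per_year = 365 * 24 * 3600
    return petabytes_per_year * 1e9 / seconds_per_year


link_rate = gbps_to_megabytes_per_s(LINK_GBPS)   # 1250.0 megabytes per second
ratio = speed_ratio(LINK_GBPS, HOME_MBPS)        # 1000.0
average = average_megabytes_per_s(YEARLY_PB)     # roughly 476 megabytes per second

print(f"Link capacity: {link_rate:.0f} MB/s, {ratio:.0f}x home broadband")
print(f"Average rate for {YEARLY_PB} PB/year: {average:.0f} MB/s")
```

On these assumptions, the whole yearly output averages to somewhat under half the capacity of one such link, though in practice the data is split across many sites and arrives in bursts.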
Commercial networks have a lot of catching up to do, and only in about two decades will they reach similar speeds. "Video and other media services are going to push the speed of consumer network connections up as the demand is going to be huge," Sansum said. "We were at today's speed of about 10Mbps about 10 to 15 years ago, so you could take that as a precedent for how long it will take for the commercial networks to catch up with us today."
"Tier one" sites connected to LCG across the globe, like RAL, will have to get the huge amount of data from LHC and shape it into chunks. These will go to physicists to be analyzed and then passed to hundreds of "tier two" universities and laboratories in their respective countries.
The LCG represents a challenge for grid technologies
"The LHC experiment would not be possible without the power and throughput of the LCG. CERN has not got the capacity to solely process the vast amount of data on site. The tier one sites will be busy refining the data and enhancing the software that analyses it, growing the processing operations of the grid," Sansum explained.
"Our role is to make sure that those physicists are getting the most useful and relevant data. Grid technology is transforming the way that experiments are being carried out. Ten years ago these institutions were working on their own; now they work closely together." According to Sansum, RAL and the GridPP are prepared for the LHC going live. "We have run it up to 250Mbps to 300Mbps each way sustained over several days so far. We are in the final shakedown at the moment and seem to be in good shape to face the challenges the LHC will throw at us," he added.
Surprises are bound to appear at every corner: "The biggest challenge is for the software to work out which of the 200 or so tier two sites has which data. You need to be able to move vast amounts of data from site to site, check it has all got there, flag up any problems and correct those immediately--it quickly gets immensely complicated," Sansum revealed.
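The bookkeeping Sansum describes (recording which site holds which data, checking that a copy arrived intact, flagging problems and correcting them) can be sketched in a few lines of Python. This is a toy illustration only, not the software the LCG actually runs: the ReplicaCatalogue class, the copy_with_check function, the site and dataset names, and the choice of SHA-256 checksums are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def checksum(data):
    """Fingerprint a block of bytes (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


class ReplicaCatalogue:
    """Hypothetical in-memory record of which site holds which dataset."""

    def __init__(self):
        self._sites = defaultdict(set)   # dataset name -> sites holding a good copy
        self._checksums = {}             # dataset name -> checksum of first copy

    def register(self, dataset, site, data):
        """Record a copy at a site; return False if it differs from the first copy."""
        expected = self._checksums.setdefault(dataset, checksum(data))
        if checksum(data) != expected:
            return False                 # flag the problem; do not list the bad copy
        self._sites[dataset].add(site)
        return True

    def sites_for(self, dataset):
        """List the sites known to hold an intact copy of the dataset."""
        return sorted(self._sites.get(dataset, ()))


def copy_with_check(catalogue, dataset, data, dest, send, retries=2):
    """Send data to dest, verify it, and resend on a mismatch.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1 + retries):
        received = send(data)
        if catalogue.register(dataset, dest, received):
            return attempt + 1
    return None


# Demo: a link that corrupts the first transfer and succeeds on the second.
catalogue = ReplicaCatalogue()
payload = b"detector events"
catalogue.register("run-001", "tier1-RAL", payload)

calls = {"count": 0}


def flaky_send(data):
    calls["count"] += 1
    return data[:-1] if calls["count"] == 1 else data


attempts = copy_with_check(catalogue, "run-001", payload, "tier2-A", flaky_send)
print(attempts, catalogue.sites_for("run-001"))
```

Even this toy version hints at why the problem "quickly gets immensely complicated": with 200 or so tier two sites, every dataset needs its own record of locations, checks and retries.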
During GridPP downtime, numerous projects are planned to make use of its vast number-crunching capabilities and fat pipes. They include searching for antimalarial drugs, combating avian flu, and running an image search engine. Others include analyzing weather data and collaborating on other scientific and academic projects, but none matches the scale and sustained processing flow of the LCG.
Sansum also said that the use of grid technologies is expected to grow. They will link up different types of data, such as climate information and localized cancer rates, and will offer new ways of advancing scientific progress.