Introducing: The Grid

Odds are, somewhere near you right now, computers are whirring all day and night frantically processing the latest ATLAS data. They’re part of a system called the Worldwide LHC Computing Grid, or just “the Grid” for short, and without them we’d drown in the data spat out by our detector. So what is the Grid, exactly, and how do we keep it so busy?

We always try to have a detailed simulation of the physics we’re trying to understand. This year, that means simulating about two billion proton collisions in ATLAS, which, if you were to start that on your laptop right now, would take somewhere around 15,000 years to finish. We don’t have that kind of time! Enter the Grid. The Grid ties together something like 100,000 laptop-equivalents sitting at universities and labs all around the world. We can ship chunks of data from our detector over to one of those computers, set it to work, and tell it to send us a message once it’s ready for the next chunk. It’s a kind of massively parallel processing that allows us to churn through our data at a much higher rate than would ever be possible if we only used the computing at CERN.
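That “ship a chunk, get a message when it’s done” pattern can be sketched in miniature with Python’s standard library. This is only a toy model of the idea – the chunking, the worker pool, and the `process_chunk` stand-in are all illustrative, not part of any real ATLAS or Grid software:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_chunk(chunk):
    # Stand-in for the real work (simulating or reconstructing events).
    return sum(event * event for event in chunk)

def run_on_grid(events, n_workers=4, chunk_size=10):
    # Split the dataset into chunks and farm them out to a pool of workers.
    chunks = [events[i:i + chunk_size] for i in range(0, len(events), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(process_chunk, c) for c in chunks]
        # as_completed is the "send us a message once it's ready" part:
        # results come back in whatever order the workers finish.
        return sum(f.result() for f in as_completed(futures))

total = run_on_grid(list(range(100)))
print(total)  # same answer as looping over the events serially
```

The key design point is that no worker waits on any other: each chunk is independent, which is exactly why collision events parallelize so well.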

We’ve come a long way in the last 30 years. For the discovery of the W boson at CERN in 1983, a picture of every event was sent to the “Megatek” facility in Switzerland, where a person analyzed them one by one, by eye (really!). The Tevatron experiments need significant computing resources, but most of them can be hosted on-site at Fermilab near Chicago. Today, the LHC experiments are really the first physics experiments (that I know of!) that desperately need the Grid to stay operational. We could put the entire city of Geneva to work, and each person would have to process three collision events an hour, 24 hours a day, 365 days a year, to keep up with all our recorded and simulated events! The Grid makes that task much more manageable – but not easy, by any stretch of the imagination.
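That back-of-the-envelope claim is easy to check. Taking Geneva’s population as roughly 190,000 (my round number, not a figure from the post):

```python
population = 190_000                        # assumed round figure for Geneva
events_per_person_per_year = 3 * 24 * 365   # 3 per hour, around the clock
total_events = population * events_per_person_per_year
print(f"{total_events / 1e9:.1f} billion events per year")
```

That lands around five billion events a year – the right ballpark for a few billion recorded collisions plus the two billion simulated ones mentioned above.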

There are people around the world constantly working to take care of the Grid and keep it running smoothly for us. One year’s worth of operation costs around 15M euros, give or take a bit, and we want to make sure we (and the taxpayers) get our money’s worth. About half of that goes straight to keeping the computers cool – without a fancy data center on the scale that Google and Apple build, it can be expensive to keep that many machines from overheating! No computer is retired if it can be upgraded or repaired cost-effectively. And we constantly work on improving our software, knowing that if we can make it 10 percent faster we can save over 1M euros. Well, that’s only half-true, of course; if our software runs faster, we’ll use the spare time to write even more papers!
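The savings arithmetic is just a fraction of the annual bill – a crude estimate, since in practice only part of the cost scales with CPU time:

```python
annual_cost_eur = 15_000_000   # "around 15M euros, give or take"
speedup_fraction = 0.10        # make the software 10 percent faster
savings_eur = annual_cost_eur * speedup_fraction
print(f"~{savings_eur / 1e6:.1f}M euros freed up per year")
```

That is where “over 1M euros” comes from: a tenth of the operating budget is roughly 1.5M.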

Despite all this fancy computing, we aren’t anywhere close to developing SkyNet – don’t worry. Apple’s and Google’s new data centers, which reportedly run about $1B USD each, have computing resources 10-100 times those of our entire beloved Grid. Of course, we don’t have that kind of money! But sometimes, late at night, waiting for a computer hundreds or even thousands of miles away to phone home and tell me it’s done with its little job for the day, I dream of all we could do with just one of those buildings…

Zach Marshall


Zach Marshall is a research fellow at CERN. Over the last five years he has alternated between developing software for ATLAS and abusing that software for the good of physics analyses.

5 Comments

  • By Richard Mitnick, December 8, 2011 @ 2:36 am

The mighty men and women in the world of Public Distributed Computing – on BOINC software from UC Berkeley – stand ready, willing, and able to help crunch some of that data. BOINC today delivers some 6 petaFLOPS. We are a virtual supercomputer.

    We already have two projects running for the LHC, “Sixtrack” (LHC@home 1.0) for magnet tuning; and LHC@home 2.0 for event collisions. We would love to do more.

  • By Chris C., December 9, 2011 @ 6:26 am

… and an easy way to give us more to do would be to keep your existing LHC@home pipelines full of work units. I’ve got LHC in my BOINC setup (on several computers) but am usually disappointed to see that it has nothing from LHC to work on. Help us help you!

  • By Hamish Johnston, December 9, 2011 @ 11:13 am

It’s lamentable that CERN is no longer at the cutting edge of computing — with Google and Apple (and I’m guessing Amazon, Walmart, Tesco and other multinationals) having larger and more sophisticated data processing operations. It makes it just a little more difficult to cite technological spinoffs (such as the Worldwide Web) to justify the great expense on facilities like the LHC.

I saw this coming in 1998, when I went to CERN expecting to hear about all the exciting new computing that was being developed for the LHC. What I heard instead was that CERN’s computer boffins were “tracking” technology developments in Silicon Valley so they could assemble their systems from off-the-shelf hardware and software.

  • By Richard Mitnick, December 9, 2011 @ 9:48 pm

    Yea Chris C.!!

  • By Zach, December 11, 2011 @ 2:13 pm

@Chris and Richard: We generally haven’t gone for LHC@home-style processing because there are concerns about information protection. We often need the jobs to connect to secure servers to get some information (which we don’t want to open up too much). It wouldn’t be easy, it’s true, but someone could try to mine our data without our permission. And there are issues with jobs connecting to databases (many of which have been solved, but some of which are still a bit tricky) that are serious enough that I would fear something like a clobbered Oracle server killing all the jobs on the grid. So as far as I know, LHC@home does not process data for either ATLAS or CMS. I’m leaving out the fact that our software often requires 20-50 GB of disk space, 3+ GB of RAM, etc., and that in this time-sensitive period we need to be sure that when we send a job out, it comes back in a reasonable period of time (and doesn’t fall into a hole because a user’s computer agrees to process it and just doesn’t get to it for a while). In the future, some or even all of those concerns might be solved… we’ll have to see!

    @Hamish: I agree, it’s a shame we’ve fallen behind – but once a company can pay for more technology, industry will quickly pass academia :-) I think that’s the part you’re missing, though. We need academics and labs to *create* the technologies and develop them to the point that they can be used by those large companies. Then, of course, if someone pays $10B USD, they’ll be able to buy all they want. And while they’re doing that, we’ll be developing the next technology that 10 or 15 years from now Google will want to buy lots of.

And of course when we need large-scale purchases, it’ll be cheaper to follow Silicon Valley. We don’t develop new generic chips any more – it’s better to leave that to Intel and AMD. But we do develop custom chips that others want to use. In other words, if it’s cheaper to rely on the market (for CPUs), then we do that. But if the only way to get what we want is to BE the market, we do that as well! That was the case with the magnets developed for the LHC – we had a significant impact on the global market for superconductors by demanding and developing tech that no one else had thought about yet.

    Thanks for the comments everybody!
