From 0-60 in 10 million seconds! – Part 2

This is continuing from the previous post (http://pdg3.lbl.gov/atlasblog/?p=1071), where I discussed how we convert data collected by ATLAS into usable objects. Here I explain the steps to get a Physics result.

I can now use our data sample to prove/disprove the predictions of Supersymmetry (SUSY), string theory or what have you. What steps do I follow? Well, I have to understand the predictions of this theory; is it saying that there will be multiple muons in an event or there will be only one very energetic jet in the event, etc? For instance, the accompanying figure shows the production and decay of SUSY particles, which lead to events with many energetic jets, a muon, and particles that escape the detector without leaving a trace (missing energy), like X1.

Cartoon of the production and decay of SUSY particles

If the signature is unique, then my life is considerably simpler; essentially, I will write some software to go through each event and pick out those that match the prediction (you can think of this as finding the proverbial (metal) needle in a haystack). If the signal I am searching for is not very unique, then I have to be much cleverer (think of this as looking for a fat, wooden needle in a haystack).

First, I have to decide the selection criteria, e.g., I want one muon with momentum greater than, say, 100 GeV/c, or one electron and exactly two jets, etc. Once I’ve decided the selection criteria, I cannot change them, and have to accept the results, whatever they may be. Otherwise, there is a very real danger of biasing the result. To decide these selection criteria, I may look at simulation, i.e., fake data, and/or sacrifice a small portion of real data to do my studies on. With these criteria, I could have a non-zero number of candidate events, or zero events.
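As a rough illustration of what "going through each event" with fixed criteria might look like, here is a minimal sketch in Python. It is not ATLAS software: the `Event` record, its `muon_momenta` field and the toy events are assumptions made up for the example, and "one muon" is read here as "exactly one muon above the threshold".

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the example in the text (GeV/c); frozen before looking at data.
MUON_P_MIN = 100.0


@dataclass
class Event:
    """Toy event record; the field name is an assumption for illustration."""
    muon_momenta: List[float]  # momenta of reconstructed muons, in GeV/c


def passes_selection(event: Event) -> bool:
    """Keep events with exactly one muon above the momentum threshold."""
    energetic = [p for p in event.muon_momenta if p > MUON_P_MIN]
    return len(energetic) == 1


# Made-up events standing in for the real data sample.
events = [
    Event(muon_momenta=[120.0]),         # one energetic muon: kept
    Event(muon_momenta=[35.0, 60.0]),    # only soft muons: rejected
    Event(muon_momenta=[150.0, 210.0]),  # two energetic muons: rejected
    Event(muon_momenta=[]),              # no muons: rejected
]

candidates = [e for e in events if passes_selection(e)]
print(len(candidates), "candidate event(s)")
```

The point of keeping the threshold as a single constant is that it is decided once, up front, and the same function is then applied unchanged to every event.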

In either case, I have to estimate how many events I expect to see due to garden-variety physics effects, which can occur as much as a million or a billion times more frequently, and may produce a similar signature; this is called background. This can happen because our reconstruction software could mis-identify a pion as a muon, or make a wrong measurement of an electron’s energy, or because, if we produce enough of these garden-variety events, a few of them (out in the “tails”) may look like new physics. So I have to think of all the standard processes that can mimic what I am searching for. One way to do this is to run my analysis software on simulated events; since we know what a garden-variety process looks like, we generate tons of fake data and see if some events look like the new effect that I am looking for. I can also use “real” data and, by applying a different set of selection criteria, come up with what we call a “data-driven background estimate”. If the background estimate is much less than the number of candidate signal events, excitement mounts, and the result pops up on the collaboration’s radar screen.
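A simulation-based background estimate of this kind usually boils down to simple arithmetic: the fraction of simulated background events that survive the selection, scaled to the number of such events expected in the real data (cross-section times integrated luminosity). The sketch below shows that arithmetic only; the function name and all the numbers are hypothetical and are not taken from any ATLAS analysis.

```python
def expected_background(n_passed: int,
                        n_generated: int,
                        cross_section_pb: float,
                        luminosity_inv_pb: float) -> float:
    """Expected number of background events in data.

    n_passed / n_generated is the fraction of simulated events that
    survive the selection; cross_section_pb * luminosity_inv_pb is the
    number of such events produced in the real data sample.
    """
    selection_efficiency = n_passed / n_generated
    return cross_section_pb * luminosity_inv_pb * selection_efficiency


# Hypothetical inputs: 5 of 1,000,000 simulated events pass the cuts,
# for a process with a 1000 pb cross-section and 1000 pb^-1 of data.
n_bkg = expected_background(5, 1_000_000, 1000.0, 1000.0)
print(f"expected background: {n_bkg:.1f} events")
```

In this made-up case a one-in-200,000 chance of passing the selection still leaves a handful of background events, simply because the garden-variety process is produced so often.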

There is usually a trade-off between increasing the efficiency of finding signal events and reducing background. If you use loose selection criteria, you expect to find more signal events, i.e., an increase in efficiency, but also more background. Since the background can overwhelm the signal, one has to be careful. Conversely, if you choose very strict criteria, you could have zero background, but also zero signal efficiency – not very useful!!
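The trade-off can be made concrete by scanning a single cut value over toy samples. The sketch below uses made-up "signal" and "background" momentum values (in GeV/c) and is only meant to show the shape of the problem, not a real optimisation procedure.

```python
def scan_thresholds(signal_values, background_values, thresholds):
    """For each cut value, return (cut, signal efficiency, background count)."""
    rows = []
    for cut in thresholds:
        signal_eff = sum(v > cut for v in signal_values) / len(signal_values)
        n_background = sum(v > cut for v in background_values)
        rows.append((cut, signal_eff, n_background))
    return rows


# Toy samples, invented for illustration.
signal = [120, 150, 200, 90, 300]
background = [20, 40, 60, 80, 110, 130]

rows = scan_thresholds(signal, background, thresholds=[50, 100, 250, 400])
for cut, eff, n_bkg in rows:
    print(f"cut > {cut:3d}: signal efficiency = {eff:.1f}, background events = {n_bkg}")
```

With these toy numbers the loosest cut keeps all of the signal but also four background events, while the tightest cut removes every background event and every signal event along with it.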

There is one more thing that I need to do, which sometimes can take a while, and for which there is definitely no standard prescription. I need to determine systematic uncertainties, i.e., an error estimate for my methodology, on both the signal efficiency and the background estimate. For instance, if I use a meter-scale to measure the length of a table, how do I know the meter-scale is correct? I have to quantify the correctness of the meter-scale. A result in our field has to have systematic uncertainties; otherwise it is meaningless. This step is usually a source of a lot of arguments. For instance, in the paper mentioned in Part 1 (http://arxiv.org/pdf/1110.6191v2.pdf), we say that there is a systematic uncertainty of 6.6% (see section 6). Depending on whether this is smaller (larger) than the statistical uncertainty, we say that the result is statistics (systematics) limited. In the first case, adding more data is necessary, and in the second case, a better understanding is needed. At times, one can have a statistical fluctuation that disappears by adding more data; conversely, many results go by the wayside because of people not understanding systematic effects.
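To see how the comparison between the two kinds of uncertainty plays out, here is a small sketch. It assumes a simple counting measurement, where the relative statistical uncertainty on N events is roughly 1/sqrt(N); that assumption, the function name and the event counts are illustrative, and only the 6.6% figure comes from the text.

```python
import math


def limiting_uncertainty(n_events: int, rel_systematic: float) -> str:
    """Say whether a counting result is statistics or systematics limited.

    Assumes Poisson counting, so the relative statistical uncertainty
    is about 1 / sqrt(n_events).
    """
    rel_statistical = 1.0 / math.sqrt(n_events)
    if rel_systematic < rel_statistical:
        return "statistics"   # more data would help
    return "systematics"      # better understanding is needed


REL_SYST = 0.066  # the 6.6% systematic uncertainty quoted above

for n in (100, 1000):
    rel_stat = 1.0 / math.sqrt(n)
    print(f"N = {n}: statistical = {rel_stat:.1%}, "
          f"systematic = {REL_SYST:.1%}, limited by {limiting_uncertainty(n, REL_SYST)}")
```

Under that assumption, a sample of 100 events has a 10% statistical uncertainty and is statistics limited, whereas with 1000 events the statistical uncertainty drops to about 3% and the 6.6% systematic uncertainty dominates.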

Since there is no fixed recipe to do analysis, I can sometimes run into obstacles, or my results may look “strange”; I then have to step back and think about what is going on. After I get some preliminary results I have to convince my colleagues that they are valid; this involves giving regular progress reports within the analysis group. This is followed by a detailed note, which is reviewed by an internal committee appointed by the experiment’s Publication Committee and/or the Physics Coordinator. If I pass this hurdle, the note is released to the entire collaboration for further review. All along this process, people ask me to do all sorts of checks, or tell me that I am completely wrong, or whatever. Given that every physicist thinks that he/she is smarter than the next, this process can be cantankerous at times, since I have to respond to and satisfy each and every comment. Once the experiment’s leader signs off on the paper, we submit it to a peer-reviewed journal, where the external referee(s) can make you jump through hoops; sometimes their objections are valid, sometimes not. I have been on both sides of this process. Needless to say, as a referee my objections are always valid!!

Depending on the complexity of the analysis, the time from start to finish can be anywhere from a few months to a year or more (causing a few more grey hairs, or in my case a few fewer hairs). The two papers that I mentioned at the start of Part 1 took about 1-2 years each. Luckily, I had collaborators and we divided up the work among ourselves, so I could work on both of them in parallel.

 Vivek Jain is a Scientist at Indiana University, Bloomington. His current interests range from understanding various aspects of tracking to R-parity violating Supersymmetry. More information about his interests can be found at http://www.indiana.edu/~iubphys/faculty/jain2.shtml

From 0-60 in 10 million seconds! – Part 1

OK, so I’ll try to give a flavour of how the data that we collect gets turned into a published result. As the title indicates, it takes a while! The post got very long, so I have split it in two parts. The first will talk about reconstructing data, and the second will explain the analysis stage.

I just finished working on two papers, which have now been published, one in the Journal of Instrumentation, and the other in Physics Letters B. You can see them here (http://arxiv.org/abs/1110.6191 and http://arxiv.org/abs/1109.2242). By the way, some of the posts I am linking to are from two to three years ago, so the wording may be dated, but the explanations are still correct.

When an experiment first turns on, this process takes longer than when it has been running for a while, since it takes time to understand how the detector is behaving. It also depends on the complexity of the analysis one is doing. To be familiar with some of the terms I mention below, you should take the online tour of the ATLAS experiment at http://atlas.ch/etours_exper/index.html; slides 7 and 8 will give you an overview of how different particle species are detected and what the various sub-systems look like. For more details you should take the whole tour; it is meant for non-scientists.

For each event, data recorded by ATLAS is basically a stream of bytes indicating whether a particular sensor was hit in the tracking detectors or the amount of energy deposited in the calorimeter or the location of a hit in the muon system, etc. Each event is then processed through the reconstruction software. This figure gives you an idea of how different particle species leave a signal in ATLAS.

Signals left behind by different particle species

For instance, the part of the software that deals with the tracking detectors will find hits that could be due to a charged particle like a pion or a muon or an electron; in a typical event there may be 100 or more such particles, mostly pions. By looking at the curvature of the trajectory of a particle as it bends in the magnetic field, we determine its momentum (see Seth Zenz’s post on tracking – http://blogs.uslhc.us/?p=481). Similarly, the software dealing with the calorimeter will look at the energy deposits and try to identify clusters that could be due to a single electron or to a spray of particles (referred to as a “jet”), and so on. I believe the ATLAS reconstruction software runs to more than 1 million lines of code! It is very modular, with different parts written by different physicists (graduate students, post-docs, more senior people, etc.).
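The link between curvature and momentum can be written down in one line. For a particle of unit charge moving in a uniform magnetic field, the momentum component perpendicular to the field is p_T [GeV/c] ≈ 0.3 × B [tesla] × R [metres], where R is the radius of the circular path. The sketch below just evaluates that textbook relation; the 2 tesla field value and the radius are illustrative inputs, and real track fitting is of course far more involved.

```python
# Speed of light factor that converts tesla * metre into GeV/c for unit charge.
GEV_PER_TESLA_METRE = 0.299792458


def transverse_momentum_gev(b_field_tesla: float, radius_m: float) -> float:
    """Transverse momentum (GeV/c) of a unit-charge particle from its bending radius."""
    return GEV_PER_TESLA_METRE * b_field_tesla * radius_m


# Illustrative inputs: a 2 T field and a track that bends with a 10 m radius.
pt = transverse_momentum_gev(b_field_tesla=2.0, radius_m=10.0)
print(f"p_T = {pt:.2f} GeV/c")
```

A straighter track (larger radius) means a higher momentum, which is why very energetic particles are the hardest to measure precisely this way.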

However, before the reconstruction software can do its magic, a lot of other things need to be done. All the sub-detectors have to be calibrated. What this means is that we need to know how to convert, say, the size of the electronic signal left behind in the calorimeter into energy units such as MeV (million electron volts – the mass of the electron is about 0.5 MeV). This work is done using data that we are collecting now (we also rely on old data from test beams, simulation (http://blogs.uslhc.us/?p=843), and cosmic rays (http://blogs.uslhc.us/?p=1591)).
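In its simplest form, a calibration of this kind is a conversion factor from raw signal size to energy, derived from cases where the true energy is known. The sketch below assumes a purely linear response through zero and invents both the readout values and the reference energies; actual calorimeter calibration has many more corrections.

```python
def fit_gain(raw_signals, known_energies_mev):
    """Least-squares slope (MeV per raw count) for a line through the origin."""
    numerator = sum(s * e for s, e in zip(raw_signals, known_energies_mev))
    denominator = sum(s * s for s in raw_signals)
    return numerator / denominator


def to_mev(raw_signal: float, gain: float) -> float:
    """Convert a raw signal size into an energy in MeV."""
    return raw_signal * gain


# Hypothetical reference points, e.g. from particles of known energy.
raw = [100, 200, 400]          # raw signal sizes (arbitrary counts)
energy = [50.0, 100.0, 200.0]  # corresponding known energies in MeV

gain = fit_gain(raw, energy)
print(f"gain = {gain} MeV per count; a signal of 300 counts is {to_mev(300, gain)} MeV")
```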

Similarly, we have to know the location of the individual elements of the tracking detectors as precisely as possible. For instance, by looking at the path of an individual track we can figure out precisely where detector elements are relative to one another; this step is known as alignment. Remember, the Pixel detector (http://blogs.uslhc.us/?p=277) can measure distances of the order of 1/10th the thickness of human hair, so knowing its position is critical.

Periodically, we re-reconstruct the data to take advantage of improvements in algorithms, calibration and/or alignment, and also to have all of the collected data processed with the same version of the software (see Jamie’s post – http://pdg3.lbl.gov/atlasblog/?p=816).

In the next post, I will take you through the analysis stage.

 Vivek Jain is a Scientist at Indiana University, Bloomington. His current interests range from understanding various aspects of tracking to R-parity violating Supersymmetry. More information about his interests can be found at http://www.indiana.edu/~iubphys/faculty/jain2.shtml

7 or 8 TeV, a thousand terabyte question!

Event Pile-Up in the New Year?

A very happy new year to the readers of this blog. As we start 2012, hoping to finally find the elusive Higgs boson and other signatures of new physics, an important question needs to be answered first – are we going to have collisions at a center-of-mass energy of 7 or 8 TeV? While that may not feel like such a drastic step up, certainly not like going to the full design collision energy of 14 TeV, it does bring its own set of challenges for ATLAS. Understanding the detector performance is crucial for doing physics with our data, and we will have to make sure all the good work done during 7 TeV collisions can be extended if we run at 8 TeV. More collision energy means more pileup interactions; these occur when our detector cannot distinguish between two separate collision events and thus considers them part of the same collision. We need to disentangle the pileup contribution to look at the real single collision event, and while a lot of work has been done in this direction, an increase in pileup is always a cause for concern. However, as someone working closely with Monte-Carlo tuning and production, I know firsthand how big an issue this is going to be for us.

We need Monte-Carlo samples, or simulated data sets, for every analysis, to calculate detector efficiency, backgrounds and what not. Also, these Monte-Carlo event generators often reflect our best understanding of certain processes, and we want to make sure they are predicting the behavior of real data closely. When they do not, we turn the knobs in the Monte-Carlo generators and tune them to match the data. Up until now, this tuning has been done mostly with 7 TeV collision data – although we tried to get the energy extrapolation right by using lower energy Tevatron and ATLAS data. We believe the simulation will do a good job at describing 8 TeV collision data – but we can’t be sure unless we actually compare, and most analysis groups will already want the latest and the best Monte-Carlo samples by the time the data starts coming in!

Then there is the question of size. The combined size of ATLAS 7 TeV Monte-Carlo samples is at least a few thousand terabytes. A very conservative estimate suggests we will need a few months to produce the 8 TeV samples – the caveat is we can’t start producing them until the decision is actually made to switch to 8 TeV. This will happen immediately after the annual Chamonix meeting in the beginning of February, when the CERN management, engineers and experiment representatives meet to decide. As ramping up the energy results in higher cross-sections for the rare processes we want to look at, from a physics perspective it is definitely beneficial, but we have to be ready to utilize this if and when it happens.

With input from Borut Kersevan.

Deepak Kar is a postdoctoral research fellow with the University of Glasgow. His physics interest is soft quantum chromodynamics, and he is currently involved in underlying event analysis activities and Monte Carlo tuning in ATLAS.

Tweeting live #Higgs boson updates from #CERN

My view of CERN's auditorium, 2:15h before seminars began.

“If it’s just a fluctuation of background, it will take a lot of data to kill.”

Dr. Fabiola Gianotti, spokesperson for the ATLAS collaboration, made this statement on Dec. 13, 2011 during a special seminar I attended at CERN. Within the minute that followed, I hurriedly concocted a tweet, tacked on #Higgs and #CERN hashtags, and sent Fabiola’s weighty comment out onto the WWW.

CERN, where the WWW was invented, is the main European particle physics laboratory. I was at the lab for a week to discuss physics and the performance of the ATLAS detector, a 7000-ton apparatus used to examine remnants of high-energy proton collisions delivered by CERN’s 27-km Large Hadron Collider (LHC), straddling the Franco-Swiss border.

This turned out to be no ordinary week. The 2011 LHC program had yielded a fecund data sample, and we needed to take stock of our most promising new-phenomena searches. By far the most anticipated were those of the Higgs boson, hypothetical pieces in the emerging puzzle of the tiniest known subatomic particles. Signal rumours had been swirling around the planet in blogs and other media. I was asked by the media relations department at my home institute, McGill University, to live tweet the Higgs update event. I already knew our ATLAS measurements, but was keen to see results by our competitors, the CMS collaboration, running a complementary detector on the opposite side of the LHC. Exciting times!

We owe much of this excitement to Ernest Rutherford who, while a McGill professor of Experimental Physics early last century, unwittingly helped to kick off the Higgs hunt through his Nobel Prize work on radioactivity. Modern theories that posit the existence of one or more types of Higgs particles seek to unify – into a more symmetric and fundamental theory – two basic forces: 19th-century electromagnetism and Rutherford’s 20th-century radioactivity. As if that weren’t enough, observing Higgs particles would also help to reveal a mechanism by which various fundamental particles are endowed with their non-zero mass values. This gets at the very essence of the physical universe.

More recently, my McGill colleagues and I have taken part in the search for Higgs bosons using the Fermilab Tevatron matter-antimatter collider near Chicago. Just last summer at McGill University, Dr. Adrian Buzatu defended a PhD thesis using Tevatron data to set the world’s best limits on Higgs boson production in the low-mass region that is now revealing hints at the LHC. Adrian recently took up a postdoctoral position in our ATLAS collaboration, working with the University of Glasgow group.

In December, I entered CERN’s auditorium three hours before the seminar was to begin. Within about 30 minutes, all available seats and aisles were jammed. A mob formed outside the auditorium door, but security guards were able to maintain control. Unable to work in all the nervous energy and jostling, I tweeted, “You’d think it was John Lennon coming to CERN today.”

At long last the ATLAS and CMS talks began. Both collaborations had searched extensively for several different signatures, scenarios by which a Higgs particle could be created from LHC collision energy before disintegrating into lighter particles.

Given our detectors’ sensitivities, and the colossal Higgs-mimicking backgrounds, we knew in advance that our samples wouldn’t suffice for a statistically robust 2011 discovery. Nevertheless, both ATLAS and CMS showed suggestive traces in a variety of Higgs signatures. Enticingly, both collaborations ruled out overlapping mass ranges and recorded hints at similar masses.

These indications are thrilling. In this kind of science, discoveries take time and are often preceded by whiffs. We’re also cautious. Our 2011 results could be chance background fluctuations, tantamount to tossing six coins and getting tails every time. Only when our observations are flukier than tossing 20 tails in a row will we claim a discovery.
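The coin-toss comparison is easy to check: six tails in a row has a probability of 1 in 64, while twenty in a row is 1 in 1,048,576, roughly one in a million. The short sketch below computes both and also converts each probability into the "number of sigmas" physicists tend to quote. That conversion uses a one-sided tail of a standard normal distribution, which is a common convention assumed here rather than something stated in the post.

```python
from statistics import NormalDist


def prob_all_tails(n_tosses: int) -> float:
    """Probability that a fair coin lands tails on every one of n tosses."""
    return 0.5 ** n_tosses


def one_sided_sigma(p_value: float) -> float:
    """Number of standard deviations whose one-sided Gaussian tail equals p_value."""
    return NormalDist().inv_cdf(1.0 - p_value)


for n in (6, 20):
    p = prob_all_tails(n)
    print(f"{n} tails in a row: p = {p:.3g}, about {one_sided_sigma(p):.1f} sigma")
```

On this convention, six tails corresponds to a bit more than two sigma and twenty tails to a bit less than five, which is in line with hints versus discovery as described above.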

The 2012 data will likely enable us to observe or rule out the Higgs boson. Either of these outcomes would constitute exhilarating, 21st-century science. I look forward to tweeting about it.