Basic Data Analysis Using ROOT: .X Treeviewer.c
Introduction
This tutorial started as a one-day class I taught in 2001. Over the years, Iʼve revised it as
different versions of ROOT came out, and in response to comments weʼve received from the
students.
In 2009, I was asked to expand the class to two full days. In past years, many students hadnʼt
gone through all the lessons that were intended to be done in a single day. So I added a set of
advanced exercises for those students who knew enough C++ to get through the original
material quickly, while allowing the rest of the students to do in two days what earlier classes
had been asked to do in one.
For 2010, in response to student feedback, what had been a single solid day of work was split
into two half-day classes. Instead of eliminating the advanced exercises, I divided the two full
days of 2009 into four parts, each part roughly corresponding to a half-dayʼs work. This still
allows each student to set their own pace, but gives the more experienced students something
to do. In 2011 each physics group will decide how many parts their students will do.
If youʼre working with C++ for the first time, then it will probably take you at least a half-day per
part. Even someone with years of prior experience in ROOT and C++ might barely cover two
parts in half a day.
Most of the lessons have time estimates at the top. These are only rough estimates; some
students take 45 minutes to go through a lesson labeled "15 minutes", others take only 5.
Don't be too concerned about time. The important thing is for you to learn something, not to
punch a time clock. To put it another way: No one expects you to get through all 58 pages
before you start your physics work for the summer.
You can find this tutorial in Postscript and PDF format (along with links to the sample files) at
<http://www.nevis.columbia.edu/~seligman/root-class/>.
1
As of 2009, the ROOT web site link will tend to forward you to pages with the URL http://root.cern.ch/drupal/.
This shouldn’t be too much of a problem, but it’s worth noting that they’re using the Drupal content management
system.
2
I've spent two of your lifetimes already, and the class has just started!
You are going to need to have at least two windows open during this class. One window I'll call
your "ROOT command" window; this is where you'll run ROOT. The other is a separate "UNIX
command" window. On Unix, you can create a second window with the following command;
don't forget the ampersand (&):
> xterm &
You can also just run the Terminal application again, or select "Open Terminal..." from the File
menu of a running Terminal application.
I like to use File->Open Tab… instead, but you can use whichever mode you prefer.
3
Yes, that's three lifetimes so far.
Sometimes ROOT will crash. If it does, it can get into a state for which “.q” wonʼt work.
Try typing “.qqq” (three q) if “.q” doesnʼt work; if that still doesnʼt work, try five q, then
seven q. Unfortunately, if you type ten q, ROOT wonʼt respond, “Youʼre welcome.”
OK, dumb joke. But the tip about “.qqq”, “.qqqqq”, and “.qqqqqqq” is legitimate.
Sometimes I find just typing “q” or using Ctrl-C also works.
4
I’m simplifying things here. The actual rule is that everything that ROOT draws must be inside a “TPad.” Unless
you want to add graphics widgets to a window (e.g., buttons and menus), this distinction won’t matter to you.
Hints:
Look at the TF1 command on the last page. If class TF1 will generate a one-dimensional
function, what class might generate a two-dimensional function?
If TF1 takes a function name, formula, and x-limits in its constructor, what arguments
might a two-dimensional function class use? Where could you check your guess?
You probably figured out how to draw something, but you got a contour plot, not a
surface plot. Here's another hint: you want to give the option "surf1" to the Draw
method.
If you're wondering how to figure out that “surf1” was a valid option to give to Draw():
Unfortunately, this is not obvious from the current ROOT web site or documentation. If
you're clever and make a couple of guesses, you'll finally end up in the description of the
THistPainter class; this describes all the available Draw() options.
This may seem unfair; why did I ask you to chase down an obscure option? Answer: to
prepare you for the kind of detective work you often have to do to accomplish something
in ROOT; for example, the exercises in Parts Three and Four of this tutorial.
5
For advanced users:
Why would you have varying bin widths? Recall the "too many bins" and "too few bins" examples that I showed
in the introduction to the class. In physics, it's common to see event distributions with long "tails." There are
times when it's a good idea to have small-width bins in regions with large numbers of events, and large bin
widths in regions with only a few events. This can result in having roughly the same number of events per bin in
the histogram, which helps with fitting to functions as discussed in the next few pages.
$$ f(x) = P_0 \, e^{-\frac{1}{2}\left(\frac{x - P_1}{P_2}\right)^2} $$
where P0, P1, and P2 are "parameters" of the function.6 Let's set these three parameters to values
that we choose, draw the result, and then create a new histogram from our function:
[] myfunc.SetParameters(10.,1.0,0.5)
[] myfunc.Draw()
[] TH1D h2("hist2","Histogram from my function",100,-3,3)
[] h2.FillRandom("myfunc",10000)
[] h2.Draw()
Note that we could also set the function's parameters individually:
[] myfunc.SetParameter(1,-1.0)
[] h2.FillRandom("myfunc",10000)
What's the difference between 'SetParameters' and 'SetParameter'? If you have any
doubts, check the description of class TF1 on the ROOT web site.
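To make the distinction concrete, here is a hypothetical stand-in written in plain C++ (it is not ROOT's TF1): one method assigns all three parameters in a single call, the other changes only the parameter with a given index and leaves the rest alone. It evaluates the same Gaussian formula as above.

```cpp
#include <cmath>

// Hypothetical stand-in for a three-parameter Gaussian, written only to
// illustrate "set all parameters" versus "set one parameter by index".
// It is NOT ROOT's TF1.
struct GaussianFunction {
    double par[3] = {1.0, 0.0, 1.0};   // P0 (height), P1 (mean), P2 (width)

    // Like SetParameters: assign all three values in one call.
    void setParameters(double p0, double p1, double p2) {
        par[0] = p0; par[1] = p1; par[2] = p2;
    }

    // Like SetParameter: change only parameter number i (0, 1 or 2).
    void setParameter(int i, double value) {
        if (i >= 0 && i < 3) par[i] = value;
    }

    // f(x) = P0 * exp(-0.5 * ((x - P1) / P2)^2)
    double eval(double x) const {
        double t = (x - par[1]) / par[2];
        return par[0] * std::exp(-0.5 * t * t);
    }
};
```

After `setParameters(10., 1.0, 0.5)` followed by `setParameter(1, -1.0)`, the height and width are still 10 and 0.5; only the mean has moved to -1.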
6
For advanced users: In ROOT's TFormula notation, this would be "[0]*exp(-0.5*((x-[1])/[2])^2)" where "[n]"
corresponds to Pn. I mention this so that when you become more experienced with defining your own
parameterized functions, you can use a different formula:
[] TF1 myGaus("user","[0]*exp(-.5*((x-[1])/[2])^2)/([2]*sqrt(2.*pi))")
This may seem cryptic to you now. It’s just a gaussian distribution with a different normalization so that P0
divided by the bin width becomes the number of events in the histogram:
[] myGaus.SetParameters(10.,0.,1.)
[] hist.Fit("user")
[] Double_t numberEquivalentEvents = myGaus.GetParameter(0) / hist.GetBinWidth(1)
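Why does P0 divided by the bin width count events? The fitted function's value at a bin center approximates the number of entries in that bin, so summing it over all bins gives about P0/(bin width). A small stand-alone check, assuming equal-width bins from -5 to 5 and illustrative parameters P0 = 10, P1 = 0, P2 = 1 (the function names are hypothetical):

```cpp
#include <cmath>

// Normalized Gaussian from the TFormula above:
// f(x) = P0 * exp(-0.5*((x-P1)/P2)^2) / (P2*sqrt(2*pi))
double normalizedGaus(double x, double p0, double p1, double p2)
{
    const double pi = std::acos(-1.0);
    double t = (x - p1) / p2;
    return p0 * std::exp(-0.5 * t * t) / (p2 * std::sqrt(2.0 * pi));
}

// Sum f over the centers of nbins equal-width bins between xmin and xmax.
// Each term approximates the expected content of one bin.
double sumOverBinCenters(int nbins, double xmin, double xmax,
                         double p0, double p1, double p2)
{
    double width = (xmax - xmin) / nbins;
    double total = 0.0;
    for (int i = 0; i < nbins; ++i) {
        double center = xmin + (i + 0.5) * width;
        total += normalizedGaus(center, p0, p1, p2);
    }
    return total;
}
```

With 100 bins (width 0.1) the sum comes out very close to 10/0.1 = 100; halving the bin width doubles it, just as P0/(bin width) predicts.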
7
Hmm. There’s a file called rootlogon.C in ~seligman/root-class. I wonder what it does?
8
The folder hierarchy may be puzzling to you; your home directory will be in
/a/home/<server>/<account>. For now, don’t worry about this. If you’d like to know more, there’s a
page on automount at http://www.nevis.columbia.edu/twiki/bin/view/Nevis/Automount.
9
If you’re going through this class and you’re not logged onto a system on the Nevis Linux cluster, you’ll have to
get all the files from my web site: <http://www.nevis.columbia.edu/~seligman/root-class/files/>.
10
Advanced note: There is a way of storing comments about the contents of a ROOT tree, which can include
information such as units. However, you can't do this with n-tuples; you have to create a C++ class that contains
your information in the form of comments, and use a ROOT “dictionary” to include the additional information.
This is outside the scope of what you'll be asked to do this summer, but if you're interested in the concept, it's
described in Chapter 15 of the ROOT Users Guide. You’ll also have a chance to look at an example in Part Four
of this class.
11
Another advanced note: If you know what you're doing, you can use the same trick that ROOT uses when it
creates a histogram for you with commands like tree1->Draw("zv"). The trick is:
TH1* hist = new TH1D(...); // define your histogram
hist->SetBit(TH1::kCanRebin); // allow the histogram to re-bin itself
hist->Sumw2(); // so the error bars are correct after re-binning
“Re-binning” means that if a value is supplied to the histogram that's outside its limits, it will adjust those limits
automatically. It does this by merging adjacent pairs of bins, which doubles the bin width; the bin limits change, while the
number of histogram bins remains constant.
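The re-binning idea can be sketched with a toy histogram in plain C++. This is a hypothetical class, not ROOT's implementation, and for simplicity it only extends the upper limit (ROOT can also extend the lower one). The number of bins stays fixed; each time a value falls at or above the upper limit, bins 2k and 2k+1 are merged into bin k and the range doubles.

```cpp
#include <vector>

// Hypothetical toy histogram (not ROOT's TH1) that mimics automatic re-binning.
// The number of bins is fixed; when a value lands at or above the upper limit,
// adjacent pairs of bins are merged so the bin width (and the range) doubles.
// This sketch only extends the upper limit and assumes finite input values.
class ToyHist {
public:
    ToyHist(int nbins, double xmin, double xmax)
        : fContents(nbins, 0.0), fXmin(xmin), fXmax(xmax) {}

    void Fill(double x) {
        while (x >= fXmax) DoubleRange();   // widen until x fits
        if (x < fXmin) return;              // values below range are ignored here
        int n = static_cast<int>(fContents.size());
        int bin = static_cast<int>((x - fXmin) / BinWidth());
        if (bin >= n) bin = n - 1;          // guard against round-off at the edge
        fContents[bin] += 1.0;
    }

    double BinWidth() const { return (fXmax - fXmin) / fContents.size(); }
    double Xmax() const { return fXmax; }
    double GetBinContent(int i) const { return fContents[i]; }   // 0-based index

private:
    void DoubleRange() {
        int n = static_cast<int>(fContents.size());
        std::vector<double> merged(n, 0.0);
        // Old bins 2k and 2k+1 together cover the range of new bin k.
        for (int i = 0; i < n; ++i) merged[i / 2] += fContents[i];
        fContents = merged;
        fXmax = fXmin + 2.0 * (fXmax - fXmin);
    }

    std::vector<double> fContents;
    double fXmin;
    double fXmax;
};
```

For example, a 4-bin histogram on [0,4) that receives 0.5, 1.5, 3.5 and then 6.0 ends up on [0,8) with bin width 2: the first two entries share bin 0, 3.5 sits in bin 1, and 6.0 lands in bin 3.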
12
That's four lifetimes. And you thought you only signed up for a ten-week project! Gosh, I wonder if it takes a
lifetime to understand high-energy physics.
void Analyze::Loop()
{
   // In a ROOT session, you can do:
   //    Root > .L Analyze.C
   //    Root > Analyze t
   //    Root > t.GetEntry(12); // Fill t data members with entry number 12
   //    Root > t.Show();       // Show values of entry 12
   //    Root > t.Show(16);     // Read and show values of entry 16
   //    Root > t.Loop();       // Loop on all entries
   //
   if (fChain == 0) return;

   // Number of entries in the tree; the loop below needs this.
   Long64_t nentries = fChain->GetEntriesFast();

   Long64_t nbytes = 0, nb = 0;
   for (Long64_t jentry=0; jentry<nentries;jentry++) {
      Long64_t ientry = LoadTree(jentry);
      if (ientry < 0) break;
      nb = fChain->GetEntry(jentry);   nbytes += nb;
      // if (Cut(ientry) < 0) continue;
      // The Loop code goes here.
   }
   // The Wrap-up code goes here.
}
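If the generated code hides the pattern, here is the same shape in plain C++: code before the loop, code run once per entry (including a cut that skips entries, like the commented-out `Cut` line), and wrap-up code after the loop. A `std::vector` stands in for the tree, and the struct, function name, and cut value are hypothetical; in a real Analyze.C the values come from the branch variables that GetEntry fills.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one tree entry (one "row" of the n-tuple).
struct Entry {
    double zv;
};

// Same structure as Analyze::Loop(): initialize, loop over entries, wrap up.
// Returns the mean zv of the entries that pass a simple cut.
double meanPassingCut(const std::vector<Entry>& entries, double cutValue)
{
    // Before the loop: initialize the quantities you will accumulate.
    double sum = 0.0;
    std::size_t nPassed = 0;

    // Loop code: executed once per entry.
    for (std::size_t jentry = 0; jentry < entries.size(); ++jentry) {
        if (entries[jentry].zv < cutValue) continue;   // skip entries failing the cut
        sum += entries[jentry].zv;
        ++nPassed;
    }

    // Wrap-up code: executed once, after all entries have been read.
    return nPassed > 0 ? sum / nPassed : 0.0;
}
```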
Now you can tell me the center of the drawn circle in some co-ordinate system; for
example, you could take a ruler and measure the center from the edge of the page. The
object has the hard numbers that allow the circle's commands to calculate numerical
values for the circumference, area, and so on.
To put it another way: the class represents the rules for accessing the information; the
object holds the specific information.
Assume that I write C++ code to define a circle class. I'm going to put this code in a file
whose name is CircleClass.C. I'm going to give the class a name: Circle. That class is
going to contain a command: Area.
These are the ROOT commands that might be used to find the area of c, a particular
circle:
[] .L CircleClass.C
[] Circle c
[] c.Area()
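Here's what a minimal CircleClass.C might contain. This is a hypothetical sketch: the default radius of 1, the center coordinates, and the extra Circumference method are assumptions added for illustration. The class supplies the rules (how to compute an area); each object, such as `c`, holds its own center and radius.

```cpp
#include <cmath>

// Hypothetical contents of CircleClass.C: a Circle class whose objects hold
// a specific center and radius, and whose methods compute values from them.
class Circle {
public:
    // Default circle: centered at the origin with radius 1.
    Circle(double x = 0.0, double y = 0.0, double r = 1.0)
        : fX(x), fY(y), fRadius(r) {}

    double Area() const { return Pi() * fRadius * fRadius; }
    double Circumference() const { return 2.0 * Pi() * fRadius; }
    double CenterX() const { return fX; }
    double CenterY() const { return fY; }

private:
    static double Pi() { return std::acos(-1.0); }
    double fX, fY;      // center of this particular circle
    double fRadius;     // radius of this particular circle
};
```

With this sketch, `Circle c` creates the default unit circle, so `c.Area()` would give about 3.14159; `Circle d(1.0, 2.0, 3.0)` is a different object of the same class with its own numbers.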
13
A tangent: Suppose you're told to fill two histograms, then add them together. If you do this, you'll want to call
the "Sumw2" method of both histograms before you fill them; e.g.,
TH1* hist1 = new TH1D(…);
TH1* hist2 = new TH1D(…);
hist1->Sumw2();
hist2->Sumw2();
// Fill your histograms, then to add hist2 to the contents of hist1:
hist1->Add(hist2);
If you forget the "Sumw2", then your error bars after the math operation won't be correct. General rule: If you're
going to perform histogram arithmetic, use "Sumw2" (which means "sum the squares of the weights"). Some
physicists use "Sumw2" all the time, just in case.
$$ p_T = \sqrt{p_x^2 + p_y^2} $$
This is the transverse momentum of the particle, that is, the component of the particle's
momentum that's perpendicular to the z-axis.
Let's calculate our own values in an analysis macro. Start fresh by copying our
AnalyzeComments example again:
> cp AnalyzeComments.C AnalyzeVariables.C
In the Loop section, put in the following line:
Double_t pt = TMath::Sqrt(px*px + py*py);
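Outside ROOT, the same calculation can be written with `std::sqrt`, which does the same job as TMath::Sqrt here. The sketch below applies it entry by entry, the way the Loop section runs once per event; the two vectors are hypothetical stand-ins for the px and py branches.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Transverse momentum: pt = sqrt(px^2 + py^2).
double transverseMomentum(double px, double py)
{
    return std::sqrt(px * px + py * py);
}

// Compute pt for every entry; px and py are hypothetical stand-ins
// for the values the tree's branches would supply.
std::vector<double> ptForAllEntries(const std::vector<double>& px,
                                    const std::vector<double>& py)
{
    std::vector<double> pt;
    for (std::size_t i = 0; i < px.size() && i < py.size(); ++i)
        pt.push_back(transverseMomentum(px[i], py[i]));
    return pt;
}
```

For px = 3 and py = 4 this gives pt = 5.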
What does this mean?
Whenever you create a new variable in C++, you must say what type of thing it is.
Actually, we've already done this in statements like
TF1 func("user","gaus(0)+gaus(3)")
This statement creates a brand-new variable named "func", with a type of "TF1". In the
Loop section of AnalyzeVariables, we're creating a new variable named "pt", and its type
is "Double_t".
For the purpose of the analyses that you're likely to do, there are only a few types of
numeric variables that you'll have to know: "Float_t", which is used for real numbers;
"Double_t", which is used for double-precision real numbers; "Int_t", which is used for
integers; and "Bool_t", which is used for boolean (true/false) values. "Long64_t" specifies 64-bit integers,
which you probably won't need for your work. Most physicists use double precision, just
in case.
If you already know C++: the reason why we don't just use the built-in types "float",
"double", "int", and “bool” is discussed on pages 18-19 of the ROOT Users Guide.
ROOT comes with a very complete set of math functions. You can browse them all by
looking at the TMath class on the ROOT web site, or Chapter 13 in the ROOT Userʼs
Guide. For now, it's enough to know that TMath::Sqrt() computes the square root of the
expression within the parentheses "()".
Test the macro in AnalyzeVariables to make sure it runs. You won't see any output, but we'll fix
that in the next exercise.
I did this in front of you at the start of the class. You will have to do it on your own in nine
weeks, as you prepare your final talk or paper. The point of this exercise, as you've
probably guessed, is to have you figure out how to do this using the tools and techniques
you've learned so far. Hopefully, you'll still remember how to do this at the end of the
summer.
> cd tutorials
> root -l demos.C
> cd graphics
> root -l first.C
> less first.C
Youʼre going to need these resources as you move into the following topics for Parts
Three and Four of the tutorial. Iʼm going to do less “hand holding” in these notes from
now on, because a part of these exercises is to teach you how to use these references.15
If the distributed nature of the information is annoying to you, welcome to the club! I often
have to go hunting to find the answers I want when using ROOT, even after years of
working with the package. Occasionally Iʼve had no other choice but to examine the C++
source code of the ROOT program itself to find out the answer to a question.
14
If the command doesn’t work: Did you remember to type “setup root” in your UNIX command window? That’s
what sets the value of $ROOTSYS.
15
You can still ask me questions; I mean that any remaining written hints in this tutorial will be less detailed or
require more thought.
16
For those familiar with the issues of public inheritance: yes, I’m skipping over a lot of details, such as the
distinction between virtual versus non-virtual methods.
Arrays
Do a web search on “C++ arrays” to learn about these containers. Briefly, to create a double-
precision array of eight elements, you could say:
Double_t myArray[8];
To refer to the 3rd element in the array, you might use (remember, in C++ the first element has an
index of 0):
Int_t i = 2;
myArray[i] = 0.05;
If you’re new to C++, it won’t be obvious that while myArray[2] is a Double_t object, the type
of the name myArray (without any index) is Double_t*, or a pointer to a Double_t (see page 22).
Getting confused? Let’s keep it simple. If you’ve created arrays with values and errors…
Double_t xValue[22];
Double_t xError[22];
Double_t yValue[22];
Double_t yError[22];
…and you’ve put numbers into those arrays, then you can create a TGraphErrors with:
TGraphErrors* myPlot = new TGraphErrors(22,xValue,yValue,xError,yError);
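To see why the array names can be passed straight to TGraphErrors, here's a plain C++ sketch. Both helper functions are hypothetical: one fills the four arrays with made-up values, the other receives an array the way the TGraphErrors constructor does, as a count plus a pointer to the first element.

```cpp
// Hypothetical helper that receives an array as a count plus a pointer to
// its first element; values[i] means the same as *(values + i).
double sumOfValues(int n, const double* values)
{
    double total = 0.0;
    for (int i = 0; i < n; ++i) total += values[i];
    return total;
}

// Hypothetical helper that fills the four arrays a TGraphErrors would need.
// Indices run from 0 to n-1.
void fillPoints(int n, double* x, double* xErr, double* y, double* yErr)
{
    for (int i = 0; i < n; ++i) {
        x[i] = i;             // x-value of point i
        xErr[i] = 0.0;        // no uncertainty in x
        y[i] = 2.0 * i;       // a made-up y-value
        yErr[i] = 0.5;        // a made-up uncertainty in y
    }
}
```

After `fillPoints(22, xValue, xError, yValue, yError)`, `xValue[2]` is 2 and `yValue[2]` is 4; the same four names then go into `new TGraphErrors(22,xValue,yValue,xError,yError)` as shown above.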
ROOT’s containers
Go to the Class Index page of the ROOT Reference Guide on the web. Near the top of the page
there’s a list of categories; click on CORE, then on CONT. You’ll see a list of ROOT’s container
classes, along with links for information about collections and why they’re used. Read the
“Understanding Collections” page, and at least skim the chapter about collections in the ROOT
Users Guide.
Iʼll be blunt here, and perhaps editorialize too much: I donʼt like ROOTʼs collection
classes. The main reason is that most of them can only hold pointers to classes that
inherit from TObject. For example, if you wanted to create a TList that held strings or
double-precision numbers (TString and Double_t in ROOT), you canʼt do it.
Go back to the TGraphErrors page. The seventh way to create a TGraphErrors object has a
TVectorD link; click on that link to read the description. Learn much? Try clicking on
TVectorT<double>.
This is ROOTʼs answer to the issue I just raised: they provide special containers for
certain types.
You need to know a little about ROOTʼs collection classes to be able to understand how
ROOT works with collections of objects; exercise 16 below is an example of this. For any
other work, Iʼm going to suggest something else:
17
I’ve lost track of the number of your lifetimes I’ve spent. You’re probably tired of the joke anyway.
Use the histograms in folder example1 of file folders.root. The y-values and error bars will come
from fitting each histogram to a gaussian distribution; the y-value is the mean of the gaussian,
and the y-error is the width of the gaussian.
Youʼve spent five pages reading about abstract concepts and are probably eager to do
some work, but there are still a couple of things youʼll have to figure out.
First of all, thereʼs no n-tuple in this exercise. Youʼll have to create a ROOT macro to
create the graph on your own.18 Youʼve seen some macros before (remember c1.C?),
and youʼll find many more in the ROOT tutorials.
18
You could try typing in the ROOT commands on the ROOT command line one-by-one. But unless you have a
shining grasp of ROOT concepts and perfect typing skills, you’re going to make mistakes that will involve many
quit-and-restarts of ROOT. It’s much easier to write and edit a macro.
19
Optional tangent:
“Grep” is a program that implements something called “regular expressions,” a powerful method for searching,
replacing, and processing text. More sophisticated programs that use regular expressions include sed, awk, and
perl. They are used in processing text, not numerical calculations, so the deep nitty-gritty of regular expressions
is rarely relevant in physics.
Regular expressions are a complex topic, and it can take a lifetime to learn about them. (You may be tired of the
joke, but I’m not!)
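For a small taste without leaving C++, the standard library has had regular expressions since C++11. This sketch keeps names that look like "hist" followed by one or more digits; the names are made up. Note that `std::regex_match` requires the whole string to match, while `std::regex_search`, like grep, finds the pattern anywhere in the text.

```cpp
#include <regex>
#include <string>
#include <vector>

// Keep only the names that consist of "hist" followed by one or more digits.
std::vector<std::string> matchingNames(const std::vector<std::string>& names)
{
    const std::regex pattern("hist[0-9]+");
    std::vector<std::string> selected;
    for (const std::string& name : names)
        if (std::regex_match(name, pattern)) selected.push_back(name);
    return selected;
}
```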
20
Another tangent:
LaTeX is a document-preparation package that’s often used in research. If you write a paper for publication this
summer, you are going to use LaTeX; physics publications don’t accept articles in MS-Office format. A real
LaTeX document is much more complex than you’ll read about in the TLatex documentation, but don’t worry
about that. No one writes a LaTeX document from scratch; they get one from someone and learn by example. It’s
much easier than learning ROOT; it’s closer to another page mark-up language, HTML, which you’ve probably
seen before.
You can spend a lifetime learning LaTeX, but no one ever has.
21
Optional digression: There are three main ways of handling strings in ROOT/C++:
- The original way in the older language C, as an array of char: char oldStyleString[256];
- A newer way, added to the C++ language: std::string newStyleString;
- The ROOT way: TString rootStyleString;
Which is better? My attitude is that none of them is best. In a ROOT program, I tend to use TString; if my
program doesn’t use ROOT, I use std::string for string variables and arrays of char for constant strings.
The blunt reality is that C++ doesn’t have the built-in text manipulation facilities of languages like perl or
python. This can be important in a physics analysis procedure; while your calculations are based on numbers,
manipulating files or program arguments can be based on strings.
22
Optional editorializing again: If you followed the steps I just described and saw the same thing I did, it’s pretty
clear what happened: the person who wrote the method intended to supply some comments later.
Here’s a tip for writing code that will make you a hero: the word “later” does not exist. Treat the comments as
part of the code-writing process. If you have to edit the code, edit the comments.
Yes, I know it’s a pain. But pounding your head on a desk is a bigger pain. It’s the biggest pain of all when you
realize that you wrote the code six months ago, have completely forgotten what it means, and must now spend an
hour figuring it out. It would have taken five seconds to write a comment.
23
If you haven’t encountered a segmentation fault yet in this tutorial, you’re either very lucky or very good at
managing your pointers. Now you know why it happens: someone tried to call a method for an object that wasn’t
there.
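A common way to end up with "an object that wasn't there" is a lookup that fails; in ROOT, for instance, a file's Get() returns a null pointer when the named object isn't found (say, because of a typo in the name). The defense is to test the pointer before calling a method through it. A plain C++ sketch with a hypothetical stand-in class and lookup function:

```cpp
#include <map>
#include <string>

struct Histogram {            // hypothetical stand-in for a ROOT histogram
    double entries = 0.0;
    double GetEntries() const { return entries; }
};

// Hypothetical lookup that returns a null pointer when the name isn't found.
Histogram* findHistogram(std::map<std::string, Histogram>& store,
                         const std::string& name)
{
    auto it = store.find(name);
    return it == store.end() ? nullptr : &it->second;
}

// Check the pointer before using it; calling GetEntries() through a null
// pointer is the kind of mistake that ends in a segmentation fault.
double entriesOrMinusOne(std::map<std::string, Histogram>& store,
                         const std::string& name)
{
    Histogram* h = findHistogram(store, name);
    if (h == nullptr) return -1.0;   // report the problem instead of crashing
    return h->GetEntries();
}
```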
24
If you didn’t get such a message, then you probably copied rootlogon.C from my root-class directory into
your working directory. That’s OK, but you might want to temporarily rename this file and restart ROOT just so
you can see that error message. That way you’ll know how it looks if you have a missing-dictionary problem.
25
This library may not work if you’re on a different kind of system than the one on which I created the library. If
you get some kind of load error, here’s what to do. Copy the following additional files from my root-class
directory:
LinkDef.h
ExampleEvent.cxx
BuildExampleEvent.cxx
BuildExampleEvent.sh
Run the UNIX command script with:
> sh BuildExampleEvent.sh
This will (re-)create the libExampleEvent shared library. It will also create the program BuildExampleEvent,
which I used to create the file exampleEvent.root.
If you’re running this on a Macintosh, the name of the library will be libExampleEvent.dylib; that’s the name to
use in the gSystem->Load() command in the Mac version of ROOT.
26
Why don’t I want you to use MakeClass here? The answer is that some physics experiments only use ROOT to
make n-tuples; they don’t use it for their more complex C++ classes. In that case, you won’t be able to use
MakeClass because you won’t have a ROOT dictionary. It’s likely that such a physics experiment would have its
own I/O methods that you’d use to read its physics classes, but you’d still use a ROOT TTree and branches to
write your n-tuple.
27
Now you know the reason for my bald patches!
28
Maybe now you’re thinking, “Wow! It’s lucky I turned to the last page before I actually started doing any of the
work!” Take my word for it: reading my solution is not a substitute for working through the problem yourself.