NanoAOD analysis: Produce histograms
Overview
Teaching: 5 min
Exercises: 5 min
Questions
How to produce many histograms efficiently?
What does an analysis with RDataFrame look like in Python?
Objectives
Produce all histograms required for the final plots
Understand why we need so many histograms for a single plot
In the previous section, we produced skimmed datasets from the original files that still contain selected quantities for each event. In this step, we compute histograms of these quantities for all skimmed datasets. Because the QCD background is estimated from data, the same histograms also have to be produced with a selection of same-charge muon-tau pairs. In total, this amounts to several hundred histograms, which are combined into final plots such as the ones shown in the next section.
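The sketch below shows why the same-charge histograms are needed. It uses a common form of data-driven QCD estimate, assumed here for illustration and not taken from the lesson's scripts. In each bin, the summed same-charge simulation is subtracted from the same-charge data, and the remainder is scaled by an opposite-to-same-charge extrapolation factor. Plain lists stand in for histograms, and the process names, bin contents and factor 0.8 are hypothetical.

```python
from typing import Dict, List


def estimate_qcd(data_ss: List[float],
                 mc_ss: Dict[str, List[float]],
                 os_ss_factor: float = 1.0) -> List[float]:
    """Estimate the QCD contribution in the opposite-charge region, bin by bin.

    data_ss: bin contents of the same-charge data histogram
    mc_ss: same-charge bin contents of every simulated process
    os_ss_factor: assumed (hypothetical) opposite/same-charge extrapolation factor
    """
    qcd = []
    for i, n_data in enumerate(data_ss):
        # Sum all simulated same-charge contributions in this bin
        n_mc = sum(hist[i] for hist in mc_ss.values())
        # Clip at zero: the simulation can exceed data in sparsely populated bins
        qcd.append(max(n_data - n_mc, 0.0) * os_ss_factor)
    return qcd


# Hypothetical same-charge bin contents for a three-bin observable
data_ss = [120.0, 80.0, 30.0]
mc_ss = {"ZTT": [20.0, 15.0, 5.0], "W": [30.0, 25.0, 30.0]}
print(estimate_qcd(data_ss, mc_ss, os_ss_factor=0.8))
```

Every bin of every observable needs both the same-charge data and all same-charge simulated histograms. This roughly doubles the number of histograms required for each plot. Clipping negative differences at zero is only one possible design choice.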
For convenience, this step is implemented in Python in the file histograms.py, which you can download here.
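The sketch below illustrates the general RDataFrame pattern in Python. It does not reproduce histograms.py. For each sample, one histogram per observable is booked in an opposite-charge and a same-charge region. Nothing is computed until the first result is accessed, and at that point a single event loop fills all of them. The sample names, the file names `<sample>Skim.root`, the tree name `Events`, the charge columns `q_1` and `q_2`, and the binning are hypothetical placeholders; the actual script defines its own.

```python
# Hypothetical sample names; the skimmed files are assumed to be named "<sample>Skim.root"
SAMPLES = ["ggH", "qqH", "ZTT", "TT", "dataRunB"]

# Hypothetical observables: column name -> (number of bins, lower edge, upper edge)
VARIABLES = {
    "pt_1": (30, 20.0, 80.0),
    "m_vis": (30, 20.0, 140.0),
}

# Signal (opposite-charge) and control (same-charge) regions,
# assuming the skimmed trees contain the charge columns q_1 and q_2
REGIONS = {
    "os": "q_1 * q_2 < 0",
    "ss": "q_1 * q_2 > 0",
}


def book_histograms(df, sample):
    """Book, but do not fill, one histogram per region and observable.

    Histo1D is a lazy action: it only returns a result handle, and all
    handles booked on the same dataframe are filled in one event loop.
    """
    results = {}
    for region, cut in REGIONS.items():
        selected = df.Filter(cut, region)
        for var, (nbins, low, high) in VARIABLES.items():
            name = f"{sample}_{region}_{var}"
            # A (name, title, nbins, low, high) tuple is converted to a histogram model
            results[name] = selected.Histo1D((name, name, nbins, low, high), var)
    return results


def main():
    import ROOT  # requires ROOT with PyROOT; imported here so the booking code stays ROOT-free

    ROOT.EnableImplicitMT()  # run each event loop on multiple threads
    outfile = ROOT.TFile("histograms.root", "RECREATE")
    for sample in SAMPLES:
        df = ROOT.RDataFrame("Events", f"{sample}Skim.root")  # assumed tree and file names
        hists = book_histograms(df, sample)
        for name, hist in hists.items():
            # The first GetValue() triggers the event loop, which fills every
            # histogram booked for this sample; later calls reuse the results
            outfile.WriteTObject(hist.GetValue(), name)
    outfile.Close()
```

Calling `main()` requires ROOT with PyROOT and the skimmed files in the current directory. With N samples, two regions and M observables, this books N × 2 × M histograms, which is how the total quickly reaches several hundred. Passing the dataframe into `book_histograms` keeps the booking logic independent of ROOT.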
Investigate and run the Python script!
Have a look at the code and run it! Note that the script reads the skimmed files from the directory in which you run it.
Investigate the output!
The script produces the file histograms.root, which contains the histograms. You can inspect the plain histograms using, for example, the ROOT browser!
Key Points
We produce histograms of all physics processes and all observables.
All histograms are produced in a signal region with opposite-sign muon-tau pairs and in a control region with same-sign pairs for the data-driven QCD estimate.
This step shows how RDataFrame in Python produces a large number of histograms in a single event loop, running in parallel!