Eur. Phys. J. C (2015) 75:409

DOI 10.1140/epjc/s10052-015-3587-2

Special Article - Tools for Experiment and Theory

Report of BOOST2013, hosted by the University of Arizona, 12th-16th of August 2013

D. Adams1, A. Arce2, L. Asquith3, M. Backovic4, T. Barillari5, P. Berta6, D. Bertolini7, A. Buckley8, J. Butterworth9, R. C. Camacho Toro10, J. Caudron11, Y.-T. Chien12, J. Cogan13, B. Cooper9,a, D. Curtin14, C. Debenedetti15, J. Dolen16, M. Eklund17, S. El Hedri11, S. D. Ellis18, T. Embry17, D. Ferencek19, J. Ferrando8, S. Fleischmann20, M. Freytsis21, M. Giulini22, Z. Han23, D. Hare24, P. Harris25, A. Hinzmann26, R. Hoing27, A. Hornig12, M. Jankowiak28, K. Johns17, G. Kasieczka29, R. Kogler27, W. Lampl17, A. J. Larkoski30, C. Lee12, R. Leone17, P. Loch17, D. Lopez Mateos21, H. K. Lou31, M. Low32, P. Maksimovic33, I. Marchesini27, S. Marzani30, L. Masetti11, R. McCarthy34, S. Menke5, D. W. Miller32, K. Mishra24, B. Nachman13, P. Nef13, F. T. O'Grady17, A. Ovcharova35, A. Picazio10, C. Pollard8, B. Potter-Landua25, C. Potter25, S. Rappoccio16, J. Rojo36, J. Rutherfoord17, G. P. Salam25,37, R. M. Schabinger38, A. Schwartzman13, M. D. Schwartz21, B. Shuve39, P. Sinervo40, D. Soper23, D. E. Sosa Corral22, M. Spannowsky41, E. Strauss13, M. Swiatlowski13, J. Thaler30, C. Thomas25, E. Thompson42, N. V. Tran24, J. Tseng36, E. Usai27, L. Valery43, J. Veatch17, M. Vos44, W. Waalewijn45, J. Wacker46, C. Young25

1 Brookhaven National Laboratory, Upton, NY 11973, USA

2 Duke University, Durham, NC 27708, USA

3 University of Sussex, Brighton BN1 9RH, UK

4 CP3, Universite catholique du Louvain, 1348 Louvain-la-Neuve, Belgium

5 Max-Planck-Institute fuer Physik, 80805 Munich, Germany

6 Charles University in Prague, FMP, V Holesovickach 2, Prague, Czech Republic

7 University of California, Berkeley, CA 94720, USA

8 University of Glasgow, Glasgow, G12 8QQ, UK

9 University College London, WC1E 6BT, UK

10 University of Geneva, 1211 Geneva 4, Switzerland

11 Universität Mainz, Mainz, DE 55099, Germany

12 Los Alamos National Laboratory, Los Alamos, NM 87545, USA

13 SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA

14 University of Maryland, College Park, MD 20742, USA

15 University of California, Santa Cruz, CA 95064, USA

16 University at Buffalo, Buffalo, NY 14260, USA

17 University of Arizona, Tucson, AZ 85719, USA

18 University of Washington, Seattle, WA 98195, USA

19 Rutgers University, Piscataway, NJ 08854, USA

20 Bergische Universität Wuppertal, Wuppertal 42097, Germany

21 Harvard University, Cambridge, MA 02138, USA

22 Universität Heidelberg, Heidelberg 69117, Germany

23 University of Oregon, Eugene, OR 97403, USA

24 Fermi National Accelerator Laboratory, Batavia, IL 60510, USA

25 CERN, 1211 Geneva 23, Switzerland

26 Universität Zürich, 8006 Zurich, Switzerland

27 Universität Hamburg, Hamburg 22761, Germany

28 New York University, New York, NY 10003, USA

29 ETH Zürich, 8092 Zurich, Switzerland

30 Massachusetts Institute of Technology, Cambridge, MA 02139, USA

31 Princeton University, Princeton, NJ 08544, USA

32 University of Chicago, Chicago, IL 60637, USA

33 Johns Hopkins University, Baltimore, MD 21218, USA

34 YITP, Stony Brook University, Stony Brook, NY 11794-3840, USA

35 Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA

36 University of Oxford, Oxford OX1 3NP, UK

37 LPTHE, UPMC Univ. Paris 6 and CNRS UMR, 7589 Paris, France

38 Universidad Autonoma de Madrid, 28049 Madrid, Spain

39 Perimeter Institute for Theoretical Physics, Waterloo, ON N2L 2Y5, Canada

40 University of Toronto, Toronto, ON M5S 1A7, Canada

41 IPPP, University of Durham, Durham DH1 3LE, UK

Towards an understanding of the correlations in jet substructure


42 Columbia University, New York, NY 10027, USA

43 LPC Clermont-Ferrand, 63177 Aubiere Cedex, France

44 Instituto de Física Corpuscular, IFIC/CSIC-UVEG, 46071 Valencia, Spain

45 University of Amsterdam, 1012 WX Amsterdam, The Netherlands

46 Stanford Institute for Theoretical Physics, Stanford, CA 94305, USA

Received: 13 April 2015 / Accepted: 30 July 2015 / Published online: 9 September 2015
© The Author(s) 2015. This article is published with open access at Springerlink.com

Abstract Over the past decade, a large number of jet substructure observables have been proposed in the literature, and explored at the LHC experiments. Such observables attempt to utilize the internal structure of jets in order to distinguish those initiated by quarks, gluons, or by boosted heavy objects, such as top quarks and W bosons. This report, originating from and motivated by the BOOST2013 workshop, presents original particle-level studies that aim to improve our understanding of the relationships between jet substructure observables, their complementarity, and their dependence on the underlying jet properties, particularly the jet radius and jet transverse momentum. This is explored in the context of quark/gluon discrimination, boosted W boson tagging and boosted top quark tagging.

1 Introduction

The center-of-mass energies at the Large Hadron Collider are large compared to the heaviest of known particles, even after accounting for parton density functions. With the start of the second phase of operation in 2015, the center-of-mass energy will further increase from 7 TeV in 2010-2011 and 8 TeV in 2012 to 13 TeV. Thus, even the heaviest states in the Standard Model (and potentially previously unknown particles) will often be produced at the LHC with substantial boosts, leading to a collimation of the decay products. For fully hadronic decays, these heavy particles will not be reconstructed as several jets in the detector, but rather as a single hadronic jet with distinctive internal substructure. This realization has led to a new era of sophistication in our understanding of both standard Quantum Chromodynamics (QCD) jets and jets containing the decay of a heavy particle, with an array of new jet observables and detection techniques introduced and studied to distinguish the two types of jets. To allow the efficient propagation of results from these studies of jet substructure, a series of BOOST Workshops have been held on an annual basis: SLAC (2009) [1], Oxford University (2010) [2], Princeton University (2011) [3], IFIC Valencia (2012) [4], University of Arizona (2013) [5], and, most recently, University College London (2014) [6]. Following each of these meetings, working groups have generated reports highlighting the most interesting new results, and often including original particle-level studies. Previous BOOST reports can be found at [7-9].

a e-mail: [email protected]

This report from BOOST 2013 thus views the study and implementation of jet substructure techniques as a fairly mature field, and focuses on the question of the correlations between the plethora of observables that have been developed and employed, and their dependence on the underlying jet parameters, especially the jet radius R and jet transverse momentum (pT). In new analyses developed for the report, we investigate the separation of a quark signal from a gluon background (q/g tagging), a W signal from a gluon background (W-tagging) and a top signal from a mixed quark/gluon QCD background (top-tagging). In the case of top-tagging, we also investigate the performance of dedicated top-tagging algorithms, the HepTopTagger [10] and the Johns Hopkins Tagger [11]. We study the degree to which the discriminatory information provided by the observables and taggers overlaps by examining the extent to which the signal-background separation performance increases when two or more variables/taggers are combined in a multivariate analysis. Where possible, we provide a discussion of the physics behind the structure of the correlations and the pT and R scaling that we observe.

We present the performance of observables in idealized simulations without pile-up and detector resolution effects; the relationship between substructure observables, their correlations, and how these depend on the jet radius R and jet pT should not be too sensitive to such effects. Conducting studies using idealized simulations allows us to more clearly elucidate the underlying physics behind the observed performance, and also provides benchmarks for the development of techniques to mitigate pile-up and detector effects. A full study of the performance of pile-up and detector mitigation strategies is beyond the scope of the current report, and will be the focus of upcoming studies.

The report is organized as follows: in Sects. 2-4, we describe the methods used in carrying out our analysis, with a description of the Monte Carlo event sample generation in Sect. 2, the jet algorithms, observables and taggers investigated in our report in Sect. 3, and an overview of the multivariate techniques used to combine multiple observables into single discriminants in Sect. 4. Our results follow in Sects. 5-7, with q/g-tagging studies in Sect. 5, W-tagging studies in Sect. 6, and top-tagging studies in Sect. 7. Finally, we offer a summary of the studies and general conclusions in Sect. 8.

The principal organizers of and contributors to the analyses presented in this report are: B. Cooper, S. D. Ellis, M. Freytsis, A. Hornig, A. Larkoski, D. Lopez Mateos, B. Shuve, and N. V. Tran.

2 Monte Carlo samples

Below, we describe the Monte Carlo samples used in the q/g tagging, W-tagging, and top-tagging sections of this report. Note that no pile-up (additional proton-proton interactions beyond the hard scatter) is included in any samples, and there is no attempt to emulate the degradation in angular and pT resolution that would result when reconstructing the jets inside a real detector; such effects are deferred to future study.

2.1 Quark/gluon and W-tagging

Samples were generated at $\sqrt{s}$ = 8 TeV for QCD dijets, and for $W^+W^-$ pairs produced in the decay of a scalar resonance. The W bosons are decayed hadronically. The QCD events were split into subsamples of $gg$ and $q\bar{q}$ events, allowing for tests of discrimination of hadronic W bosons, quarks, and gluons.

Individual $gg$ and $q\bar{q}$ samples were produced at leading order (LO) using MadGraph5 [12], while $W^+W^-$ samples were generated using the JHU Generator [13-15]. Both were generated using CTEQ6L1 PDFs [16]. The samples were produced in exclusive pT bins of width 100 GeV, with the slicing parameter chosen to be the pT of any final state parton or W at LO. At the parton level, the pT bins investigated in this report were 300-400 GeV, 500-600 GeV and 1.0-1.1 TeV. The samples were then showered through Pythia8 (version 8.176) [17] using the default tune 4C [18]. For each of the various samples (W, q, g) and pT bins, 500k events were simulated.

2.2 Top-tagging

Samples were generated at $\sqrt{s}$ = 14 TeV. Standard Model dijet and top pair samples were produced with Sherpa 2.0.0 [19-24], with matrix elements of up to two extra partons matched to the shower. The top samples included only hadronic decays and were generated in exclusive pT bins of width 100 GeV, taking as slicing parameter the top quark pT. The QCD samples were generated with a lower cut on the leading parton-level jet pT, where parton-level jets are clustered with the anti-kT algorithm and jet radii of R = 0.4, 0.8, 1.2. The matching scale is selected to be $Q_{\mathrm{cut}}$ = 40, 60, 80 GeV for the $p_{T\,\mathrm{min}}$ = 600, 1000, and 1500 GeV bins, respectively. For the top samples, 100k events were generated in each bin, while 200k QCD events were generated in each bin.

3 Jet algorithms and substructure observables

In Sects. 3.1, 3.2, 3.3 and 3.4, we describe the various jet algorithms, groomers, taggers and other substructure variables used in these studies. Over the course of our study, we considered a larger set of observables, but for presentation purposes we included only a subset in the final analysis, eliminating redundant observables.

We organize the algorithms into four categories: clustering algorithms, grooming algorithms, tagging algorithms, and other substructure variables that incorporate information about the shape of radiation inside the jet. We note that this labelling is somewhat ambiguous: for example, some of the grooming algorithms (such as trimming and pruning) as well as N-subjettiness can be used in a tagging capacity. This ambiguity is particularly pronounced in multivariate analyses, such as the ones we present here, since a single variable can act in different roles depending on which other variables it is combined with. Therefore, the following classification is intended only to give an approximate organization of the variables, rather than as a definitive taxonomy.

Before describing the observables used in our analysis, we give our definition of jet constituents. As a starting point, we can think of the final state of an LHC collision event as being described by a list of final state particles. In the analyses of the simulated events described below (with no detector simulation), these particles include the sufficiently long lived protons, neutrons, photons, pions, electrons and muons with no requirements on pT or rapidity. Neutrinos are excluded from the jet analyses.

3.1 Jet clustering algorithms

Jet clustering Jets were clustered using sequential jet clustering algorithms [25] implemented in FastJet 3.0.3. Final state particles i, j are assigned a mutual distance $d_{ij}$ and a distance to the beam, $d_{iB}$. The particle pair with smallest $d_{ij}$ is recombined and the algorithm repeated until the smallest distance is from a particle i to the beam, $d_{iB}$, in which case i is set aside and labelled as a jet. The distance metrics are defined as

$$d_{ij} = \min\left(p_{Ti}^{2\gamma},\, p_{Tj}^{2\gamma}\right) \frac{\Delta R_{ij}^{2}}{R^{2}}, \tag{1}$$

$$d_{iB} = p_{Ti}^{2\gamma}, \tag{2}$$

where $\Delta R_{ij}^{2} = (\Delta\eta_{ij})^{2} + (\Delta\phi_{ij})^{2}$, with $\Delta\eta_{ij}$ being the separation in pseudorapidity of particles i and j, and $\Delta\phi_{ij}$ being the separation in azimuth. In this analysis, we use the anti-kT algorithm ($\gamma = -1$) [26], the Cambridge/Aachen (C/A) algorithm ($\gamma = 0$) [27,28], and the kT algorithm ($\gamma = 1$) [29,30], each of which has varying sensitivity to soft radiation in the definition of the jet.
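The recombination loop defined by Eqs. (1) and (2) can be illustrated with a short Python sketch. This is only a toy under stated assumptions, not the FastJet implementation used in the study: particles are hypothetical `(pt, eta, phi)` tuples, the search is a brute-force scan over all pairs, and merging uses a simple pT-weighted average rather than four-vector addition. The function names are illustrative.

```python
import math

def delta_r2(a, b):
    """Squared angular distance between two (pt, eta, phi) tuples."""
    deta = a[1] - b[1]
    dphi = math.remainder(a[2] - b[2], 2.0 * math.pi)  # wrap into [-pi, pi]
    return deta * deta + dphi * dphi

def merge(a, b):
    """pT-weighted recombination (assumption: a stand-in for four-vector addition)."""
    pt = a[0] + b[0]
    eta = (a[0] * a[1] + b[0] * b[1]) / pt
    phi = a[2] + b[0] * math.remainder(b[2] - a[2], 2.0 * math.pi) / pt
    return (pt, eta, phi)

def cluster(particles, R, gamma):
    """Sequential recombination; gamma = -1, 0, 1 gives anti-kT, C/A, kT."""
    objs = list(particles)
    jets = []
    while objs:
        # Smallest beam distance d_iB = pT_i^(2*gamma), Eq. (2)
        best = min(range(len(objs)), key=lambda i: objs[i][0] ** (2 * gamma))
        d_best = objs[best][0] ** (2 * gamma)
        pair = None
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                # Pair distance d_ij, Eq. (1)
                d_ij = (min(objs[i][0] ** (2 * gamma), objs[j][0] ** (2 * gamma))
                        * delta_r2(objs[i], objs[j]) / R ** 2)
                if d_ij < d_best:
                    d_best, pair = d_ij, (i, j)
        if pair is None:
            # A beam distance is the smallest: promote that object to a jet
            jets.append(objs.pop(best))
        else:
            merged = merge(objs[pair[0]], objs[pair[1]])
            objs = [o for k, o in enumerate(objs) if k not in pair] + [merged]
    return jets
```

For two nearby particles and one distant particle, all three choices of `gamma` return the same two jets; the algorithms differ in the order in which soft and hard particles are absorbed, which matters for more realistic events.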

This process of jet clustering serves to identify jets as (non-overlapping) sub-lists of final state particles within the original event-wide list. The particles on the sub-list corresponding to a specific jet are labeled the constituents of that jet, and most of the tools described here process this sub-list of jet constituents in some specific fashion to determine some property of that jet. The concept of constituents of a jet can be generalized to a more detector-centric version where the constituents are, for example, tracks and calorimeter cells, or to a perturbative QCD version where the constituents are partons (quarks and gluons). These different descriptions are not identical, but are closely related. We will focus on the MC based analysis of simulated events, while drawing insight from the perturbative QCD view. Note also that, when a detector (with a magnetic field) is included in the analysis, there will generally be a minimum pT requirement on the constituents so that realistic numbers of constituents will be smaller than, but presumably still proportional to, the numbers found in the analyses described here.

Qjets We also perform non-deterministic jet clustering [31,32]. Instead of always clustering the particle pair with smallest distance $d_{ij}$, the pair selected for combination is chosen probabilistically according to a measure

$$P_{ij} \propto e^{-\alpha\,(d_{ij} - d_{\min})/d_{\min}}, \tag{3}$$

where $d_{\min}$ is the minimum distance for the usual jet clustering algorithm at a particular step. This leads to a different cluster sequence for the jet each time the Qjet algorithm is used, and consequently different substructure properties. The parameter $\alpha$ is called the rigidity and is used to control how sharply peaked the probability distribution is around the usual, deterministic value. The Qjets method uses statistical analysis of the resulting distributions to extract more information from the jet than can be found in the usual cluster sequence.
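The probabilistic pair selection of Eq. (3) can be sketched as follows. This is a minimal illustration, assuming the candidate pair distances for the current clustering step are already available in a dictionary; the names `qjet_weights` and `choose_pair` are hypothetical and do not come from the Qjets code.

```python
import math
import random

def qjet_weights(distances, alpha):
    """Unnormalized weights exp(-alpha * (d_ij - d_min) / d_min), as in Eq. (3).

    `distances` maps a pair label (i, j) to its distance d_ij (assumed > 0).
    """
    d_min = min(distances.values())
    return {pair: math.exp(-alpha * (d - d_min) / d_min)
            for pair, d in distances.items()}

def choose_pair(distances, alpha, rng=random):
    """Draw the pair to merge with probability proportional to its weight."""
    weights = qjet_weights(distances, alpha)
    pairs = list(weights)
    return rng.choices(pairs, weights=[weights[p] for p in pairs], k=1)[0]
```

In the limit of large rigidity the weight of every pair except the closest one vanishes, so the procedure reduces to the usual deterministic clustering; for small rigidity all pairs become nearly equally likely.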

3.2 Jet grooming algorithms

Pruning Given a jet, re-cluster the constituents using the C/A algorithm. At each step, proceed with the merger as usual unless both

$$\frac{\min(p_{Ti},\, p_{Tj})}{p_{Tij}} < z_{\mathrm{cut}} \quad \text{and} \quad \Delta R_{ij} > \frac{2 m_{j}}{p_{Tj}}\, R_{\mathrm{cut}}, \tag{4}$$

in which case the merger is vetoed and the softer branch discarded. The default parameters used for pruning [33] in this report are $z_{\mathrm{cut}} = 0.1$ and $R_{\mathrm{cut}} = 0.5$, unless otherwise stated. One advantage of pruning is that the thresholds used to veto soft, wide-angle radiation scale with the jet kinematics, and so the algorithm is expected to perform comparably over a wide range of momenta.

Trimming Given a jet, re-cluster the constituents into subjets of radius $R_{\mathrm{trim}}$ with the kT algorithm. Discard all subjets i with

$$p_{Ti} < f_{\mathrm{cut}}\, p_{TJ}. \tag{5}$$

The default parameters used for trimming [34] in this report are $R_{\mathrm{trim}} = 0.2$ and $f_{\mathrm{cut}} = 0.03$, unless otherwise stated.
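The trimming requirement of Eq. (5) amounts to a simple filter on the subjets. The sketch below assumes the subjets have already been found (for example with a kT clustering of radius 0.2) and are represented only by their pT values; the function name `trim` is illustrative.

```python
def trim(subjet_pts, jet_pt, f_cut=0.03):
    """Keep only subjets with pT_i >= f_cut * pT_J, i.e. those not discarded by Eq. (5)."""
    threshold = f_cut * jet_pt
    return [pt for pt in subjet_pts if pt >= threshold]
```

For a 500 GeV jet with the default `f_cut`, subjets softer than 15 GeV are removed, so `trim([400.0, 90.0, 10.0], jet_pt=500.0)` keeps the first two subjets.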

Filtering Given a jet, re-cluster the constituents into subjets of radius $R_{\mathrm{filt}}$ with the C/A algorithm. Re-define the jet to consist of only the hardest N subjets, where N is determined by the final state topology and is typically one more than the number of hard prongs in the resonance decay (to include the leading final-state gluon emission) [35]. While we do not independently use filtering, it is an important step of the HEPTopTagger to be defined later.

Soft drop Given a jet, re-cluster all of the constituents using the C/A algorithm. Iteratively undo the last stage of the C/A clustering from j into subjets $j_1$, $j_2$. If

$$\frac{\min(p_{T1},\, p_{T2})}{p_{T1} + p_{T2}} < z_{\mathrm{cut}} \left(\frac{\Delta R_{12}}{R}\right)^{\beta}, \tag{6}$$

discard the softer subjet and repeat. Otherwise, take j to be the final soft-drop jet [36]. Soft drop has two input parameters, the angular exponent $\beta$ and the soft-drop scale $z_{\mathrm{cut}}$. In these studies we use the default $z_{\mathrm{cut}} = 0.1$ setting, with $\beta = 2$.
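The soft-drop condition of Eq. (6) and the iterative declustering can be sketched in a few lines. As an assumption made for brevity, the declustering history along the surviving branch is supplied as a list of hypothetical `(pt1, pt2, delta_r12)` tuples, ordered from the last clustering step backwards; a real implementation would walk the C/A clustering tree instead.

```python
def soft_drop_passes(pt1, pt2, delta_r12, R, z_cut=0.1, beta=2.0):
    """True if the splitting satisfies the soft-drop condition (Eq. (6) is not met)."""
    z = min(pt1, pt2) / (pt1 + pt2)
    return z >= z_cut * (delta_r12 / R) ** beta

def first_surviving_splitting(splittings, R, z_cut=0.1, beta=2.0):
    """Index of the first splitting that passes, or None if every branch is dropped."""
    for index, (pt1, pt2, delta_r12) in enumerate(splittings):
        if soft_drop_passes(pt1, pt2, delta_r12, R, z_cut, beta):
            return index
    return None
```

With positive `beta`, the threshold on the momentum fraction shrinks for collinear splittings, so soft radiation is removed mainly at wide angles; setting `beta=0` makes the cut angle-independent.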

3.3 Jet tagging algorithms

Modified mass drop tagger Given a jet, re-cluster all of the constituents using the C/A algorithm. Iteratively undo the last stage of the C/A clustering from j into subjets $j_1$, $j_2$ with $m_{j_1} > m_{j_2}$. If either

$$m_{j_1} > \mu\, m_{j} \quad \text{or} \quad \frac{\min(p_{T1}^{2},\, p_{T2}^{2})}{m_{j}^{2}}\, \Delta R_{12}^{2} < y_{\mathrm{cut}}, \tag{7}$$

then discard the branch with the smaller transverse mass $m_T = \sqrt{m_i^2 + p_{Ti}^2}$, and re-define j as the branch with the larger transverse mass. Otherwise, the jet is tagged. If de-clustering continues until only one branch remains, the jet is considered to have failed the tagging criteria [37]. In this study we use by default $\mu = 1.0$ (i.e. implement no mass drop criteria) and $y_{\mathrm{cut}} = 0.1$. With respect to the singular parts of the splitting functions, this describes the same algorithm as running soft drop with $\beta = 0$.

Johns Hopkins Tagger Re-cluster the jet using the C/A algorithm. The jet is iteratively de-clustered, and at each step the softer prong is discarded if its pT is less than $\delta_p\, p_{T\,\mathrm{jet}}$. This continues until both prongs are harder than the pT threshold, both prongs are softer than the pT threshold, or if they are too close ($|\Delta\eta_{ij}| + |\Delta\phi_{ij}| < \delta_R$); the jet is rejected if either of the latter conditions applies. If both are harder than the pT threshold, the same procedure is applied to each: this results in 2, 3, or 4 subjets. If there exist 3 or 4 subjets, then the jet is accepted: the top candidate is the sum of the subjets, and the W candidate is the pair of subjets closest to the W mass [11]. The output of the tagger is the mass of the top candidate ($m_t$), the mass of the W candidate ($m_W$), and $\theta_h$, a helicity angle defined as the angle, measured in the rest frame of the W candidate, between the top direction and one of the W decay products. The two free input parameters of the Johns Hopkins tagger in this study are $\delta_p$ and $\delta_R$, defined above, and their values are optimized for different jet kinematics and parameters in Sect. 7.

HEPTopTagger Re-cluster the jet using the C/A algorithm. The jet is iteratively de-clustered, and at each step the softer prong is discarded if $m_1/m_{12} > \mu$ (there is not a significant mass drop). Otherwise, both prongs are kept. This continues until a prong has a mass $m_i < m$, at which point it is added to the list of subjets. Filter the jet using $R_{\mathrm{filt}} = \min(0.3, \Delta R_{ij})$, keeping the five hardest subjets (where $\Delta R_{ij}$ is the distance between the two hardest subjets). Select the three subjets whose invariant mass is closest to $m_t$ [10]. The top candidate is rejected if there are fewer than three subjets or if the top candidate mass exceeds 500 GeV. The output of the tagger is $m_t$, $m_W$, and $\theta_h$ (as defined in the Johns Hopkins Tagger). The two free input parameters of the HEPTopTagger in this study are $m$ and $\mu$, defined above, and their values are optimized for different jet kinematics and parameters in Sect. 7.

Top-tagging with pruning or trimming In the studies presented in Sect. 7 we add a W reconstruction step to the pruning and trimming algorithms, to enable a fairer comparison with the dedicated top tagging algorithms described above. Following the method of the BOOST 2011 report [8], a W candidate is found as follows: if there are two subjets, the highest-mass subjet is the W candidate (because the W prongs end up clustered in the same subjet), and the W candidate mass, $m_W$, is the mass of this subjet; if there are three subjets, the two subjets with the smallest invariant mass comprise the W candidate, and $m_W$ is the invariant mass of this subjet pair. In the case of only one subjet, the top candidate is rejected. The top mass, $m_t$, is the full mass of the groomed jet.

3.4 Other jet substructure observables

The jet substructure observables defined in this section are calculated using jet constituents prior to any grooming. This approach has been used in several analyses in the past, for example [38,39], whilst others have used the approach of only considering the jet constituents that survive the grooming procedure [40]. We take the first approach throughout our analyses, as this approach allows a study of both the hard and soft radiation characteristic of signal vs. background. However, we do include the effects of initial state radiation and the underlying event, and unsurprisingly these can have a non-negligible effect on variable performance, particularly at large pT and jet R. This suggests that the differences we see between variable performance at large pT/R will be accentuated in a high pile-up environment, necessitating a dedicated study of pile-up to recover as much as possible the ideal performance seen here. Such a study is beyond the scope of this paper.

Qjet mass volatility As described above, Qjet algorithms re-cluster the same jet non-deterministically to obtain a collection of interpretations of the jet. For each jet interpretation, the pruned jet mass is computed with the default pruning parameters. The mass volatility, $\Gamma_{\mathrm{Qjet}}$, is defined as [31]

$$\Gamma_{\mathrm{Qjet}} = \frac{\sqrt{\langle m_J^{2} \rangle - \langle m_J \rangle^{2}}}{\langle m_J \rangle}, \tag{8}$$

where averages are computed over the Qjet interpretations. We use a rigidity parameter of $\alpha = 0.1$ (although other studies suggest a smaller value of $\alpha$ may be optimal [31,32]), and 25 trees per event for all of the studies presented here.
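Equation (8) is the relative spread of the pruned mass over the Qjet interpretations. A minimal sketch, assuming the pruned masses of the interpretations have already been collected into a list (the function name `mass_volatility` is illustrative):

```python
import math

def mass_volatility(masses):
    """Gamma_Qjet = sqrt(<m^2> - <m>^2) / <m> over the Qjet interpretations."""
    n = len(masses)
    mean = sum(masses) / n
    mean_sq = sum(m * m for m in masses) / n
    variance = max(mean_sq - mean * mean, 0.0)  # guard against rounding below zero
    return math.sqrt(variance) / mean
```

A jet whose pruned mass is the same in every interpretation has zero volatility, whereas a jet whose mass depends strongly on the clustering history gives a large value.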

N-subjettiness N-subjettiness [41] quantifies how well the radiation in the jet is aligned along N directions. To compute N-subjettiness, $\tau_N^{(\beta)}$, one must first identify N axes within the jet. Then,

$$\tau_N^{(\beta)} = \frac{1}{d_0} \sum_{i} p_{Ti}\, \min\left(\Delta R_{1i}^{\beta}, \ldots, \Delta R_{Ni}^{\beta}\right), \tag{9}$$

where distances are between particles i in the jet and the axes,

$$d_0 = \sum_{i} p_{Ti}\, R^{\beta}, \tag{10}$$

and R is the jet clustering radius. The exponent $\beta$ is a free parameter. There is also some choice in how the axes used to compute N-subjettiness are determined. The optimal configuration of axes is the one that minimizes N-subjettiness; recently, it was shown that the winner-take-all (WTA) axes can be easily computed and have superior performance compared to other minimization techniques [42]. We use both the WTA (Sect. 7) and one-pass kT optimization axes (Sects. 5, 6) in our studies.

Often, a powerful discriminant is the ratio,

$$\tau_{N,N-1}^{(\beta)} \equiv \frac{\tau_N^{(\beta)}}{\tau_{N-1}^{(\beta)}}. \tag{11}$$

While this is not an infrared-collinear (IRC) safe observable, it is calculable [43] and can be made IRC safe with a loose lower cut on $\tau_{N-1}$.
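Equations (9) to (11) can be evaluated directly once a set of axes is given. The sketch below assumes constituents are hypothetical `(pt, eta, phi)` tuples and axes are `(eta, phi)` pairs supplied by the caller; it does not implement the WTA or one-pass kT axis finding used in the studies.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane, with the azimuth wrapped."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)

def tau_n(constituents, axes, R, beta=1.0):
    """N-subjettiness of Eq. (9) for the given axes, normalized by d_0 of Eq. (10)."""
    d0 = sum(pt for pt, _, _ in constituents) * R ** beta
    total = 0.0
    for pt, eta, phi in constituents:
        # Distance to the nearest axis, raised to the angular exponent
        total += pt * min(delta_r(eta, phi, a_eta, a_phi) ** beta
                          for a_eta, a_phi in axes)
    return total / d0

def tau_ratio(tau_upper, tau_lower):
    """Ratio tau_N / tau_(N-1) of Eq. (11); tau_lower must be non-zero."""
    return tau_upper / tau_lower
```

For an idealized two-prong jet with one axis on each prong the two-axis value vanishes while the one-axis value does not, so the ratio of the two is small; this is the pattern that makes the ratio useful for identifying two-prong decays.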

Energy correlation functions The transverse momentum version of the energy correlation functions is defined as [44]:

$$\mathrm{ECF}(N, \beta) = \sum_{i_1 < i_2 < \ldots < i_N \in j} \left( \prod_{a=1}^{N} p_{T i_a} \right) \left( \prod_{b=1}^{N-1} \prod_{c=b+1}^{N} \Delta R_{i_b i_c} \right)^{\beta}, \tag{12}$$

where i is a particle inside the jet. It is preferable to work in terms of dimensionless quantities, particularly the energy correlation function double ratio:

$$C_N^{(\beta)} = \frac{\mathrm{ECF}(N+1, \beta)\, \mathrm{ECF}(N-1, \beta)}{\mathrm{ECF}(N, \beta)^{2}}. \tag{13}$$

This observable measures higher-order radiation from leading-order substructure. Note that $C_2^{(\beta=0)}$ is identical to the variable $p_T D$ introduced by CMS in [45].
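Equations (12) and (13) translate into a direct sum over N-particle subsets. The following sketch assumes constituents are hypothetical `(pt, eta, phi)` tuples and uses a brute-force enumeration, which is adequate only for small N and modest constituent multiplicities; the function names are illustrative.

```python
import math
from itertools import combinations

def delta_r(a, b):
    """Angular distance between two (pt, eta, phi) tuples."""
    dphi = math.remainder(a[2] - b[2], 2.0 * math.pi)
    return math.hypot(a[1] - b[1], dphi)

def ecf(constituents, n, beta):
    """Energy correlation function ECF(N, beta) of Eq. (12); ECF(0, beta) = 1."""
    total = 0.0
    for subset in combinations(constituents, n):
        pt_product = math.prod(p[0] for p in subset)
        # Product of all pairwise angular distances within the subset
        angle_product = math.prod(delta_r(a, b) for a, b in combinations(subset, 2))
        total += pt_product * angle_product ** beta
    return total

def c_n(constituents, n, beta):
    """Double ratio C_N^(beta) of Eq. (13)."""
    return (ecf(constituents, n + 1, beta) * ecf(constituents, n - 1, beta)
            / ecf(constituents, n, beta) ** 2)
```

For two constituents of 50 GeV each, separated by 0.4 in pseudorapidity, `c_n(constituents, 1, 1.0)` evaluates to 0.1, since ECF(2) = 50 × 50 × 0.4 and ECF(1) = 100.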

4 Multivariate analysis techniques

Multivariate techniques are used to combine multiple variables into a single discriminant in an optimal manner. The extent to which the discrimination power increases in a multivariable combination indicates to what extent the discriminatory information in the variables overlaps. There exist alternative strategies for studying correlations in discrimination power, such as truth matching [46], but these are not explored here.

In all cases, the multivariate technique used to combine variables is a Boosted Decision Tree (BDT) as implemented in the TMVA package [47]. An example of the BDT settings used in these studies, chosen to reduce the effect of overtraining, is given in [47]. The BDT implementation including gradient boost is used. Additionally, the simulated data were split into training and testing samples, and the BDT outputs on the two samples were compared to ensure that the BDT performance was not affected by overtraining.

5 Quark-gluon discrimination

In this section, we examine the differences between quark- and gluon-initiated jets in terms of substructure variables. At a fundamental level, the primary difference between quark- and gluon-initiated jets is the color charge of the initiating parton, typically expressed in terms of the ratio of the corresponding Casimir factors $C_F/C_A = 4/9$. Since the quark has the smaller color charge, it radiates less than a corresponding gluon and the naive expectation is that the resulting quark jet will contain fewer constituents than the corresponding gluon jet. The differing color structure of the two types of jet will also be realized in the detailed behavior of their radiation patterns. We determine the extent to which the substructure observables capturing these differences are correlated, providing some theoretical understanding of these variables and their performance. The motivation for these studies arises not only from the desire to tag a jet as originating from a quark or gluon, but also to improve our understanding of the quark and gluon components of the QCD backgrounds relative to boosted resonances. While recent studies have suggested that quark/gluon tagging efficiencies depend highly on the Monte Carlo generator used [48,49], we are more interested in understanding the scaling performance with pT and R, and the correlations between observables, which are expected to be treated consistently within a single shower scheme.
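For reference, the ratio quoted above follows from the standard SU(3) color factors, with $N_c = 3$ colors:

```latex
C_F = \frac{N_c^2 - 1}{2 N_c} = \frac{4}{3}, \qquad
C_A = N_c = 3, \qquad
\frac{C_F}{C_A} = \frac{4}{9}.
```

At leading order, soft gluon emission from a gluon is therefore enhanced by roughly $C_A/C_F = 9/4$ relative to emission from a quark, which is the origin of the naive expectation of higher constituent multiplicity in gluon jets.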

Other examples of recent analytic studies of the correlations between jet observables relevant to quark jet versus gluon jet discrimination can be found in [43,46,50,51].

5.1 Methodology and observable classes

These studies use the $q\bar{q}$ and $gg$ MC samples described in Sect. 2. The showered events were clustered with FastJet 3.0.3 using the anti-kT algorithm with jet radii of R = 0.4, 0.8, 1.2. In both signal (quark) and background (gluon) samples, an upper and lower cut on the leading jet pT is applied after showering/clustering, to ensure similar pT spectra for signal and background in each pT bin. The bins in leading jet pT that are considered are 300-400 GeV, 500-600 GeV, 1.0-1.1 TeV, for the 300-400 GeV, 500-600 GeV, 1.0-1.1 TeV parton pT slices respectively. Various jet grooming approaches are applied to the jets, as described in Sect. 3.4. Only leading and subleading jets in each sample are used. The following observables are studied in this section:

• Number of constituents ($n_{\mathrm{constits}}$) in the jet.

• Pruned Qjet mass volatility, $\Gamma_{\mathrm{Qjet}}$.

• 1-point energy correlation functions, $C_1^{(\beta)}$ with $\beta$ = 0, 1, 2.

• 1-subjettiness, $\tau_1^{(\beta)}$ with $\beta$ = 1, 2. The N-subjettiness axes are computed using one-pass kT axis optimization.

• Ungroomed jet mass, m.

For simplicity, we hereafter refer to quark-initiated jets (gluon-initiated jets) as quark jets (gluon jets).

We will demonstrate that, in terms of their jet-by-jet correlations and their ability to separate quark jets from gluon jets, the above observables fall into five Classes. The first three observables, $n_{\mathrm{constits}}$, $\Gamma_{\mathrm{Qjet}}$ and $C_1^{(\beta=0)}$, each constitutes a Class of its own (Classes I-III) in the sense that they each carry some independent information about a jet and, when combined, provide substantially better quark jet and gluon jet separation than any one observable alone. Of the remaining observables, $C_1^{(\beta=1)}$ and $\tau_1^{(\beta=1)}$ comprise a single class (Class IV) because their distributions are similar for a sample of jets, their jet-by-jet values are highly correlated, and they exhibit very similar power to separate quark jets and gluon jets (with very similar dependence on the jet parameters R and pT); this separation power is not improved when they are combined. The fifth class (Class V) is composed of $C_1^{(\beta=2)}$, $\tau_1^{(\beta=2)}$ and the (ungroomed) jet mass. Again the jet-by-jet correlations are strong (even though the individual observable distributions are somewhat different), the quark versus gluon separation power is very similar (including the R and pT dependence), and little is achieved by combining more than one of the Class V observables. This class structure is not surprising given that the observables within a class exhibit very similar dependence on the kinematics of the underlying jet constituents. For example, the members of Class V are constructed from a sum over pairs of constituents using products of the energy of each member of the pair times the angular separation squared for the pair (this is apparent for the ungroomed mass when viewed in terms of a mass-squared with small angular separations). By the same argument, the Class IV and Class V observables will be seen to be more similar than any other pair of classes, differing only in the power ($\beta$) of the dependence on the angular separations, which produces small but detectable differences. We will return to a more complete discussion of jet masses in Sect. 5.4.

5.2 Single variable discrimination

Figure 1 shows the quark and gluon distributions of different substructure observables in the pT = 500-600 GeV bin for R = 0.8 jets. These distributions illustrate some of the distinctions between the Classes made above. The fundamental difference between quarks and gluons, namely their color charge and consequent amount of radiation in the jet, is clearly indicated in Fig. 1a, suggesting that simply counting constituents provides good separation between quark and gluon jets. In fact, among the observables considered, one can see by eye that $n_{\mathrm{constits}}$ should provide the highest separation power, i.e., the quark and gluon distributions are most distinct, as was originally noted in [49,52]. Figure 1 further suggests that $C_1^{(\beta=0)}$ should provide the next best separation, followed by $C_1^{(\beta=1)}$, as was also found by the CMS and ATLAS Collaborations [48,53].

To study the power of each observable as a discriminator for quark/gluon tagging more quantitatively, Receiver Operating Characteristic (ROC) curves are built by scanning each distribution and plotting the background efficiency (to select gluon jets) vs. the signal efficiency (to select quark jets). Figure 2 shows these ROC curves for all of the substructure variables shown in Fig. 1 for R = 0.4, 0.8 and 1.2 jets (in the pT = 300–400 GeV bin). In addition, the ROC curve for a tagger built from a BDT combination of all the variables (see Sect. 4) is shown.
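As an illustration of how a ROC curve is obtained by scanning a distribution, the following minimal sketch steps a threshold through every observed value of one observable. It assumes that signal (quark) jets populate the lower values, as is the case for a constituent count; the function name and the toy inputs are hypothetical and are not taken from the study.

```python
def roc_curve(signal, background):
    """Scan the cut 'value <= threshold' over all observed values.

    Returns a list of (signal efficiency, background efficiency) pairs,
    starting at (0, 0) and ending at (1, 1).
    """
    thresholds = sorted(set(signal) | set(background))
    points = [(0.0, 0.0)]
    for cut in thresholds:
        sig_eff = sum(v <= cut for v in signal) / len(signal)
        bkg_eff = sum(v <= cut for v in background) / len(background)
        points.append((sig_eff, bkg_eff))
    return points

# Toy constituent counts (hypothetical): quark jets tend to have fewer.
quark_counts = [18, 22, 25, 27, 30, 34, 41]
gluon_counts = [29, 35, 38, 42, 47, 51, 60]
for sig_eff, bkg_eff in roc_curve(quark_counts, gluon_counts):
    print(f"sig eff = {sig_eff:.2f}, bkg eff = {bkg_eff:.2f}")
```

For an observable where the signal sits at high values, the comparison would simply be reversed.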

As suggested earlier, $n_{\rm constits}$ is the best performing variable for all R values, although $C_1^{\beta=0}$ is not far behind, particularly for R = 0.8. Most other variables have similar performance, with the main exception of $\Gamma_{\rm Qjet}$, which shows significantly worse discrimination (this may be due to our choice of rigidity $\alpha = 0.1$, with other studies suggesting that a smaller value, such as $\alpha = 0.01$, produces better results [31,32]). The combination of all variables shows somewhat better discrimination than any individual observable, and we give a more detailed discussion in Sect. 5.3 of the correlations between the observables and their impact on the combined discrimination power.

We now examine how the performance of the substructure observables varies with pT and R. To present the results in a digestible fashion we focus on the gluon jet rejection factor, $1/\epsilon_{\rm bkg}$, for a quark signal efficiency, $\epsilon_{\rm sig}$, of 50 %. We can use the values of $1/\epsilon_{\rm bkg}$ generated for the 9 kinematic points introduced above (R = 0.4, 0.8, 1.2 and the 100 GeV pT bins with lower limits pT = 300, 500, 1000 GeV) to generate surface plots. The surface plots in Fig. 3 indicate both the level of gluon rejection and the variation with pT and R for each of the single observables studied. The color shading in these plots is defined so that a value of $1/\epsilon_{\rm bkg} \simeq 1$ yields the color violet, while $1/\epsilon_{\rm bkg} \simeq 20$ yields the color red. The rainbow of colors in between varies linearly with $\log_{10}(1/\epsilon_{\rm bkg})$.
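The rejection factor at fixed signal efficiency and the logarithmic shading scale can be sketched as follows. This is a minimal illustration under stated assumptions: the cut keeps jets with low values of the observable, the threshold is taken from the sorted signal sample, and the function names and the 0-to-1 shade coordinate are hypothetical conveniences rather than anything defined in the report.

```python
import math

def rejection_at_efficiency(signal, background, sig_eff=0.5):
    """Return 1/eps_bkg for the cut 'value <= threshold' keeping sig_eff of the signal."""
    ordered = sorted(signal)
    n_keep = max(1, math.ceil(sig_eff * len(ordered)))
    cut = ordered[n_keep - 1]
    n_bkg = sum(v <= cut for v in background)
    return float("inf") if n_bkg == 0 else len(background) / n_bkg

def shade_position(rejection, lo=1.0, hi=20.0):
    """Position in [0, 1] on a scale that is linear in log10(rejection)."""
    span = math.log10(hi) - math.log10(lo)
    frac = (math.log10(rejection) - math.log10(lo)) / span
    return min(1.0, max(0.0, frac))

# Toy inputs (hypothetical numbers).
quark_values = [18, 22, 25, 27, 30, 34, 41, 44]
gluon_values = [21, 35, 38, 42, 47, 51, 60, 66]
rej = rejection_at_efficiency(quark_values, gluon_values)
print(f"1/eps_bkg = {rej:.1f}, shade position = {shade_position(rej):.2f}")
```

A shade position of 0 corresponds to the violet end of the scale and 1 to the red end.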

We organize our results by the classes introduced in the previous subsection:

Class I The sole constituent of this class is $n_{\rm constits}$. We see in Fig. 3a that, as expected, the numerically largest rejection rates occur for this observable, with the rejection factor ranging from 6 to 11 and varying rather dramatically with R. As R increases the jet collects more constituents from the underlying event, which are the same for quark and gluon jets, and the separation power decreases. At large R, there is some improvement with increasing pT due to the enhanced QCD radiation, which is different for quarks vs. gluons.

Fig. 1 Comparisons of quark and gluon distributions of different substructure variables, organized by Class, for leading jets in the pT = 500–600 GeV bin using the anti-kT R = 0.8 algorithm. The first three plots are Classes I–III, with Class IV in the second row, and Class V in the third row. Panels (a)–(h), each titled "qg, pT = 500-600 GeV, AK8, BOOST13WG", show the fraction of events for gluon and quark jets.

Class II The variable $\Gamma_{\rm Qjet}$ constitutes this class. Figure 3b confirms the limited efficacy of this single observable (at least for our parameter choices) with a rejection rate only in the range 2.5–2.8. On the other hand, this observable probes a very different property of jet substructure, i.e., the sensitivity to detailed changes in the grooming procedure, and this difference is suggested by the distinct R and pT dependence illustrated in Fig. 3b. The rejection rate increases with increasing R and decreasing pT, since the distinction between quark and gluon jets for this observable arises from the relative importance of the one hard gluon emission configuration. The role of this contribution is enhanced for both decreasing pT and increasing R. This general variation with pT and R is the opposite of what is exhibited in all of the other single variable plots in Fig. 3.

Fig. 2 The ROC curves for all single variables considered for quark–gluon discrimination in the pT = 300–400 GeV bin using the anti-kT R = 0.4 (top-left), 0.8 (top-right) and 1.2 (bottom) algorithm. Each panel shows $\epsilon_{\rm bkg}$ (logarithmic scale) versus $\epsilon_{\rm sig}$ for the single variables and for the all-variables combination (allvars).

Class III The only member of this class is $C_1^{\beta=0}$. Figure 3c indicates that this observable can itself provide a rejection rate in the range 7.8–8.6 (intermediate between the two previous observables), and again with distinct R and pT dependence. In this case the rejection rate decreases slowly with increasing R, which follows from the fact that $\beta = 0$ implies no weighting of $\Delta R$ in the definition of $C_1^{\beta=0}$, greatly reducing the angular dependence. The rejection rate peaks at intermediate pT values, an effect visually enhanced by the limited number of pT values included.

Class IV Figure 3d, e confirm the very similar properties of the observables $C_1^{\beta=1}$ and $\tau_1^{\beta=1}$ (as already suggested in Fig. 1d, e). They have essentially identical rejection rates (4.1–5.4) and identical R and pT dependence (a slow decrease with increasing R and an even slower increase with increasing pT).

Class V The observables $C_1^{\beta=2}$, $\tau_1^{\beta=2}$, and m have similar rejection rates in the range 3.5 to 5.3, as well as very similar R and pT dependence (a slow decrease with increasing R and an even slower increase with increasing pT).

Arguably, drawing a distinction between the Class IV and Class V observables is a fine point, but the color shading does suggest some distinction from the slightly smaller rejection rate in Class V. Again the strong similarities between the plots within the second and third rows in Fig. 3 speak to the common properties of the observables within the two classes.

In summary, the overall discriminating power between quark and gluon jets tends to decrease with increasing R (except for the $\Gamma_{\rm Qjet}$ observable), presumably in large part due to the contamination from the underlying event. Since the construction of the $\Gamma_{\rm Qjet}$ observable explicitly involves pruning away the soft, large angle constituents, it is not surprising that it exhibits different R dependence. In general the discriminating power increases slowly and monotonically with pT (except for the $\Gamma_{\rm Qjet}$ and $C_1^{\beta=0}$ observables). This is presumably due to the overall increase in radiation from high pT objects, which accentuates the differences in the quark and gluon color charges and provides some increase in discrimination. In the following section, we study the effect of combining multiple observables.

Fig. 3 Surface plots of $1/\epsilon_{\rm bkg}$ for all single variables considered for quark–gluon discrimination as functions of R and pT. The first three plots are Classes I–III, with Class IV in the second row, and Class V in the third row

5.3 Combined performance and correlations

Combining multiple observables in a BDT can give further improvement over cuts on a single variable. Since the improvement from combining correlated observables is expected to be inferior to that from combining uncorrelated observables, studying the performance of multivariable combinations gives insight into the correlations between substructure variables and the physical features allowing for quark/gluon discrimination. Based on our discussion of the correlated properties of observables within a single class, we expect little improvement in the rejection rate when combining observables from the same class, and substantial improvement when combining observables from different classes. Our classification of observables for quark/gluon tagging therefore motivates the study of particular combinations of variables for use in experimental analyses.
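The expectation that correlated observables add little when combined can be illustrated with a simple Gaussian toy model, which is not the BDT used in this study. The sketch assumes two observables with unit variance in both samples, mean shifts d1 and d2 between signal and background, and a common correlation coefficient rho; under those assumptions the separation of the optimal linear combination is the Mahalanobis distance between the two means. All names are hypothetical.

```python
import math

def combined_separation(d1, d2, rho):
    """Separation (in standard deviations) of the best linear combination of
    two unit-variance Gaussian observables with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return math.sqrt((d1 * d1 - 2.0 * rho * d1 * d2 + d2 * d2) / (1.0 - rho * rho))

# Two observables with the same individual separation of one standard deviation.
for rho in (0.0, 0.5, 0.9, 0.99):
    gain = combined_separation(1.0, 1.0, rho)
    print(f"rho = {rho:4.2f}: combined / single separation = {gain:.3f}")
```

For equal individual separations the gain reduces to sqrt(2 / (1 + rho)), so it approaches 1 (no improvement) as the correlation approaches unity, in line with the behaviour expected for observables within one class.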

To quantitatively study the improvement obtained from multivariate analyses, we build quark/gluon taggers from every pair-wise combination of variables studied in the previous section; we also compare the pair-wise performance with the all-variables combination. To illustrate the results achieved in this way, we use the same 2D surface plots as in Fig. 3. Figure 4 shows pair-wise plots for variables in (a) Class IV and (b) Class V, respectively. Comparing to the corresponding plots in Fig. 3, we see that combining $C_1^{\beta=1} + \tau_1^{\beta=1}$ provides a small (∼10%) improvement in the rejection rate with essentially no change in the R and pT dependence, while combining $C_1^{\beta=2} + \tau_1^{\beta=2}$ yields a rejection rate that is essentially identical to the single observable rejection rate for all R and pT values (with a similar conclusion if one of these observables is replaced with the ungroomed jet mass m). This confirms the expectation that the observables within a single class effectively probe the same jet properties.

Fig. 4 Surface plots of $1/\epsilon_{\rm bkg}$ for the indicated pairs of variables from (a) Class IV and (b) Class V considered for quark–gluon discrimination as functions of R and pT

Next, we consider cross-class pairs of observables in Fig. 5, where, except in the one case noted below, we use only a single observable from each class for illustrative purposes. Since $n_{\rm constits}$ is the best performing single variable, the largest rejection rates are obtained from combining another observable with $n_{\rm constits}$ (Fig. 5a–e). In general, the rejection rates are larger for the pair-wise case than for the single variable case. In particular, the pair $n_{\rm constits} + C_1^{\beta=1}$ in Fig. 5b yields rejection rates in the range 6.4–14.7 with the largest values at small R and large pT. As expected, the pair $n_{\rm constits} + \tau_1^{\beta=1}$ in Fig. 5e yields very similar rejection rates (6.4–15.0), since $C_1^{\beta=1}$ and $\tau_1^{\beta=1}$ are both in Class IV. The other pairings with $n_{\rm constits}$ yield smaller rejection rates and smaller dynamic ranges. The pair $n_{\rm constits} + C_1^{\beta=0}$ (Fig. 5d) exhibits the smallest range of rates (8.3–11.3), suggesting that the differences between these two observables serve to substantially reduce the R and pT dependence for the pair. The other pairs shown exhibit similar behavior.

The R and pT dependence of the pair-wise combinations is generally similar to that of the single observable with the strongest dependence on R and pT. The smallest R and pT variation always occurs when pairing with $C_1^{\beta=0}$. Replacing any of the observables in these pairs with a different observable in the same class (e.g., $C_1^{\beta=2}$ for $\tau_1^{\beta=2}$) produces very similar results.

Figure 5l shows the performance of a BDT combination of all the current observables, with rejection rates in the range 10.5–17.1. The performance is very similar to that observed for the pair-wise $n_{\rm constits} + C_1^{\beta=1}$ and $n_{\rm constits} + \tau_1^{\beta=1}$ combinations, but with a somewhat narrower range and slightly larger maximum values. This suggests that almost all of the available information to discriminate quark and gluon-initiated jets is captured by the $n_{\rm constits}$ and $C_1^{\beta=1}$ or $\tau_1^{\beta=1}$ variables; this confirms the finding from [52] that near-optimal performance can be obtained with a pair of variables.

Some features are more easily seen with an alternative presentation of the data. In Figs. 6 and 7 we fix R and pT and simultaneously show the single- and pair-wise observables performance in a single matrix. The numbers in each cell are the same rejection rate for gluons used earlier, $1/\epsilon_{\rm bkg}$, with $\epsilon_{\rm sig} = 50$ % (quarks). Figure 6 shows the results for pT = 1–1.1 TeV and R = 0.4, 0.8, 1.2, while Fig. 7 is for R = 0.4 and the 3 pT bins. The single observable rejection rates appear on the diagonal, and the pairwise results are off the diagonal. The largest pair-wise rejection rate, as already suggested by Fig. 5e, appears at large pT and small R for the pair $n_{\rm constits} + \tau_1^{\beta=1}$ (with very similar results for $n_{\rm constits} + C_1^{\beta=1}$). The correlations indicated by the shading¹ should be largely understood as indicating the organization of the observables into the now-familiar classes. The all-observable (BDT) result appears as the number at the lower right in each plot.

5.4 QCD jet masses

1 The connection between the value of the rejection rate and the shading color in Figs. 6 and 7 is the same as that in Figs. 3, 4 and 5.

Fig. 5 Surface plots of $1/\epsilon_{\rm bkg}$ for the indicated pairs of variables from different classes considered for quark–gluon discrimination as functions of R and pT

Fig. 6 Gluon rejection defined as $1/\epsilon_{\rm gluon}$ when using each 2-variable combination as a tagger with 50% acceptance for quark jets. Results are shown for jets with pT = 1–1.1 TeV and for (top left) R = 0.4; (top right) R = 0.8; (bottom) R = 1.2. The rejection obtained with a tagger that uses all variables is also shown in the plots ($1/\epsilon_{\rm bkg}$ = 17.1, 14.7 and 12.4 for R = 0.4, 0.8 and 1.2, respectively)

To close the discussion of q/g-tagging, we provide some insight into the behavior of the masses of QCD jets initiated by both kinds of partons, with and without grooming. Recall that, in practice, an identified jet is simply a list of constituents, i.e., final state particles. To the extent that the masses of these individual constituents can be neglected (due to the constituents being relativistic), each constituent has a well-defined 4-momentum from its energy and direction. It follows that the 4-momentum of the jet is simply the sum of the 4-momenta of the constituents and its square is the jet mass squared. Simply on dimensional grounds, we know that jet mass must have an overall linear scaling with pT, with the remaining pT dependence arising predominantly from the running of the coupling, $\alpha_s(p_T)$. The R dependence is also crudely linear as the jet mass scales approximately with the largest angular opening between any 2 constituents, which is set by R.
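The construction of the jet mass from massless constituents, and the scaled variable m/pT/R used in the following paragraphs, can be sketched in a few lines. The (pt, eta, phi) tuple layout and the function names are assumptions made for this illustration; they are not taken from the analysis code of the study.

```python
import math

def four_momentum(pt, eta, phi):
    """Massless constituent: (E, px, py, pz) from transverse momentum and direction."""
    return (pt * math.cosh(eta), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))

def jet_mass_and_pt(constituents):
    """Sum the constituent 4-momenta and return (mass, pT) of the jet."""
    e = px = py = pz = 0.0
    for pt, eta, phi in constituents:
        ce, cx, cy, cz = four_momentum(pt, eta, phi)
        e, px, py, pz = e + ce, px + cx, py + cy, pz + cz
    m2 = e * e - px * px - py * py - pz * pz
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0)), math.hypot(px, py)

def scaled_mass(constituents, radius):
    """The scaled variable m / pT / R for a jet of radius parameter R."""
    mass, pt = jet_mass_and_pt(constituents)
    return mass / pt / radius

# Toy jet (hypothetical numbers) clustered with R = 0.8.
toy_jet = [(400.0, 0.00, 0.00), (80.0, 0.10, 0.05), (20.0, -0.30, 0.40)]
print(f"m/pT/R = {scaled_mass(toy_jet, 0.8):.3f}")
```

Multiplying every constituent pT by a common factor leaves m/pT/R unchanged, which is the linear pT scaling argued for on dimensional grounds.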

To demonstrate this universal behavior for jet mass, we first note that if we consider the mass distributions for many kinematic points (various values of R and pT), we observe considerable variation in behavior. This variation, however, can largely be removed by plotting versus the scaled variable m/pT/R. The mass distributions for quark and gluon jets versus m/pT/R for all of our kinematic points are shown in Fig. 8, where we use a logarithmic scale on the y-axis to clearly exhibit the behavior of these distributions over a large dynamic range. We observe that the distributions for the different kinematic points do approximately scale as expected, i.e., the simple arguments above capture most of the variation with R and pT. We will consider shortly an explanation of the residual non-scaling. A more rigorous quantitative understanding of jet mass distributions requires all-orders calculations in QCD, which have been performed for groomed and ungroomed jet mass spectra at high logarithmic accuracy, both in the context of direct QCD resummation [37,54–56] and Soft Collinear Effective Theory [57–59].

Fig. 7 Gluon rejection defined as $1/\epsilon_{\rm gluon}$ when using each 2-variable combination as a tagger with 50% acceptance for quark jets. Results are shown for R = 0.4 jets with (top left) pT = 300–400 GeV, (top right) pT = 500–600 GeV and (bottom) pT = 1–1.1 TeV. The rejection obtained with a tagger that uses all variables is also shown in the plots ($1/\epsilon_{\rm bkg}$ = 14.3, 15.8 and 17.1 for the three pT bins, respectively)

Fig. 8 Comparisons of quark and gluon ungroomed mass distributions versus the scaled variable m/pT/R. Panels (a) and (b) show the fraction of events (logarithmic scale) for the nine kinematic points (pT = 300–400, 500–600, 1000–1100 GeV; R = 0.4, 0.8, 1.2)

Fig. 9 Comparisons of quark and gluon pruned mass distributions versus the scaled variable mpr/pT/R. Panels (a) and (b) show the fraction of events (logarithmic scale) for the nine kinematic points (pT = 300–400, 500–600, 1000–1100 GeV; R = 0.4, 0.8, 1.2)

Several features of Fig. 8 can be easily understood. The distributions all cut off rapidly for m/pT/R > 0.5, which is understood as the precise limit (maximum mass) for a jet composed of just two constituents. As expected from the soft and collinear singularities in QCD, the mass distribution peaks at small mass values. The actual peak is pushed away from the origin by the so-called Sudakov form factor. Summing the corresponding logarithmic structure (singular in both pT and angle) to all orders in perturbation theory yields a distribution that is highly damped as the mass vanishes. In words, there is precisely zero probability that a colored parton emits no radiation (and the resulting jet has zero mass). Above the Sudakov-suppressed part of phase space, there are two structures in the distribution: the shoulder and the peak. The large mass shoulder (0.3 < m/pT/R < 0.5) is driven largely by the presence of a single large angle, energetic emission in the underlying QCD shower, i.e., this regime is quite well described by low-order perturbation theory.² In contrast, we can think of the peak region as corresponding to multiple soft emissions. This simple, necessarily approximate picture provides an understanding of the bulk of the differences between the quark and gluon jet mass distributions. Since the probability of the single large angle, energetic emission is proportional to the color charge, the gluon distribution should be enhanced in this region by a factor of about $C_A/C_F = 9/4$, consistent with what is observed in Fig. 8. Similarly the exponent in the Sudakov damping factor for the gluon jet mass distribution is enhanced by the same factor, leading to a peak pushed further from the origin. Therefore, compared to a quark jet, the gluon jet mass distribution exhibits a larger average jet mass, with a larger relative contribution arising from the perturbative shoulder region and a small mass peak that is further from the origin.
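The two-constituent limit and the color-factor ratio quoted in the discussion of Fig. 8 can be checked with a short estimate. The momentum fraction $z$ and the pair separation $\Delta R$ are notation introduced here for illustration only; the two constituents are assumed to be massless and nearly collinear, so that the jet pT is approximately the scalar sum of the two constituent pT values.

```latex
\begin{align}
  m^2 &= 2\,p_{T,1}\,p_{T,2}\left(\cosh\Delta\eta - \cos\Delta\phi\right)
       \simeq p_{T,1}\,p_{T,2}\,\Delta R^2
       = z(1-z)\,p_T^2\,\Delta R^2 , \\
  \frac{m}{p_T\,R} &\simeq \sqrt{z(1-z)}\;\frac{\Delta R}{R} \;\le\; \frac{1}{2} ,
  \qquad \text{with equality for } z = \tfrac{1}{2},\ \Delta R = R , \\
  \frac{C_A}{C_F} &= \frac{N_c}{(N_c^2-1)/(2N_c)} = \frac{3}{4/3} = \frac{9}{4}
  \qquad (N_c = 3) .
\end{align}
```

The bound is exact only in the small-angle limit; for separations that are not small the expansion of the cosines receives corrections.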

Together with the fact that the number of constituents in the jet is also larger (on average) for the gluon jet simply because a gluon will radiate more than a quark, these features explain much of what we observed earlier in terms of the effectiveness of the various observables to separate quark jets from gluon jets. They also give us insight into the difference in the distributions for the observable $\Gamma_{\rm Qjet}$. Since the shoulder is dominated by a single large angle, hard emission, it is minimally impacted by pruning, which is designed to remove the large angle, soft constituents (as shown in more detail below). Thus, jets in the shoulder exhibit small volatility and they are a larger component in the gluon jet distribution.

2 The shoulder label will become more clear when examining groomed jet mass distributions.

Hence gluon jets, on average, have smaller values of $\Gamma_{\rm Qjet}$ than quark jets as in Fig. 1b. Further, this feature of gluon jets is distinct from the fact that there are more constituents, explaining why $\Gamma_{\rm Qjet}$ and $n_{\rm constits}$ supply largely independent information for distinguishing quark and gluon jets.

To illustrate some of these points in more detail, Fig. 9 exhibits the same jet mass distributions after pruning [33,60]. Removing the large angle, soft constituents moves the peak in both of the distributions from m/pT/R ∼ 0.1–0.2 to the region around m/pT/R ∼ 0.05. This explains why pruning works to reduce the QCD background when looking for a signal in a specific jet mass bin. The shoulder feature at higher mass is much more apparent after pruning, as is the larger shoulder for the gluon jets. A quantitative (all-orders) understanding of groomed mass distributions is also possible. For instance, resummation of the pruned mass distribution was achieved in [37,56]. Figure 9 serves to confirm the physical understanding of the relative behavior of $\Gamma_{\rm Qjet}$ for quark and gluon jets.

Our final topic in this section is the residual R and pT dependence exhibited in Figs. 8 and 9, which indicates a deviation from the naive linear scaling that has been removed by using the scaled variable m/pT/R. A helpful, intuitively simple, if admittedly imprecise, model of a jet is to separate the constituents of the jet into hard (with pT values of order the jet pT) versus soft (with pT values small and fixed compared to the jet pT), and large angle (with an angular separation from the jet direction of order R) versus small angle (with an angular separation from the jet direction smaller than and not scaling with R) components. As described above the Sudakov damping factor excludes constituents that are very soft or very small angle (or both). In this simple picture perturbative large angle, hard constituents appear rarely, but, as described above, they characterize the large mass jets that appear in the shoulder of the jet mass distribution where the mass scales approximately linearly with the jet pT and with R. The hard, small angle constituents are somewhat more numerous and contribute to a jet mass that does not scale with R. The soft constituents are much more numerous (becoming more numerous with increasing jet pT) and contribute to a jet mass that scales like $\sqrt{p_{T,\rm jet}}$. The small angle, soft constituents contribute to a jet mass that does not scale with R, while the large angle, soft constituents do contribute to a jet mass that scales like R and grow in number approximately linearly in R (i.e., with the area of the annulus at the outer edge of the jet). This simple picture allows at least a qualitative explanation of the behavior observed in Figs. 8 and 9.

As already suggested, the residual pT dependence can be understood as arising primarily from the slow decrease of the strong coupling $\alpha_s(p_T)$ as pT increases. This leads to a corresponding decrease in the (largely perturbative) shoulder regime for both distributions at higher pT, i.e., a decrease in the number of hard, large angle constituents. At the same time, and for the same reason, the Sudakov damping is less strong with increasing pT and the peak moves in towards the origin. While the number of soft constituents increases with increasing jet pT, their contributions to the scaled jet mass distribution shift to smaller values of m/pT (decreasing approximately like $1/\sqrt{p_T}$). Thus the overall impact of increasing pT for both distributions is a (gradual) shift to smaller values of m/pT/R. This is just what is observed in Figs. 8 and 9, although the numerical size of the effect is reduced in the pruned case.
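To give a feeling for how slowly the coupling decreases across the pT bins considered, the following sketch evaluates the textbook one-loop running of the strong coupling. The reference value at the Z mass, the Z mass itself and the fixed number of five active flavours are illustrative inputs chosen here, not parameters taken from the Monte Carlo samples of this study, and higher-order terms and flavour thresholds are ignored.

```python
import math

def alpha_s_one_loop(q, alpha_mz=0.118, m_z=91.19, n_f=5):
    """One-loop running coupling at scale q (GeV) with a fixed number of flavours."""
    b0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
    return alpha_mz / (1.0 + alpha_mz * b0 * math.log(q * q / (m_z * m_z)))

# Lower edges of the three pT bins used in the text.
for pt in (300.0, 500.0, 1000.0):
    print(f"pT = {pt:6.0f} GeV: alpha_s ~ {alpha_s_one_loop(pt):.4f}")
```

In this approximation the coupling changes only at the level of ten to fifteen percent between the lowest and highest bins, which is consistent with the residual pT dependence being a gradual effect.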

The residual R dependence is somewhat more complicated. The perturbative large angle, hard constituent contribution largely scales in the variable m/pT/R, which is why we see little residual R dependence in either figure at higher masses (m/pT/R > 0.4). The small angle constituents (hard and soft) contribute at fixed m and thus shift to the left versus the scaled variable as R increases. This presumably explains the small shifts in this direction at small mass observed in both figures. The large angle, soft constituents contribute to mass values that scale like R, and, as noted above, tend to increase in number as R increases (i.e., as the area of the jet grows). Such contributions yield a scaled jet mass distribution that shifts to the right with increasing R and presumably explain the behavior at small pT in Fig. 8. Since pruning largely removes this contribution, we observe no such behavior in Fig. 9.

5.5 Conclusions

In Sect. 5 we have seen that a variety of jet observables provide information about the jet that can be employed to effectively separate quark-initiated from gluon-initiated jets. Further, when used in combination, these observables can provide superior separation. Since the improvement depends on the correlation between observables, we use the multivariable performance to separate the observables into different classes, with each class containing highly correlated observables. We saw that the best performing single observable is simply the number of constituents in the jet, $n_{\rm constits}$, while the largest further improvement comes from combining with $C_1^{\beta=1}$ (or $\tau_1^{\beta=1}$). The performance of this combined tagger is strongly dependent on pT and R, with the best performance being observed for smaller R and higher pT. The smallest R and pT dependence arises from combining $n_{\rm constits}$ with $C_1^{\beta=0}$. Some of the commonly used observables for q/g tagging are highly correlated and do not provide extra information when used together. We have found that adding further variables to the $n_{\rm constits} + C_1^{\beta=1}$ or $n_{\rm constits} + \tau_1^{\beta=1}$ BDT combination results in only a small improvement in performance, suggesting that almost all of the available information to discriminate quark and gluon-initiated jets is captured by the $n_{\rm constits}$ and $C_1^{\beta=1}$ (or $\tau_1^{\beta=1}$) variables. In addition to demonstrating these correlations, we have provided a discussion of the physics behind the structure of the correlation. Using the jet mass as an example, we have given arguments to explicitly explain the differences between jet observables initiated by each type of parton.

Finally, we remind the reader that the numerical results were derived for a particular color configuration (qq and gg events), in a particular implementation of the parton shower and hadronization. Color connections in more complex event configurations, or different Monte Carlo programs, may well exhibit somewhat different efficiencies and rejection factors. The value of our results is that they indicate a subset of variables expected to be rich in information about the partonic origin of final-state jets. These variables can be expected to act as valuable discriminants in searches for new physics, and could also be used to define model-independent final-state measurements which would nevertheless be sensitive to the short-distance physics of quark and gluon production.

6 Boosted W-tagging

In this section, we study the discrimination of a boosted, hadronically decaying W boson (signal) against a gluon-initiated jet background, comparing the performance of various groomed jet masses and substructure variables. A range of different distance parameters for the anti-kT jet algorithm are explored, in a range of different leading jet pT bins. This allows us to determine the performance of observables as a function of jet radius and jet boost, and to see where different approaches may break down. The groomed mass and substructure variables are then combined in a BDT as described in Sect. 4, and the performance of the resulting BDT discriminant explored through ROC curves to understand the degree to which variables are correlated, and how this changes with jet boost and jet radius. Using BDT combinations of substructure variables to improve W tagging has been studied earlier in [61].

6.1 Methodology

These studies use the WW samples as signal and the dijet gg as background, described previously in Sect. 2. Whilst only gluonic backgrounds are explored here, the conclusions regarding the dependence of the performance and correlations on the jet boost and radius are not expected to be substantially different for quark backgrounds; we will see that the differences in the substructure properties of quark- and gluon-initiated jets, explored in the last section, are significantly smaller than the differences between W-initiated and gluon-initiated jets.

As in the q/g tagging studies, the showered events were clustered with FastJet 3.03 using the anti-kT algorithm with jet radii of R = 0.4, 0.8, 1.2. In both signal and background samples, an upper and lower cut on the leading jet pT is applied after showering/clustering, to ensure similar pT spectra for signal and background in each pT bin. The bins in leading jet pT that are considered are 300–400 GeV, 500–600 GeV and 1.0–1.1 TeV, for the 300–400 GeV, 500–600 GeV and 1.0–1.1 TeV parton pT slices respectively.

The jets then have various grooming algorithms applied and substructure observables reconstructed as described in Sect. 3.4. The substructure observables studied in this section are:

- Ungroomed, trimmed ($m_{\rm trim}$), and pruned ($m_{\rm prun}$) jet masses.
- Mass output from the modified mass drop tagger ($m_{\rm mmdt}$).
- Soft drop mass with $\beta = 2$ ($m_{\rm sd}$).
- 2-point energy correlation function ratio $C_2^{\beta=1}$ (we also studied $\beta = 2$ but do not show its results because it showed poor discrimination power).
- N-subjettiness ratio $\tau_2/\tau_1$ with $\beta = 1$ ($\tau_{21}^{\beta=1}$) and with axes computed using one-pass kt axis optimization (we also studied $\beta = 2$ but do not show its results because it showed poor discrimination power).
- Pruned Qjet mass volatility, $\Gamma_{\rm Qjet}$.

6.2 Single variable performance

In this section we explore the performance of the various groomed jet mass and substructure variables in separating signal from background. Since we have not attempted to optimise the grooming parameter settings of each grooming algorithm, we do not place much emphasis here on the relative performance of the groomed masses, but instead concentrate on how their performance changes depending on the kinematic bin and jet radius considered.

Figure 10 compares the signal and background in terms of the different groomed masses explored for the anti-kT R = 0.8 algorithm in the pT = 500–600 GeV bin. One can clearly see that, in terms of separating signal and background, the groomed masses are significantly more performant than the ungroomed anti-kT R = 0.8 mass. Using the same jet radius and pT bin, Fig. 11 compares signal and background for the different substructure variables studied.

Figures 12, 13 and 14 show the single variable ROC curves for various pT bins and values of R. The single variable performance is also compared to the ROC curve for a BDT combination of all the variables (labelled allvars). In all cases, the allvars option is significantly more performant than any of the individual single variables considered, indicating that there is considerable complementarity between the variables, and this is explored further in Sect. 6.3.

In Figs. 15, 16 and 17 the same information is shown in a format that more readily allows for a quantitative comparison of performance for different R and pT; matrices are presented which give the background rejection for a signal efficiency of 70% (see footnote 3) for single variable cuts, as well as two- and three-variable BDT combinations. The results are shown separately for each pT bin and jet radius considered. Most relevant for our immediate discussion, the diagonal entries of these plots show the background rejections for a single variable BDT using the labelled observable, and can thus be examined to get a quantitative measure of the individual single variable performance, and to study how this changes with jet radius and momenta. The off-diagonal entries give the performance when two variables (shown on the x-axis and on the y-axis, respectively) are combined in a BDT. The final column of these plots shows the background rejection performance for three-variable BDT combinations of $m_{\rm sd}^{\beta=2} + C_2^{\beta=1} + X$. These results will be discussed later in Sect. 6.3.3.
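The figure of merit quoted in these matrices, the background rejection $1/\epsilon_{\rm bkg}$ at a fixed signal efficiency, can be sketched in a few lines of Python. The sketch assumes each jet has already been assigned a single discriminant score (for example a BDT output) where larger values are more signal-like; the function name and the one-sided threshold are illustrative assumptions, not the procedure used in the study.

```python
import math


def rejection_at_efficiency(sig_scores, bkg_scores, target_eff=0.70):
    """Return (threshold, signal efficiency, background rejection 1/eps_bkg).

    Assumes larger scores are more signal-like and a one-sided cut
    score >= threshold (an illustrative choice).
    """
    ordered = sorted(sig_scores, reverse=True)
    # Number of signal jets to keep; round() guards against float noise.
    n_keep = max(1, math.ceil(round(target_eff * len(ordered), 9)))
    threshold = ordered[n_keep - 1]

    n_sig_pass = sum(1 for s in sig_scores if s >= threshold)
    n_bkg_pass = sum(1 for s in bkg_scores if s >= threshold)

    eps_sig = n_sig_pass / len(sig_scores)
    eps_bkg = n_bkg_pass / len(bkg_scores)
    rejection = math.inf if n_bkg_pass == 0 else 1.0 / eps_bkg
    return threshold, eps_sig, rejection
```

With finite samples the achieved signal efficiency can slightly exceed the target when several signal jets share the threshold score, which is why the function returns the achieved value as well.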

In general, the most performant single variables are the groomed masses. However, in certain kinematic bins and for certain jet radii, $C_2^{\beta=1}$ has a background rejection that is comparable to or better than the groomed masses.

We first examine the variation of performance with jet pT. By comparing Figs. 15a, 16a and 17b, we can see how the background rejection performance varies with increased momenta whilst keeping the jet radius fixed to R = 0.8. Similarly, by comparing Figs. 15b, 16b and 17c we can see how performance evolves with pT for R = 1.2. For both R = 0.8 and R = 1.2 the background rejection power of the groomed masses increases with increasing pT, with a factor 1.5-2.5 increase in rejection in going from the 300-400 GeV to 1.0-1.1 TeV bins. In Fig. 18 we show the $m_{\rm sd}$ and $m_{\rm prun}$ groomed masses for signal and background in the pT = 300-400 GeV and pT = 1.0-1.1 TeV bins for R = 1.2 jets. Two effects result in the improved performance of the groomed mass at high pT. Firstly, as is evident from the figure, the resolution of the signal peak after grooming improves, because the groomer finds it easier to pick out the hard signal component of the jet against the softer components of the underlying event when the signal is boosted. Secondly, it follows from Fig. 9 and the discussion in Sect. 5.4 that, for increasing pT, the perturbative shoulder of the gluon distribution decreases in size, and thus there is a slight decrease (or at least no increase) of the background contamination in the signal mass region (m/pT/R ~ 0.5).

Footnote 3: Note that here we choose to report the rejection for a higher signal efficiency than the 50% that was used in the q/g tagging studies of Sect. 5, because the rejection rates in W tagging are considerably higher.

Fig. 10 Leading jet mass distributions in the gg background and WW signal samples in the pT = 500-600 GeV bin using the anti-kT R = 0.8 algorithm. Panels: (a) ungroomed mass m, (b) $m_{\rm prun}$, (c) $m_{\rm trim}$, (d) $m_{\rm mmdt}$, (e) $m_{\rm sd}^{\beta=2}$; x-axis: mass (GeV); y-axis: fraction of events; legend: W jet, gluon jet.

However, one can see from Figs. 15b, 16b and 17c that the $C_2^{\beta=1}$, $\Gamma_{\rm Qjet}$ and $\tau_{21}^{\beta=1}$ substructure variables behave somewhat differently. The background rejection power of both the $\Gamma_{\rm Qjet}$ and $\tau_{21}^{\beta=1}$ variables decreases with increasing pT, by up to a factor two in going from the 300-400 GeV to 1.0-1.1 TeV bins. Conversely the rejection power of $C_2^{\beta=1}$ dramatically increases with increasing pT for R = 0.8, but does not improve with pT for the larger jet radius R = 1.2. In Fig. 19 we show the $\tau_{21}^{\beta=1}$ and $C_2^{\beta=1}$ distributions for signal and background in the pT = 300-400 GeV and pT = 1.0-1.1 TeV bins for R = 0.8 jets. For $\tau_{21}^{\beta=1}$ one can see that, in moving from lower to higher pT bins, the signal peak remains fairly unchanged, whereas the background peak shifts to smaller $\tau_{21}^{\beta=1}$ values, reducing the discriminating power of the variable. This is expected, since jet substructure methods explicitly relying on the identification of hard prongs would be expected to work best at low pT, where the prongs would tend to be more separated. However, $C_2^{\beta=1}$ does not rely on the explicit identification of subjets, and one can see from Fig. 19 that the discrimination power visibly increases with increasing pT. This is in line with the observation in [44] that $C_2^{\beta=1}$ performs best when m/pT is small. The negative correlation between the discrimination power of $\Gamma_{\rm Qjet}$ and increasing pT can be understood in similar terms. As discussed in Sect. 5.4, the low volatility component of a gluon jet, the shoulder, is enhanced as pT increases, leading to a background (QCD) volatility distribution more peaked at low values. In contrast the signal (W) jets will include more relatively soft radiation as pT increases, leading to a more volatile configuration. Thus, as pT increases, the signal jets will exhibit a somewhat broader volatility distribution, while the background jets will exhibit a somewhat narrower volatility distribution, i.e., the distributions become more similar, reducing the discriminating power of $\Gamma_{\rm Qjet}$.

Fig. 11 Leading jet substructure variable distributions in the gg background and WW signal samples in the pT = 500-600 GeV bin using the anti-kT R = 0.8 algorithm. Panels (a)-(e); y-axis: fraction of events; legend: W jet, gluon jet.

Fig. 12 ROC curves for single variables considered for W tagging in the pT = 300-400 GeV bin using the anti-kT R = 0.8 algorithm (a) and R = 1.2 algorithm (b), along with a BDT combination of all variables ("allvars"). x-axis: $\epsilon_{\rm sig}$; y-axis: $\epsilon_{\rm bkg}$ (logarithmic).

Fig. 13 ROC curves for single variables considered for W tagging in the pT = 500-600 GeV bin using the anti-kT R = 0.8 algorithm (a) and R = 1.2 algorithm (b), along with a BDT combination of all variables ("allvars"). x-axis: $\epsilon_{\rm sig}$; y-axis: $\epsilon_{\rm bkg}$ (logarithmic).

Fig. 14 ROC curves for single variables considered for W tagging in the pT = 1.0-1.1 TeV bin using the anti-kT R = 0.4 algorithm (a), R = 0.8 algorithm (b) and R = 1.2 algorithm (c), along with a BDT combination of all variables ("allvars"). x-axis: $\epsilon_{\rm sig}$; y-axis: $\epsilon_{\rm bkg}$ (logarithmic).

Fig. 15 The background rejection for a fixed signal efficiency (70%) of each BDT combination of each pair of variables considered, in the pT = 300-400 GeV bin using the anti-kT R = 0.8 algorithm (a) and R = 1.2 algorithm (b). Also shown is the background rejection for three-variable combinations involving $m_{\rm sd}^{\beta=2} + C_2^{\beta=1}$, and for a BDT combination of all of the variables considered. All-variables rejection at $\epsilon_{\rm sig} = 0.70$: $1/\epsilon_{\rm bkg}$ = 104.3 (R = 0.8) and 83.1 (R = 1.2).

Fig. 16 The background rejection for a fixed signal efficiency (70%) of each BDT combination of each pair of variables considered, in the pT = 500-600 GeV bin using the anti-kT R = 0.8 algorithm (a) and R = 1.2 algorithm (b). Also shown is the background rejection for three-variable combinations involving $m_{\rm sd}^{\beta=2} + C_2^{\beta=1}$, and for a BDT combination of all of the variables considered. All-variables rejection at $\epsilon_{\rm sig} = 0.70$: $1/\epsilon_{\rm bkg}$ = 209.4 (R = 0.8) and 148.9 (R = 1.2).

We now compare the performance of different jet radius parameters in the same pT bin by comparing the individual sub-figures of Figs. 15, 16 and 17. To within 25%, the background rejection power of the groomed masses remains constant with respect to the jet radius. Figure 20 shows how the groomed mass changes for varying jet radius in the pT = 1.0-1.1 TeV bin. One can see that the signal mass peak remains unaffected by the increased radius, as expected, since grooming removes the soft contamination which could otherwise increase the mass of the jet as the radius increased. The gluon background in the signal mass region also remains largely unaffected, as follows from Fig. 9 and the discussion in Sect. 5.4, where it is shown that there is very little dependence of the groomed gluon mass distribution on R in the signal region (m/pT/R ~ 0.5).

Fig. 17 The background rejection for a fixed signal efficiency (70%) of each BDT combination of each pair of variables considered, in the pT = 1.0-1.1 TeV bin using the anti-kT R = 0.4 (a), R = 0.8 (b) and R = 1.2 (c) algorithms. Also shown is the background rejection for three-variable combinations involving $m_{\rm sd}^{\beta=2} + C_2^{\beta=1}$, and for a BDT combination of all of the variables considered. All-variables rejection at $\epsilon_{\rm sig} = 0.70$: $1/\epsilon_{\rm bkg}$ = 696.8 (R = 0.4), 802.1 (R = 0.8) and 500.2 (R = 1.2).

Fig. 18 The soft-drop $\beta = 2$ and pruned groomed mass distributions for signal and background R = 1.2 jets in two different pT bins. Panels: (a) $m_{\rm sd}^{\beta=2}$, pT = 300-400 GeV; (b) $m_{\rm sd}^{\beta=2}$, pT = 1.0-1.1 TeV; (c) $m_{\rm prun}$, pT = 300-400 GeV; (d) $m_{\rm prun}$, pT = 1.0-1.1 TeV; x-axis: mass (GeV); y-axis: fraction of events; legend: W jet, gluon jet.

However, we again see rather different behaviour versus R for the substructure variables. In all pT bins considered, the most performant substructure variable, $C_2^{\beta=1}$, performs best for an anti-kT distance parameter of R = 0.8. The performance of this variable is dramatically worse for the larger jet radius of R = 1.2 (a factor seven worse background rejection in the pT = 1.0-1.1 TeV bin), and substantially worse for R = 0.4. For the other jet substructure variables considered, $\Gamma_{\rm Qjet}$ and $\tau_{21}^{\beta=1}$, the background rejection power also reduces for larger jet radius, but not to the same extent. Figure 21 shows the $\tau_{21}^{\beta=1}$ and $C_2^{\beta=1}$ distributions for signal and background in the pT = 1.0-1.1 TeV bin for R = 0.8 and R = 1.2 jet radii. For the larger jet radius, the $C_2^{\beta=1}$ distribution of both signal and background gets wider, and consequently the discrimination power decreases. For $\tau_{21}^{\beta=1}$ there is comparatively little change in the distributions with increasing jet radius. The increased sensitivity of $C_2$ to soft wide angle radiation in comparison to $\tau_{21}$ is a known feature of this variable [44], and a useful feature in discriminating coloured versus colour singlet jets. However, at very large jet radii (R ~ 1.2), this feature becomes disadvantageous; the jet can pick up a significant amount of initial state or other uncorrelated radiation, and $C_2$ is more sensitive to this than is $\tau_{21}$. This uncorrelated radiation has no (or very little) dependence on whether the jet is W- or gluon-initiated, and so sensitivity to this radiation means that the discrimination power will decrease. A similar argument applies to $\Gamma_{\rm Qjet}$, for which the story closely parallels its behaviour with increasing pT. At larger R the low volatility shoulder is enhanced in the QCD background jet, leading to a narrower volatility distribution. For the W jet, the larger R includes more uncorrelated radiation in the jet, leading to a broader volatility distribution. So, as with increasing pT, increasing R results in volatility distributions for signal and background jets that are more similar and $\Gamma_{\rm Qjet}$ exhibits reduced discrimination power.
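The volatility argument above can be made concrete with a short Python sketch. It assumes the usual definition of the Qjet mass volatility as the spread of the pruned jet mass over repeated randomised clusterings divided by its mean, $\Gamma_{\rm Qjet} = \sqrt{\langle m^2 \rangle - \langle m \rangle^2}/\langle m \rangle$; the input list of masses is hypothetical and the randomised clustering itself is not shown.

```python
import statistics


def qjet_volatility(pruned_masses):
    """Mass volatility: population standard deviation of the pruned masses
    from repeated Qjet clusterings of one jet, divided by their mean."""
    mean_mass = statistics.fmean(pruned_masses)
    if mean_mass == 0.0:
        return 0.0
    return statistics.pstdev(pruned_masses) / mean_mass
```

A jet whose pruned mass is stable under the randomised clusterings gives a volatility near zero, whereas a jet whose mass fluctuates strongly from one clustering to the next gives a larger value.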

6.3 Combined performance

Studying the improvement in performance (or lack thereof) when combining single variables into a multivariate analysis gives insight into the correlations among jet observables. The off-diagonal entries in Figs. 15, 16 and 17 can be used to compare the performance of different BDT two-variable combinations, and see how this varies as a function of pT and R. By comparing the background rejection achieved for the two-variable combinations to the background rejection of the "all variables" BDT, one can also understand how discrimination can be improved by adding further variables to the two-variable BDTs.

In general the most powerful two-variable combinations involve a groomed mass and a non-mass substructure variable ($C_2^{\beta=1}$, $\Gamma_{\rm Qjet}$ or $\tau_{21}^{\beta=1}$). Two-variable combinations of the substructure variables are not as powerful in comparison. Which particular mass + substructure variable combination is the most powerful depends strongly on the pT and R of the jet, as discussed in the sections to follow.

There is also modest improvement in the background rejection when different groomed masses are combined, indicating that there is complementary information between the different groomed masses (first shown in [62]). In addition, there is an improvement in the background rejection when the groomed masses are combined with the ungroomed mass, indicating that grooming removes some useful discriminatory information from the jet. These observations are explored further in the section below.

Generally, the R = 0.8 jets offer the best two-variable combined performance in all pT bins explored here. This is despite the fact that in the highest pT = 1.0-1.1 TeV bin the average separation of the quarks from the W decay is much smaller than 0.8, and well within 0.4. This conclusion could of course be susceptible to pile-up, which is not considered in this study. It is in marked contrast to the R dependence of the q/g tagging performance shown in Sect. 5, where a monotonic improvement in performance with reducing R is observed.

Fig. 19 The $\tau_{21}^{\beta=1}$ and $C_2^{\beta=1}$ distributions for signal and background R = 0.8 jets in two different pT bins. Panels (a), (c): pT = 300-400 GeV; panels (b), (d): pT = 1.0-1.1 TeV; y-axis: fraction of events; legend: W jet, gluon jet.

6.3.1 Mass + substructure performance

As already noted, the largest background rejections at 70% signal efficiency are in general achieved using those two-variable BDT combinations which involve a groomed mass and a non-mass substructure variable. We now investigate the pT and R dependence of the performance of these combinations.

For both R = 0.8 and R = 1.2 jets, the rejection power of these two-variable combinations increases substantially with increasing pT, at least within the pT range considered here.

For a jet radius of R = 0.8, across the full pT range considered, the groomed mass + substructure variable combinations with the largest background rejection are those which involve $C_2^{\beta=1}$. For example, in combination with $m_{\rm sd}$, this produces a 5-, 8- and 15-fold increase in background rejection compared to using the groomed mass alone. Figure 22 shows 2-D histograms of $m_{\rm sd}$ versus $C_2^{\beta=1}$ for R = 0.8 jets in the various pT bins considered, for both signal and background. The relatively low degree of correlation between $m_{\rm sd}$ and $C_2^{\beta=1}$ that leads to these large improvements in background rejection can be seen. What little correlation exists is rather non-linear in nature, changing from a negative to a positive correlation as a function of the groomed mass, something which helps to improve the background rejection in the region of the W mass peak.
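One simple way to quantify a correlation that changes sign with the groomed mass is to compute a linear correlation coefficient separately in slices of mass. The Python sketch below does this for paired lists of groomed masses and $C_2^{\beta=1}$ values; the function names and slice edges are hypothetical, and this diagnostic is offered as an illustration rather than as a step of the analysis described here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return None
    return cov / math.sqrt(var_x * var_y)


def correlation_by_mass_slice(masses, c2_values, edges):
    """Correlation of mass and C2 within each slice [edges[i], edges[i+1])."""
    results = []
    for low, high in zip(edges[:-1], edges[1:]):
        pairs = [(m, c) for m, c in zip(masses, c2_values) if low <= m < high]
        xs = [m for m, _ in pairs]
        ys = [c for _, c in pairs]
        results.append(pearson(xs, ys))
    return results
```

A sign change between neighbouring slices is the kind of non-linear structure that a single global correlation coefficient would average away.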

Fig. 20 The soft-drop $\beta = 2$ and pruned groomed mass distributions for signal and background R = 0.4 and R = 1.2 jets in the pT = 1.0-1.1 TeV bin. Panels: (a) $m_{\rm sd}^{\beta=2}$, R = 0.4; (b) $m_{\rm sd}^{\beta=2}$, R = 1.2; (c) $m_{\rm prun}$, R = 0.4; (d) $m_{\rm prun}$, R = 1.2; x-axis: mass (GeV); y-axis: fraction of events; legend: W jet, gluon jet.

However, when we switch to a jet radius of R = 1.2 the picture for $C_2^{\beta=1}$ combinations changes dramatically. These become significantly less powerful, and the most powerful variable in groomed mass combinations becomes $\tau_{21}^{\beta=1}$ for all jet pT considered. Figure 23 shows the correlation between $m_{\rm sd}^{\beta=2}$ and $C_2^{\beta=1}$ in the pT = 1.0-1.1 TeV bin for the various jet radii considered. Figure 24 is the equivalent set of distributions for $m_{\rm sd}^{\beta=2}$ and $\tau_{21}^{\beta=1}$. One can see from Fig. 23 that, due to the sensitivity of the observable to soft, wide-angle radiation, as the jet radius increases $C_2^{\beta=1}$ increases and becomes more and more smeared out for both signal and background, leading to worse discrimination power. This does not happen to the same extent for $\tau_{21}^{\beta=1}$. We can see from Fig. 24 that the negative correlation between $m_{\rm sd}^{\beta=2}$ and $\tau_{21}^{\beta=1}$ that is clearly visible for R = 0.4 decreases for larger jet radius, such that the groomed mass and substructure variable are far less correlated and $\tau_{21}^{\beta=1}$ offers improved discrimination within a $m_{\rm sd}^{\beta=2}$ mass window.
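For readers less familiar with the N-subjettiness ratio discussed above, the following Python sketch evaluates $\tau_N$ for a given set of candidate subjet axes, assuming the standard definition $\tau_N = \sum_i p_{T,i} \min_k (\Delta R_{ik})^{\beta} / \sum_i p_{T,i} R^{\beta}$. The axes are taken as inputs; the one-pass kT axis optimization used in this study is not reproduced, and all names are illustrative.

```python
import math


def delta_r(y1, phi1, y2, phi2):
    """Angular distance in the rapidity-azimuth plane (phi in [-pi, pi])."""
    dphi = abs(phi1 - phi2)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(y1 - y2, dphi)


def n_subjettiness(constituents, axes, jet_radius, beta=1.0):
    """tau_N for constituents (pt, y, phi) and N axes (y, phi)."""
    if not axes:
        raise ValueError("at least one axis is required")
    norm = sum(pt for pt, _, _ in constituents) * jet_radius ** beta
    if norm == 0.0:
        raise ValueError("jet has no transverse momentum")
    total = 0.0
    for pt, y, phi in constituents:
        # Each constituent contributes its distance to the nearest axis.
        nearest = min(delta_r(y, phi, ay, aphi) for ay, aphi in axes)
        total += pt * nearest ** beta
    return total / norm


def tau21(constituents, one_axis, two_axes, jet_radius, beta=1.0):
    """Ratio tau_2 / tau_1; small values indicate a two-prong structure."""
    tau1 = n_subjettiness(constituents, one_axis, jet_radius, beta)
    tau2 = n_subjettiness(constituents, two_axes, jet_radius, beta)
    return tau2 / tau1 if tau1 > 0.0 else 0.0
```

In this sketch a jet whose constituents sit exactly on two axes has $\tau_2 = 0$ and therefore $\tau_{21} = 0$, while radiation away from both axes pushes the ratio up.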

6.3.2 Mass + mass performance

Fig. 21 The $\tau_{21}^{\beta=1}$ and $C_2^{\beta=1}$ distributions for signal and background R = 0.8 and R = 1.2 jets in the pT = 1.0-1.1 TeV bin. Panels (a), (c): R = 0.8; panels (b), (d): R = 1.2; y-axis: fraction of events; legend: W jet, gluon jet.

The different groomed masses and the ungroomed mass are of course not fully correlated, and thus one can always see some kind of improvement in the background rejection when two different mass variables are combined in the BDT. However, in some cases the improvement can be dramatic, particularly at higher pT, and particularly for combinations with the ungroomed mass. For example, in Fig. 17 we can see that in the pT = 1.0-1.1 TeV bin, the combination of pruned mass with ungroomed mass produces a greater than eightfold improvement in the background rejection for R = 0.4 jets, a greater than fivefold improvement for R = 0.8 jets, and a factor 2 improvement for R = 1.2 jets. A similar behaviour can be seen for mMDT mass. In Figs. 25, 26 and 27, we show the 2-D correlation plots of the pruned mass versus the ungroomed mass separately for the WW signal and gg background samples in the pT = 1.0-1.1 TeV bin, for the various jet radii considered. For comparison, the correlation of the trimmed mass with the ungroomed mass, a combination that does not improve on the single mass as dramatically, is also shown. In all cases one can see that there is a much smaller degree of correlation between the pruned mass and the ungroomed mass in the background sample than for the trimmed mass and the ungroomed mass. This is most obvious in Fig. 25, where the high degree of correlation between the trimmed and ungroomed mass is expected, since with the parameters used (in particular $R_{\rm trim} = 0.2$) we cannot expect trimming to have a significant impact on an R = 0.4 jet. The reduced correlation with ungroomed mass for pruning in the background means that, once we have required that the pruned mass is consistent with a W (i.e. 80 GeV), a relatively large difference between signal and background in the ungroomed mass still remains, and can be exploited to improve the background rejection further. In other words, many of the background events which pass the pruned mass requirement do so because they are shifted to lower mass (to be within a signal mass window) by the grooming, but these events still have the property that they look very much like background events before the grooming. A requirement on the groomed mass alone does not exploit this property. Of course, the impact of pile-up, not considered in this study, could limit the degree to which the ungroomed mass could be used to improve discrimination in this way.
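The reasoning above can be illustrated with rectangular cuts in place of the BDT used in the study. The Python sketch below applies a pruned-mass window and then an upper bound on the ungroomed mass; the window edges and the ungroomed-mass bound are hypothetical values chosen only for illustration.

```python
# Hypothetical selection values, not taken from the study.
W_MASS_WINDOW_GEV = (60.0, 100.0)
MAX_UNGROOMED_MASS_GEV = 120.0


def passes_selection(m_pruned, m_ungroomed,
                     window=W_MASS_WINDOW_GEV,
                     max_ungroomed=MAX_UNGROOMED_MASS_GEV):
    """True if the pruned mass is in the window and the ungroomed mass
    is not far above it (i.e. grooming did not remove a large mass)."""
    in_window = window[0] <= m_pruned <= window[1]
    return in_window and m_ungroomed <= max_ungroomed


def selection_efficiencies(events, window=W_MASS_WINDOW_GEV,
                           max_ungroomed=MAX_UNGROOMED_MASS_GEV):
    """events: list of (m_pruned, m_ungroomed) pairs in GeV.
    Returns (efficiency of the mass window alone, efficiency of both cuts)."""
    n_window = sum(1 for mp, _ in events if window[0] <= mp <= window[1])
    n_both = sum(1 for mp, mu in events
                 if passes_selection(mp, mu, window, max_ungroomed))
    return n_window / len(events), n_both / len(events)
```

Background jets that enter the window only because grooming removed a large amount of mass fail the second requirement, which is the effect the combined BDT can exploit.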

6.3.3 All variables performance

Figures 15, 16 and 17 report the background rejection achieved by a combination of all the variables considered into a single BDT discriminant. In all cases, the rejection power of this "all variables" BDT is significantly larger than the best two-variable combination. This indicates that, beyond the best two-variable combination, there is still significant complementary information available in the remaining observables to improve the discrimination of signal and background. How much complementary information is available appears to be pT dependent. In the lower pT = 300-400 and 500-600 GeV bins, the background rejection of the "all variables" combination is a factor 1.5 greater than the best two-variable combination, but in the highest pT bin it is a factor 2.5 greater.

The final column in Figs. 15, 16 and 17 allows us to further explore the "all variables" performance relative to the pair-wise performance. It shows the background rejection for three-variable BDT combinations of $m_{\rm sd}^{\beta=2} + C_2^{\beta=1} + X$, where X is the variable on the y-axis. For jets with R = 0.4 and R = 0.8, the combination $m_{\rm sd}^{\beta=2} + C_2^{\beta=1}$ is (at least close to) the best performing two-variable combination in every pT bin considered. For R = 1.2 this is not the case, as $C_2^{\beta=1}$ is superseded by $\tau_{21}^{\beta=1}$ in performance, as discussed earlier. Thus, in considering the three-variable combination results, it is simplest to focus on the R = 0.4 and R = 0.8 cases. Here we see that, for the lower pT = 300-400 and 500-600 GeV bins, adding the third variable to the best two-variable combination brings us to within 15% of the "all variables" background rejection. However, in the highest pT = 1.0-1.1 TeV bin, whilst adding the third variable does improve the performance considerably, we are still 40% from the observed "all variables" background rejection, and clearly adding a fourth or maybe even fifth variable would bring considerable gains. In terms of which variable offers the best improvement when added to the $m_{\rm sd}^{\beta=2} + C_2^{\beta=1}$ combination, it is hard to see an obvious pattern; the best third variable changes depending on the pT and R considered.

Fig. 22 2-D histograms of $m_{\rm sd}^{\beta=2}$ versus $C_2^{\beta=1}$ distributions for R = 0.8 jets in the various pT bins considered, shown separately for signal (W jet) and background (gluon jet). Panels (a)-(c).

It appears that there is a rich and complex structure in terms of the degree to which the discriminatory information provided by the set of variables considered overlaps, with the degree of overlap apparently decreasing at higher pT. This suggests that in all pT ranges, but especially at higher pT, there are substantial performance gains to be made by designing a more complex multivariate W tagger.

Fig. 23 2-D histograms of $m_{\rm sd}^{\beta=2}$ versus $C_2^{\beta=1}$ for R = 0.4, 0.8 and 1.2 jets in the pT = 1.0-1.1 TeV bin, shown separately for signal (W jet) and background (gluon jet). Panels (a)-(c).

6.4 Conclusions

We have studied the performance, in terms of the separation of a hadronically decaying W boson from a gluon-initiated jet background, of a number of groomed jet masses, substructure variables, and BDT combinations of the above. We have used this to gain insight into how the discriminatory information contained in the variables overlaps, and how this complementarity between the variables changes with jet pT and anti-kT distance parameter R.

Fig. 24 2-D histograms of $m_{\rm sd}^{\beta=2}$ versus $\tau_{21}^{\beta=1}$ for R = 0.4, 0.8 and 1.2 jets in the pT = 1.0-1.1 TeV bin, shown separately for signal (W jet) and background (gluon jet). Panels (a)-(c).

In terms of the performance of individual variables, we find that, in agreement with other studies [40], the groomed masses generally perform best, with a background rejection power that increases with larger pT, but which is more consistent with respect to changes in R. We have explained the dependence of the groomed mass performance on pT and R using the understanding of the QCD mass distribution developed in Sect. 5.4. Conversely, the performance of other substructure variables, such as $C_2^{\beta=1}$ and $\tau_{21}^{\beta=1}$, is more susceptible to changes in radius, with background rejection power decreasing with increasing R. This is due to the inherent sensitivity of these observables to soft, wide angle radiation.

Fig. 25 2-D histograms of groomed mass versus ungroomed mass in the pT = 1.0-1.1 TeV bin using the anti-kT R = 0.4 algorithm, shown separately for signal (W jet) and background (gluon jet). Panels: (a) pruned mass, (b) trimmed mass; axes in GeV.

The best two-variable performance is obtained by combining a groomed mass with a substructure variable. Which particular substructure variable works best in combination strongly depends on pT and R. The variable $C_2^{\beta=1}$ offers significant complementarity to groomed mass for the smaller values of R investigated (R = 0.4 and 0.8), owing to the small degree of correlation between the variables. However, the sensitivity of $C_2^{\beta=1}$ to soft, wide-angle radiation leads to worse discrimination power at R = 1.2, where $\tau_{21}^{\beta=1}$ performs better in combination. The best two-variable performance in each pT bin examined is obtained for $C_2^{\beta=1}$ in combination with a groomed mass, using R = 0.8, with a performance that is better at higher pT. Our studies also demonstrate the potential for enhancing discrimination by combining groomed and ungroomed mass information, although the use of ungroomed mass in this may be limited in practice by the presence of pile-up that is not considered in these studies.

By examining the performance of a BDT combination of all variables considered, it is clear that there are potentially substantial performance gains to be made by designing a more complex multivariate W tagger, especially at higher pT.

7 Top tagging

In this section, we investigate the identification of boosted top quarks using jet substructure. Boosted top quarks result in large-radius jets with complex substructure, containing a b-subjet and a boosted W. As a consequence of the many kinematic differences between top and QCD jets, top taggers are typically complex, with a couple of input parameters necessary for any given algorithm. We study the variation in performance of top tagging techniques with respect to jet pT and R, re-optimizing the tagger inputs for each kinematic range and jet radius considered. We also investigate the effects of combining dedicated top tagging algorithms with other jet substructure variables, giving insight into the correlations among top-tagging variables.

Fig. 26 2-D histograms of groomed mass versus ungroomed mass in the pT = 1.0-1.1 TeV bin using the anti-kT R = 0.8 algorithm, shown separately for signal (W jet) and background (gluon jet). Panels: (a) pruned mass, (b) trimmed mass; axes in GeV.

7.1 Methodology

We use the top quark MC samples for each bin described in Sect. 2.2. The analysis relies on FastJet 3.0.3 for jet clustering and calculation of jet substructure variables. Jets are clustered using the anti-kT algorithm, and only the leading jet is used in each analysis. To ensure similar pT spectra in each bin, upper and lower pT cuts are applied to each sample after jet clustering. The bins in leading jet pT for top tagging are 600-700 GeV, 1-1.1 TeV, and 1.5-1.6 TeV. Jets are clustered with radii R = 0.4, 0.8, and 1.2; R = 0.4 jets are only studied in the 1.5-1.6 TeV bin because the top decay products are all contained within an R = 0.4 jet for top quarks with this boost.
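The binning described above can be summarised in a short Python sketch. It assumes that the leading jet is the one with the highest pT and that bins are inclusive at the lower edge and exclusive at the upper edge; both the edge convention and the function names are assumptions made for illustration.

```python
# Leading-jet pT bins for top tagging, in GeV (from the text above).
PT_BINS_GEV = [(600.0, 700.0), (1000.0, 1100.0), (1500.0, 1600.0)]


def radii_for_bin(pt_bin):
    """Jet radii studied in a given pT bin: R = 0.4 only in the highest bin."""
    if pt_bin == (1500.0, 1600.0):
        return (0.4, 0.8, 1.2)
    return (0.8, 1.2)


def leading_jet_bin(jet_pts):
    """Return the pT bin containing the leading (highest-pT) jet, or None
    if the event has no jets or the leading jet falls outside all bins."""
    if not jet_pts:
        return None
    leading_pt = max(jet_pts)
    for low, high in PT_BINS_GEV:
        if low <= leading_pt < high:
            return (low, high)
    return None
```

Events whose leading jet falls between bins are simply dropped in this sketch, which mirrors the upper and lower pT cuts applied after clustering.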

We study a number of top-tagging strategies, which can be divided into two distinct categories. In the first category are dedicated top-tagging algorithms, which aim to directly reconstruct the top and W candidates in the top decay. In particular, we study:

1. HEPTopTagger

2. Johns Hopkins Tagger (JH)

3. Trimming with W-identification

4. Pruning with W-identification

as described in Sect. 3.3. In the case of the HEPTopTagger and JH tagger, the algorithms produce three output variables ($m_t$, $m_W$ and helicity angle) that can be used to discriminate top jets from QCD. The trimming and pruning algorithms as used here produce two outputs, $m_t$ and $m_W$. All of the above taggers and groomers incorporate a step to remove contributions from the underlying event and other soft radiation to the reconstructed $m_t$ and $m_W$, and also explicitly reject jets that do not meet basic selection criteria, as explained in detail in Sect. 3.3.

In the second category are individual jet substructure variables that are sensitive to the radiation pattern within the jet, which we refer to as jet-shape variables. While the most sensitive top-tagging variables are typically sensitive to three-pronged radiation, we also consider variables sensitive to two-pronged radiation in the limit where the W is very boosted and its subjets overlap. The variables we consider are:

• The ungroomed jet mass.

• N-subjettiness ratios $\tau_{21}^{\beta=1}$ and $\tau_{32}^{\beta=1}$, using the winner-takes-all axes definition.

• 2-point energy correlation function ratios $C_2^{\beta=1}$ and $C_3^{\beta=1}$.

• The pruned Qjet mass volatility, $\Gamma_{\rm Qjet}$.

Several of these variables were also considered earlier for q/g-tagging and W-tagging.

Fig. 27 2-D histograms of groomed mass versus ungroomed mass in the pT = 1.0–1.1 TeV bin using the anti-kT R = 1.2 algorithm, shown separately for signal (W jet) and background (gluon jet). Panels: (a) pruning, (b) trimming

Fig. 28 Comparison of single-variable top-tagging performance in the pT = 1–1.1 TeV bin using the anti-kT, R = 0.8 algorithm. Panels: (a) jet shapes, (b) top mass from the JH tagger, HEPTopTagger, trimming, and pruning. Axes: $\epsilon_{\rm bkg}$ versus $\epsilon_{\rm sig}$

Fig. 29 Comparison of top mass reconstruction with the Johns Hopkins (JH) tagger, HEPTopTagger (HEP), pruning, and trimming at different R using the anti-kT algorithm in the pT = 1.5–1.6 TeV bin. Each histogram is shown for the working point optimized for best performance with mt in the 0.3–0.35 signal efficiency bin, and is normalized to the fraction of events passing the tagger. In this and subsequent plots, the HEPTopTagger distribution cuts off at 500 GeV because the tagger fails to tag jets with a larger mass. Panels (a)–(d) show the JH and HEP mt (GeV); panels (e)–(h) show the pruned and trimmed mass (GeV), each for top and QCD jets

Fig. 30 Comparison of top mass reconstruction with the Johns Hopkins (JH) tagger, HEPTopTagger (HEP), pruning, and trimming at different pT using the anti-kT algorithm, R = 0.8. Each histogram is shown for the working point optimized for best performance with mt in the 0.3–0.35 signal efficiency bin, and is normalized to the fraction of events passing the tagger. Panels (a)–(d) show the JH and HEP mt (GeV); panels (e)–(h) show the pruned and trimmed mass (GeV), each for top and QCD jets

Fig. 31 Comparison of individual jet shape performance at different pT using the anti-kT R = 0.8 algorithm. Panels (a)–(e) show $\epsilon_{\rm bkg}$ versus $\epsilon_{\rm sig}$ for the three pT bins

Fig. 32 Comparison of top mass performance of different taggers at different pT using the anti-kT R = 0.8 algorithm. Each panel shows $\epsilon_{\rm bkg}$ versus $\epsilon_{\rm sig}$ for the three pT bins

Fig. 33 Comparison of $\Gamma_{\rm Qjet}$ and $\tau_{32}^{\beta=1}$ at R = 0.8 and different values of the pT. These shape variables are the most sensitive to varying pT. Panels (a)–(d) show top and QCD distributions

To study the correlations amongst the above substructure variables and tagging algorithms, we combine the relevant tagger output variables and/or jet shapes into a BDT (footnote 4), as described in Sect. 4. Additionally, because each tagger has two input parameters, we scan over reasonable values of the input parameters to determine the optimal value that gives the largest background rejection for each top tagging signal efficiency. This allows a direct comparison of the optimized version of each tagger. The input parameter values scanned for the various algorithms are:

• HEPTopTagger: $m \in [30, 100]$ GeV, $\mu \in [0.5, 1]$

• JH Tagger: $\delta_p \in [0.02, 0.15]$, $\delta_R \in [0.07, 0.2]$

• Trimming: $f_{\rm cut} \in [0.02, 0.14]$, $R_{\rm trim} \in [0.1, 0.5]$

• Pruning: $z_{\rm cut} \in [0.02, 0.14]$, $R_{\rm cut} \in [0.1, 0.6]$

We also investigate the degradation in performance of the top-tagging variables when moving away from the optimal parameter choice.

Footnote 4: Similar studies were recently performed for the HEPTopTagger in [63, 64], in the context of trying to improve the tagger by combining its outputs with N-subjettiness.
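The parameter scan of Sect. 7.1 can be sketched as follows, under stated assumptions: each scanned parameter pair is taken to yield one (signal efficiency, background efficiency) point, signal efficiency is binned with an assumed width of 0.05, and the point with the lowest background efficiency is kept in each bin. The function name and the input layout are hypothetical, and the numbers are made up for illustration.

```python
import math


def best_working_points(scan, bin_width=0.05):
    """Keep, per signal-efficiency bin, the point with lowest background efficiency.

    scan: iterable of (params, eff_sig, eff_bkg) tuples (hypothetical layout).
    Returns {bin_index: (params, eff_sig, eff_bkg)}.
    """
    best = {}
    for params, eff_sig, eff_bkg in scan:
        # Small offset guards against floating-point edge effects.
        index = math.floor(eff_sig / bin_width + 1e-9)
        current = best.get(index)
        if current is None or eff_bkg < current[2]:
            best[index] = (params, eff_sig, eff_bkg)
    return best


# Made-up scan results for a trimming-like groomer: params = (fcut, Rtrim).
example_scan = [
    ((0.02, 0.1), 0.32, 0.010),
    ((0.05, 0.2), 0.33, 0.004),
    ((0.10, 0.3), 0.41, 0.020),
]
for index, point in sorted(best_working_points(example_scan).items()):
    print(index, point)
```

Collecting the surviving points over all bins gives the optimized ROC curve against which the sub-optimal working points are later compared.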

7.2 Single variable performance

We begin by investigating the behaviour of individual jet substructure variables. Because of the rich, three-pronged structure of the top decay, it is expected that combinations of masses and jet shapes will far outperform single variables in identifying boosted tops. However, a study of the top-tagging performance of single variables facilitates a direct comparison with the W tagging results in Sect. 6, and also allows a straightforward examination of the performance of each variable for different pT and jet radius.

Fig. 34 Comparison of individual jet shape performance at different R in the pT = 1.5–1.6 TeV bin. Panels (a)–(e) show $\epsilon_{\rm bkg}$ versus $\epsilon_{\rm sig}$ for R = 0.4, 0.8, and 1.2

Fig. 35 Comparison of top mass performance of different taggers at different R in the pT = 1.5–1.6 TeV bin. Each panel shows $\epsilon_{\rm bkg}$ versus $\epsilon_{\rm sig}$ for R = 0.4, 0.8, and 1.2

Top-tagging performance is quantified using ROC curves. Figure 28 shows the ROC curves for each of the top-tagging variables, with the bare (ungroomed) jet mass also plotted for comparison. The jet-shape variables all perform substantially worse than ungroomed jet mass; this is in contrast with W tagging, for which several variables are competitive with or perform better than ungroomed jet mass (see, for example, Figs. 16a, 17a, b). To understand why this is the case, consider N-subjettiness: the W is two-pronged and the top is three-pronged, and so we expect $\tau_{21}$ and $\tau_{32}$ to be the best-performing N-subjettiness ratios, respectively. However, a cut selecting small values of $\tau_{21}$ necessarily selects events with large $\tau_1$, which is strongly correlated with jet mass, up to exponentially suppressed contributions. Therefore, $\tau_{21}$ applied to W-tagging indirectly incorporates some information about the jet mass in addition to shape information. By contrast, $\tau_{32}$ applied to top tagging does not include any information on the ungroomed jet mass. This likely accounts for why, relative to a cut on ungroomed mass, $\tau_{32}$ for top tagging performs substantially worse than $\tau_{21}$ for W-tagging.
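To make the ROC construction concrete, here is a minimal sketch of how one point on a single-variable ROC curve could be obtained. It assumes a one-sided cut that keeps jets with small values of the variable (as one would for $\tau_{32}$), and the function name and toy inputs are hypothetical.

```python
def bkg_eff_at_sig_eff(sig_vals, bkg_vals, target_sig_eff):
    """Return (cut, sig_eff, bkg_eff) for a cut keeping values <= cut.

    The cut is placed so that roughly target_sig_eff of the signal passes.
    """
    ordered = sorted(sig_vals)
    n_keep = max(1, round(target_sig_eff * len(ordered)))
    cut = ordered[n_keep - 1]
    sig_eff = sum(v <= cut for v in sig_vals) / len(sig_vals)
    bkg_eff = sum(v <= cut for v in bkg_vals) / len(bkg_vals)
    return cut, sig_eff, bkg_eff


# Toy values of a shape variable for signal (top) and background (QCD) jets.
toy_signal = [0.1, 0.2, 0.3, 0.4]
toy_background = [0.25, 0.5, 0.6, 0.9]
for target in (0.5, 0.75):
    print(target, bkg_eff_at_sig_eff(toy_signal, toy_background, target))
```

Sweeping the target efficiency from 0 to 1 and plotting the resulting background efficiency against the signal efficiency traces out the full curve.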

Fig. 36 Comparison of various shape variables in the pT = 1.5–1.6 TeV bin and different values of the anti-kT radius R. Panels (a)–(j) show top and QCD distributions; panels (i) and (j) show $\Gamma_{\rm Qjet}$

Of the two top-tagging algorithms, it is apparent from Fig. 28 that the Johns Hopkins tagger out-performs the HEPTopTagger in terms of its background rejection at fixed signal efficiency for both the top and W candidate masses; this is expected, as the HEPTopTagger was designed to reconstruct moderate-pT top jets in $t\bar{t}H$ events (for a proposed high-pT variant of the HEPTopTagger, see [65]). In Fig. 29, we show the histograms for the top mass output from the JH and HEPTopTagger for different R in the pT = 1.5–1.6 TeV bin, and in Fig. 30 for different pT at R = 0.8, optimized at a signal efficiency of 30%. A particular feature of the HEPTopTagger algorithm is that, after the jet is filtered to select the five hardest subjets, the three subjets are chosen which most closely reconstruct the top mass. This requirement tends to shape a peak in the QCD background around mt for the HEPTopTagger, as can be seen from Figs. 29d and 30d; this is the likely reason for the better performance of the JH tagger, which has no such requirement. This effect is more pronounced at higher pT and larger jet radius (see Figs. 32, 35). It has been proposed [63,64] that performance of the HEPTopTagger may be improved by changing the selection criteria and/or performing a multivariate analysis with other variables. For example, the three subjets reconstructing the top should be selected only among those sets that pass the W mass constraints, which reduces the shaping of the background. We indeed confirm below that combining the HEPTopTagger with other variables reduces the discrepancy between the JH and the HEPTopTagger, and a preliminary study indicates that the new ordering prescription makes the tagger performances more comparable.
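The HEPTopTagger subjet selection step discussed above, which picks from the five hardest subjets the three whose invariant mass is closest to the top mass, can be sketched as follows. This is an illustration only, not the tagger's implementation: subjets are plain (E, px, py, pz) tuples in GeV, the reference top mass of 173 GeV is an assumed value, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

M_TOP_GEV = 173.0  # assumed reference value for this sketch


def inv_mass(vectors):
    """Invariant mass of a set of (E, px, py, pz) four-vectors in GeV."""
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    return sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def transverse_momentum(v):
    return sqrt(v[1] ** 2 + v[2] ** 2)


def closest_to_top(subjets, n_hardest=5):
    """Pick the three of the hardest subjets whose mass is closest to M_TOP_GEV."""
    hardest = sorted(subjets, key=transverse_momentum, reverse=True)[:n_hardest]
    if len(hardest) < 3:
        return None  # no top candidate can be formed
    return min(combinations(hardest, 3),
               key=lambda trio: abs(inv_mass(trio) - M_TOP_GEV))


toy_subjets = [
    (60.0, 60.0, 0.0, 0.0),
    (60.0, -60.0, 0.0, 0.0),
    (50.0, 0.0, 50.0, 0.0),
    (300.0, 0.0, 0.0, 300.0),
]
best = closest_to_top(toy_subjets)
print(best, inv_mass(best))
```

Because the minimisation always returns the combination nearest the top mass, it pulls QCD jets towards mt as well, which is the background-shaping effect described in the text.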

Fig. 37 The performance of the various taggers in the pT = 1–1.1 TeV bin using the anti-kT R = 0.8 algorithm. For the groomers a BDT combination of the reconstructed mt and mW is used. Also shown is a multivariable combination of all of the JH and HEPTopTagger outputs. The ungroomed mass performance is shown for comparison

Fig. 38 The performance of BDT combinations of the JH and HEPTopTagger outputs with various shape variables in the pT = 1–1.1 TeV bin using the anti-kT R = 0.8 algorithm. Taggers are combined with the following shape variables: $\tau_{21}^{\beta=1} + \tau_{32}^{\beta=1}$, $C_2^{\beta=1} + C_3^{\beta=1}$, $\Gamma_{\rm Qjet}$, and all of the above (denoted "shape"). Panels (a) and (b) show the individual taggers; panel (c) compares the two

We also see in Fig. 28b that the top mass from the JH tagger and the HEPTopTagger has superior performance relative to either of the grooming algorithms; this is because the pruning and trimming algorithms do not have inherent W-identification steps and are not optimized for this purpose. Indeed, because of the lack of a W-identification step, grooming algorithms are forced to strike a balance between under-grooming the jet, which broadens the signal peak due to underlying event contamination and features a larger background rate, and over-grooming the jet, which occasionally throws out the b-jet and preserves only the W components inside the jet. We demonstrate this effect in Figs. 29 and 30, showing that with 30% signal efficiency, the optimal performance of the tagger over-grooms a substantial fraction of the jets (20–30%), leading to a spurious second peak at mW. This effect is more pronounced at large R and pT, since more aggressive grooming is required in these limits to combat the increased contamination from underlying event and QCD radiation.
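One simple way to quantify the over-grooming described above is to count the fraction of groomed signal jets whose mass lands near mW rather than near mt. The sketch below does this with an assumed W-mass window of 60–100 GeV; the window, the function name, and the toy masses are illustrative assumptions and are not taken from the report.

```python
def overgroomed_fraction(groomed_masses_gev, w_window=(60.0, 100.0)):
    """Fraction of jets whose groomed mass falls inside the assumed W window."""
    if not groomed_masses_gev:
        return 0.0
    low, high = w_window
    n_in_window = sum(low <= m <= high for m in groomed_masses_gev)
    return n_in_window / len(groomed_masses_gev)


# Toy groomed masses (GeV) for ten signal jets passing the groomer.
toy_masses = [172.0, 80.0, 175.0, 169.0, 82.0, 171.0, 178.0, 165.0, 174.0, 170.0]
print(overgroomed_fraction(toy_masses))
```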

In Figs. 31 and 32 we directly compare ROC curves for jet-shape variable performance and top-mass performance, respectively, in three different pT bins whilst keeping the jet radius fixed at R = 0.8. The input parameters of the taggers, groomers and shape variables are separately optimized in each pT bin. One can see from Fig. 31 that the tagging performance of jet shapes does not change substantially with pT. The variables $\tau_{32}^{\beta=1}$ and $\Gamma_{\rm Qjet}$ have the most variation and tend to degrade with higher pT, as can be seen in Fig. 33. This was also observed in the W-tagging studies in Sect. 6, and makes sense, as higher-pT QCD jets have more, harder emissions within the jet, giving rise to substructure that fakes the signal. For the variable $\Gamma_{\rm Qjet}$ (again as discussed in Sect. 6), increasing pT leads to QCD jets with a narrower volatility distribution due to the enhanced contribution of the shoulder region, while for the signal (top) jets the increased amount of soft radiation with increasing pT results in a broader volatility distribution. Thus, with increasing pT the signal and background jets exhibit more similar volatility distributions, as we see explicitly in Fig. 33a, b. Consequently, $\Gamma_{\rm Qjet}$ becomes less discriminating for top identification as pT increases. By contrast, from Fig. 32 we can see that most of the top-mass variables have superior performance at higher pT, due to the radiation from the top quark becoming more collimated. The notable exception is the HEPTopTagger, which degrades at higher pT, likely in part due to the background-shaping effects studied above, which are at least partially mitigated by recent updates to the HEPTopTagger [63,64].
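For reference, the Qjet mass volatility is conventionally defined as the spread of the pruned jet mass over repeated Qjets clusterings of the same jet divided by its mean, $\Gamma_{\rm Qjet} = \sqrt{\langle m^2 \rangle - \langle m \rangle^2} / \langle m \rangle$. A minimal sketch of that ratio, assuming the list of pruned masses from the repeated clusterings is already available (the function name and the toy values are hypothetical):

```python
from statistics import mean, pstdev


def mass_volatility(pruned_masses_gev):
    """RMS spread of the pruned masses divided by their mean."""
    return pstdev(pruned_masses_gev) / mean(pruned_masses_gev)


# A jet whose mass barely changes between clusterings has low volatility.
print(mass_volatility([171.0, 172.0, 173.0, 172.0]))
# A jet whose mass fluctuates strongly has high volatility.
print(mass_volatility([60.0, 140.0, 90.0, 180.0]))
```

A narrow set of masses gives a small value and a widely fluctuating set gives a large one, which is the sense in which the text speaks of jets being more or less volatile.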

In Figs. 34 and 35 we directly compare ROC curves for jet-shape variable performance and top-mass performance, respectively, for three different jet radii within the pT = 1.5–1.6 TeV bin. Again, the input parameters of the taggers, groomers and shape variables are separately optimized for each jet radius. We can see from these figures that most of the top-tagging variables, both shape and reconstructed top mass, perform best for smaller radius, as was generally observed in the case of W-tagging in Sect. 6. This is likely because, at such high pT, most of the radiation from the top quark is confined within R = 0.4, and having a larger jet radius makes the variable more susceptible to contamination from the underlying event and other uncorrelated radiation. In Fig. 36, we compare the individual top signal and QCD background distributions for each shape variable considered in the pT = 1.5–1.6 TeV bin for the various jet radii. In Fig. 36a–h the distributions for both signal and background broaden with increasing R, degrading the discriminating power. For $C_2^{\beta=1}$ and $C_3^{\beta=1}$, the background distributions are shifted to larger values as well. For the variable $\Gamma_{\rm Qjet}$, as already discussed for increasing pT (and in Sect. 6), the behavior with increasing R is a bit more complicated, with the QCD jets becoming less volatile and the signal jets more volatile, i.e., the two volatility distributions become more similar as we move from Fig. 36i to Fig. 36j. So again the discriminating power decreases with increasing R. The main exception is for $C_3^{\beta=1}$, which performs optimally at R = 0.8; in this case, the signal and background coincidentally happen to have the same distribution around R = 0.4, and so R = 0.8 gives better discrimination.

Fig. 39 The performance of the BDT combinations of the trimming and pruning outputs with various shape variables in the pT = 1–1.1 TeV bin using the anti-kT R = 0.8 algorithm. Groomer mass outputs are combined with the following shape variables: $\tau_{21}^{\beta=1} + \tau_{32}^{\beta=1}$, $C_2^{\beta=1} + C_3^{\beta=1}$, $\Gamma_{\rm Qjet}$, and all of the above (denoted "shape")

Fig. 40 Comparison of the performance of the BDT combinations of all the groomer/tagger outputs with all the available shape variables in the pT = 1–1.1 TeV bin using the anti-kT R = 0.8 algorithm. Tagger/groomer outputs are combined with all of the following shape variables: $\tau_{21}^{\beta=1} + \tau_{32}^{\beta=1}$, $C_2^{\beta=1} + C_3^{\beta=1}$, $\Gamma_{\rm Qjet}$

Fig. 41 Comparison at different pT of the performance of various top tagging/grooming algorithms using the anti-kT R = 0.8 algorithm. For each tagger/groomer, all output variables are combined in a BDT

7.3 Performance of multivariable combinations

We now consider various BDT combinations of the single variables considered in the last section, using the techniques described in Sect. 4. In particular, we consider the performance of individual taggers such as the JH tagger and HEPTopTagger, which output information about the top and W candidate masses and the helicity angle; for each tagger, all three output variables are combined in a BDT. For trimming and pruning, the output candidate mW and mt are combined in a BDT. Finally, we consider the combination of the full set of outputs of each of the above taggers/groomers with the shape variables, as well as a combination of the outputs of the HEPTopTagger and JH tagger. This allows us to determine the degree of complementary information in taggers/groomers and shape variables, as well as between the top tagging algorithms themselves. For all variables with tuneable input parameters, we scan and optimize over realistic values of such parameters, as described in Sect. 7.1.

Fig. 42 Comparison at different pT of the performance of the JH tagger using the anti-kT R = 0.8 algorithm, where all tagger output variables are combined in a BDT with various shape variables

In Fig. 37, we directly compare the performance of the HEPTopTagger, the JH tagger, trimming, and pruning, in the pT = 1–1.1 TeV bin with R = 0.8, where both mt and mW are used in the groomers. Generally, we find that pruning, which does not naturally incorporate subjets into the algorithm, does not perform as well as the others. Interestingly, trimming, which does include a subjet-identification step, performs comparably to the standard HEPTopTagger over much of the range, possibly due to the background-shaping observed in Sect. 7.2, although this can change with recent proposed updates to the HEPTopTagger [63,64]. By contrast, the JH tagger outperforms the other standard algorithms. To determine whether there is complementary information in the mass outputs from different top taggers, we also consider in Fig. 37 a multivariable combination of all of the JH and HEPTopTagger outputs. The maximum efficiency of the combined JH and HEPTopTaggers is limited, as some fraction of signal events inevitably fails either one or other of the taggers. We do see a 20–50% improvement in performance when combining all outputs, which suggests that the different algorithms used to identify the top and W for different taggers contain complementary information.
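The efficiency ceiling of the combined JH and HEPTopTagger follows from simple counting: a jet can only enter the combination if both taggers return a candidate. A minimal sketch, assuming per-jet pass/fail flags for each tagger are available as equal-length lists (the function name and the toy flags are hypothetical):

```python
def max_combined_efficiency(passed_jh, passed_hep):
    """Largest signal efficiency reachable when both taggers must return a candidate."""
    if len(passed_jh) != len(passed_hep):
        raise ValueError("flag lists must describe the same jets")
    if not passed_jh:
        return 0.0
    n_both = sum(a and b for a, b in zip(passed_jh, passed_hep))
    return n_both / len(passed_jh)


# Toy flags for four signal jets: each tagger alone passes three of them.
jh_flags = [True, True, False, True]
hep_flags = [True, False, True, True]
print(max_combined_efficiency(jh_flags, hep_flags))
```

In this toy case each tagger alone reaches 75% efficiency, but the combination cannot exceed 50%.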

In Fig. 38 we present the results for multivariable combinations of the top tagger outputs with and without shape variables. We see that, for both the HEPTopTagger and the JH tagger, the shape variables contain additional information uncorrelated with the masses and helicity angle, and give on average a factor of 2–3 improvement in signal discrimination. We see that, when combined with the tagger outputs, both the energy correlation functions $C_2 + C_3$ and the N-subjettiness ratios $\tau_{21} + \tau_{32}$ give comparable performance, while $\Gamma_{\rm Qjet}$ is slightly worse; this is unsurprising, as Qjets accesses shape information in a more indirect way than other shape variables. Combining all shape variables with a single top tagger provides even greater enhancement in discrimination power. We directly compare the performance of the JH and HEPTopTaggers in Fig. 38c. Combining the taggers with shape information nearly erases the difference between the tagging methods observed in Fig. 37; this indicates that combining the shape information with the HEPTopTagger identifies the differences between signal and background missed by the standard tagger alone. This also suggests that further improvement to discriminating power may be minimal, as various multivariable combinations converge to within 20% or so of one another.

In Fig. 39 we present the results for multivariable combinations of groomer outputs with and without shape variables. As with the tagging algorithms, combinations of groomers with shape variables improve their discriminating power; combinations with $\tau_{32} + \tau_{21}$ perform comparably to those with $C_3 + C_2$, and both of these are superior to combinations with the mass volatility, $\Gamma_{\rm Qjet}$. Substantial further improvement is possible by combining the groomers with all shape variables. Not surprisingly, the taggers that lag behind in performance enjoy the largest gain in signal-background discrimination with the addition of shape variables. Once again, in Fig. 39c, we find that the differences between pruning and trimming are erased when combined with shape information.

Fig. 43 Comparison at different pT of the performance of the HEPTopTagger using the anti-kT R = 0.8 algorithm, where all tagger output variables are combined in a BDT with various shape variables

Finally, in Fig. 40, we compare the performance of each of the taggers/groomers when their outputs are combined with all of the shape variables considered. One can see that the discrepancies between the performance of the different taggers/groomers all but vanish, suggesting perhaps that we are here utilising all available signal-background discrimination information, and that this is the optimal top tagging performance that could be achieved in these conditions.

Up to this point, we have considered only the combined multivariable performance in the pT = 1.0–1.1 TeV bin with jet radius R = 0.8. We now compare the BDT combinations of tagger outputs, with and without shape variables, at different pT. The taggers are optimized over all input parameters for each choice of pT and signal efficiency. As with the single-variable study, we consider anti-kT jets clustered with R = 0.8 and compare the outcomes in the pT = 600–700 GeV, pT = 1–1.1 TeV, and pT = 1.5–1.6 TeV bins. The comparison of the taggers/groomers is shown in Fig. 41. The behaviour with pT is qualitatively similar to the behaviour of the mt variable for each tagger/groomer shown in Fig. 32; this suggests that the pT behaviour of the taggers is dominated by the top-mass reconstruction. As before, the standard HEPTopTagger performance degrades slightly with increased pT due to the background shaping effect (which may be mitigated by recently proposed updates), while the JH tagger and groomers modestly improve in performance.

In Fig. 42, we show the pT-dependence of BDT combinations of the JH tagger output combined with shape variables. In terms of pT dependence, we find that the curves look nearly identical to Fig. 41b: the pT dependence is again dominated by the top-mass reconstruction, and combining the tagger outputs with different shape variables does not substantially change this behavior. Although not shown here, the same behavior is observed for trimming and pruning. By contrast, the pT dependence of the HEPTopTagger ROC curves, shown in Fig. 43, does change somewhat when combined with different shape variables; due to the suboptimal performance of the HEPTopTagger at high pT in the conventional configuration, we find that combining the HEPTopTagger with $C_3^{\beta=1}$, which in Fig. 31b is seen to have some modest improvement at high pT, can improve its performance. Combining the standard HEPTopTagger with multiple shape variables gives the maximum improvement in performance at high pT relative to low pT.

Fig. 44 Comparison at different radii of the performance of various top tagging/grooming algorithms with pT = 1.5–1.6 TeV. For each tagger/groomer, all output variables are combined in a BDT

In Fig. 44 we compare the BDT combinations of tagger outputs, with and without shape variables, at different jet radius R in the pT = 1.5–1.6 TeV bin. The taggers are optimized over all input parameters for each choice of R and signal efficiency. We find that, for all taggers and groomers, the performance is always best at small R; the smallest radius, R = 0.4, is sufficiently large to admit the full top quark decay at such high pT, but is small enough to suppress contamination from additional radiation. This is not altered when the taggers are combined with shape variables. For example, Fig. 45 shows the dependence on R of the JH tagger when combined with shape variables, where one can see that the R-dependence is identical for all combinations. The same holds true for the HEPTopTagger, trimming, and pruning.

7.4 Performance at sub-optimal working points

Up until now, we have re-optimized our tagger and groomer parameters for each pT, R, and signal efficiency working point. In reality, experiments will choose a finite set of working points to use. When this is taken into account, how will the top-tagging performance compare to the optimal results already shown? To address this concern, we replicate our analyses, but optimize the top taggers only for a single pT bin, single jet radius R, or single signal efficiency, and subsequently apply the same parameters to other scenarios. This allows us to determine the extent to which re-optimization is necessary to maintain the high signal-to-background discrimination power seen in the top-tagging algorithms we studied. In this section, we focus on the taggers and groomers, and their combination with shape variables, as the shape variables alone typically do not have any input parameters to optimize.

Fig. 45 Comparison at different radii of the performance of the JH tagger in the pT = 1.5–1.6 TeV bin, where all tagger output variables are combined in a BDT with various shape variables

Optimizing at a single pT: We show in Fig. 46 the performance of the reconstructed top mass for the pT = 0.6–0.7 TeV and pT = 1.0–1.1 TeV bins, with all input parameters optimized to the pT = 1.5–1.6 TeV bin (and R = 0.8 throughout). This is normalized to the performance using the optimized tagger inputs at each pT. The performance degradation is at the level of 20–30% (at maximum 50%) when the high-pT optimized inputs are used at other momenta, with trimming and the Johns Hopkins tagger degrading the most. The jagged behaviour of the points is due to the finite resolution of the scan. We also observe a particular effect associated with using suboptimal taggers: since taggers sometimes fail to return a top candidate, parameters optimized for a particular signal efficiency $\epsilon_{\rm sig}$ at pT = 1.5–1.6 TeV may not return enough signal candidates to reach the same efficiency at a different pT. Consequently, no point appears for that pT value. This is not often a practical concern, as the largest gains in signal discrimination and significance are for smaller values of $\epsilon_{\rm sig}$, but it may be an important effect to consider when selecting benchmark tagger parameters and signal efficiencies.
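The normalisation used for the sub-optimal working points, including the case where a fixed parameter choice cannot reach a given signal efficiency, can be sketched as below. Curves are represented as dictionaries from a signal-efficiency bin index to a background efficiency; this layout, the function name, and the numbers are assumptions for illustration only.

```python
def normalised_bkg_eff(fixed_curve, optimised_curve):
    """Ratio of fixed-parameter to re-optimised background efficiency per bin.

    A value of None marks bins that the fixed parameters cannot reach,
    corresponding to a missing point on the plot.
    """
    ratios = {}
    for bin_index, optimised in optimised_curve.items():
        fixed = fixed_curve.get(bin_index)
        ratios[bin_index] = None if fixed is None else fixed / optimised
    return ratios


# Made-up background efficiencies keyed by signal-efficiency bin index.
optimised = {6: 0.010, 10: 0.040, 14: 0.200}
fixed = {6: 0.012, 10: 0.050}  # bin 14 is never reached with fixed inputs
print(normalised_bkg_eff(fixed, optimised))
```

A ratio of 1 means no loss from reusing the parameters, while larger values quantify the degradation.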

The degradation in performance is more pronounced for the BDT combinations of the full tagger outputs, shown in Fig. 47. This is true particularly at very low signal efficiency, where the optimization of inputs picks out a cut on the tail of some distribution that depends precisely on the pT/R of the jet. Once again, trimming and the Johns Hopkins tagger degrade more markedly. Similar behavior holds for the BDT combinations of tagger outputs plus all shape variables.

Fig. 46 Comparison of the top mass performance of different taggers at different pT using the anti-kT R = 0.8 algorithm. The tagger inputs are set to the optimum value for pT = 1.5–1.6 TeV, and the performance is normalized to the performance using the optimized tagger inputs at each pT. The vertical axis shows the ratio of $\epsilon_{\rm bkg}$ to its optimized value, plotted against $\epsilon_{\rm sig}$

Optimizing at a single R: In Fig. 48, we show the performance of the reconstructed top mass for R = 0.4 and 0.8, with all input parameters optimized for R = 1.2 (and pT = 1.5–1.6 TeV throughout). This is normalized to the performance using the optimized tagger inputs at each R. While the performance of each variable degrades at small $\epsilon_{\rm sig}$ compared to the optimized search, the HEPTopTagger fares the worst. It is not surprising that a tagger whose top mass reconstruction is susceptible to background-shaping at large R and pT would require a more careful optimization of parameters to obtain the best performance; recent updates to the tagger algorithm [63,64] may mitigate the need for this more careful optimization.

The same holds true for the BDT combinations of the full tagger outputs, shown in Fig. 49. The performance for the sub-optimal taggers is still within an O(1) factor of the optimized performance, and the HEPTopTagger performs better with the combination of all of its outputs relative to the performance with just mt. The same behaviour holds for the BDT combinations of tagger outputs and shape variables.

Optimizing at a single efficiency: The strongest assumption we have made so far is that the taggers can be re-optimized for each signal efficiency point. This is useful for making a direct comparison of the power of different top-tagging algorithms, but is not particularly practical for LHC analyses. We now consider the scenario in which the tagger inputs are optimized once, in the εsig = 0.30–0.35 bin, and then used for all signal efficiencies. We do this in the pT = 1.0–1.1 TeV bin and with R = 0.8.

The performance of each tagger, normalized to its performance optimized in each signal efficiency bin, is shown in Fig. 50 for cuts on the top mass and W mass, and in Fig. 51 for BDT combinations of tagger outputs and shape variables. In both plots, it is apparent that optimizing the taggers in the εsig = 0.30–0.35 efficiency bin gives comparable performance over efficiencies ranging from 0.2 to 0.5, although performance degrades at substantially different signal efficiencies. Pruning appears to give especially robust signal-background discrimination without re-optimization, most likely due to the fact that there are no absolute distance or pT scales that appear in the algorithm. Figures 50 and 51 suggest that, while optimization at all signal efficiencies is a useful tool for comparing different algorithms, it is not crucial to achieve good top-tagging performance in experiments.
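The comparison between inputs optimized once and inputs re-optimized at every working point can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: each hypothetical parameter setting ("tight", "loose") is represented by a map from signal efficiency to background efficiency, a single reference value stands in for the reference efficiency bin, and all numbers are invented.

```python
from typing import Dict, Tuple

# Hypothetical scan result: setting name -> {eps_sig: eps_bkg}
Scan = Dict[str, Dict[float, float]]


def best_setting_at(scan: Scan, eps_sig: float) -> str:
    """Setting with the lowest background efficiency at this eps_sig."""
    candidates = {
        name: curve[eps_sig] for name, curve in scan.items() if eps_sig in curve
    }
    return min(candidates, key=candidates.get)


def fixed_vs_optimized(
    scan: Scan, reference_eps_sig: float
) -> Tuple[str, Dict[float, float]]:
    """Optimize once at the reference point, then compare everywhere.

    Returns the chosen setting and, for each eps_sig that setting can reach,
    eps_bkg(chosen setting) / eps_bkg(best setting at that eps_sig).
    """
    reference = best_setting_at(scan, reference_eps_sig)
    ratios = {}
    for eps_sig, eps_bkg in scan[reference].items():
        optimal = best_setting_at(scan, eps_sig)
        ratios[eps_sig] = eps_bkg / scan[optimal][eps_sig]
    return reference, ratios


# Invented numbers for illustration only.
scan = {
    "tight": {0.2: 0.004, 0.3: 0.010, 0.5: 0.060},
    "loose": {0.2: 0.006, 0.3: 0.012, 0.5: 0.040, 0.7: 0.110},
}
reference, ratios = fixed_vs_optimized(scan, reference_eps_sig=0.3)
```

With these toy inputs the setting chosen at the reference point stays optimal nearby and loses ground only at a distant efficiency, which is the qualitative pattern described in the text.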

Fig. 47 Comparison of tagger performance at different pT using the anti-kT R = 0.8 algorithm. For each tagger/groomer, all output variables are combined in a BDT, and the tagger inputs are set to the optimum value for pT = 1.5–1.6 TeV. The performance is normalized to the performance using the optimized tagger inputs at each pT

7.5 Conclusions

We have studied the performance of various jet substructure variables, groomed masses, and top taggers for top tagging at different pT and jet radius parameters. At each pT, R, and signal efficiency working point, we optimize the parameters for those variables with tuneable inputs. Overall, we have found that these techniques, individually and in combination, continue to perform well at high pT, at least at particle level, which is important for future LHC running. In general, the Johns Hopkins tagger performs best, while jet grooming algorithms under-perform relative to the best top taggers due to the lack of an optimized W-identification step. Tagger performance can be improved by a further factor of 2–4 through combination with jet substructure variables such as τ32, C3, and ΓQjet.

When combined with jet substructure variables, the performance of various groomers and taggers becomes very comparable, suggesting that, taken together, the variables studied are sensitive to nearly all of the physical differences between top and QCD jets at particle level. A small improvement is also found by combining the Johns Hopkins and HEPTopTaggers, indicating that different taggers are not fully correlated. The degree to which these findings continue to hold under more realistic pile-up and detector configurations is, however, not addressed in this analysis and left to future study.

Comparing results at different pT and R, top-tagging performance is generally better at smaller R due to less contamination from uncorrelated radiation. Similarly, most variables perform better at larger pT due to the higher degree of collimation of radiation. Some variables fare worse at higher pT, such as the N-subjettiness ratio τ32 and the Qjet mass volatility ΓQjet, as higher-pT QCD jets have more and harder emissions that fake the top-jet substructure. The standard HEPTopTagger algorithm is also worse at high pT due to the tendency of the tagger to shape backgrounds around the top mass. This is unsurprising, given that the HEPTopTagger was specifically designed for a lower pT range than that considered here; recently proposed updates may improve performance at high pT and R [63,64]. The pT- and R-dependence of the multivariable combinations is dominated by the pT- and R-dependence of the top mass reconstruction component of the tagger/groomer.

Fig. 48 Comparison of the top mass performance of different taggers at different R in the pT = 1.5–1.6 TeV bin. The tagger inputs are set to the optimum value for R = 1.2, and the performance is normalized to the performance using the optimized tagger inputs at each R

Finally, we consider the performance of various tagger and jet substructure variable combinations under the more realistic assumption that the input parameters are only optimized at a single pT, R, or signal efficiency, and then the same inputs are used at other working points. Remarkably, the performance of all variables is typically within a factor of 2 of the fully optimized inputs, suggesting that while optimization can lead to substantial gains in performance, the general behavior found in the fully optimized analyses extends to more general applications of each variable. In particular, the performance of pruning typically varies the least when comparing sub-optimal working points to the fully optimized tagger due to the scale-invariant nature of the pruning algorithm.

8 Summary and conclusions

Furthering our understanding of jet substructure is crucial to enhancing the prospects for the discovery of new physical processes at Run II of the LHC. In this report we have studied the performance of jet substructure techniques over a wide range of kinematic regimes that will be encountered in Run II of the LHC. The performance of observables and their correlations have been studied by combining the variables into Boosted Decision Tree (BDT) discriminants, and comparing the background rejection power of this discriminant to the rejection power achieved by the individual variables. The performance of "all variables" BDT discriminants has also been investigated, to understand the potential of the "ultimate" tagger where all available particle-level information (at least, all of that provided by the variables considered) is used.
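For concreteness, the example BDT settings quoted in reference [47] can be collected into a single colon-separated option string of the kind TMVA accepts. The sketch below only assembles that string; the helper name is hypothetical, how the string is handed to TMVA is not shown, and option names may differ between TMVA versions.

```python
# Example BDT settings as listed in reference [47] of this report.
bdt_settings = {
    "NTrees": 1000,
    "BoostType": "Grad",
    "Shrinkage": 0.1,
    "UseBaggedGrad": False,
    "nCuts": 10000,
    "MaxDepth": 3,
    "UseYesNoLeaf": False,
    "nEventsMin": 200,
}


def to_tmva_options(settings: dict) -> str:
    """Join settings as 'Key=Value' pairs separated by ':' (hypothetical helper).

    Booleans are written as T/F, matching the notation used in the reference.
    """
    parts = []
    for key, value in settings.items():
        if isinstance(value, bool):
            parts.append(f"{key}={'T' if value else 'F'}")
        else:
            parts.append(f"{key}={value}")
    return ":".join(parts)


option_string = to_tmva_options(bdt_settings)
```

Keeping the settings in a dictionary makes it easy to vary one option at a time when checking how sensitive a discriminant is to the BDT configuration.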

We focused on the discrimination of quark jets from gluon jets, and the discrimination of boosted W bosons and top quarks from the QCD backgrounds. For each, we have identified the best-performing jet substructure observables at particle level, both individually and in combination with other observables. In doing so, we have also provided a physical picture of why certain sets of observables are (un)correlated. Additionally, we have investigated how the performance of jet substructure observables varies with R and pT, identifying observables that are particularly robust against or susceptible to these changes. In the case of q/g tagging, it seems that the ideal performance can be nearly achieved by combining the most powerful discriminant, the number of constituents of a jet, with just one other variable, C1^(β=1) (or τ1^(β=1)). Many of the other variables considered are highly correlated and provide little additional discrimination. For both top and W tagging, the groomed mass is a very important discriminating variable, but one that can be substantially improved in combination with other variables. There is clearly a rich and complex relationship between the variables considered for W and top tagging, and the performance and correlations between these variables can change considerably with changing jet pT and R. In the case of W tagging, even after combining groomed mass with two other substructure observables, we are still some way short of the ultimate tagger performance, indicating the complexity of the information available, and the complementarity between the observables considered. In the case of top tagging, we have shown that the performance of both the Johns Hopkins tagger and the HEPTopTagger can be improved when their outputs are combined with substructure observables such as τ32 and C3, and that the performance of a discriminant built from groomed mass information plus substructure observables is very comparable to the performance of the taggers. We have optimized the top taggers for particular values of pT, R, and signal efficiency, and studied their performance at other working points. We have found that the performance of observables remains within at most a factor of two of the optimized value, suggesting that the performance of jet substructure observables is not significantly degraded when tagger parameters are only optimized for a few select benchmark points.

Fig. 49 Comparison of tagger performance at different R in the pT = 1.5–1.6 TeV bin. For each tagger/groomer, all output variables are combined in a BDT, and the tagger inputs are set to the optimum value for R = 1.2. The performance is normalized to the performance using the optimized tagger inputs at each R

Fig. 50 Comparison of top-tagging performance with mt in the pT = 1–1.1 TeV bin using the anti-kT, R = 0.8 algorithm. The inputs for each tagger are optimized for the εsig = 0.30–0.35 bin, and the performance is normalized to the performance using the optimized tagger inputs at each εsig

Fig. 51 The BDT combinations in the pT = 1–1.1 TeV bin using the anti-kT R = 0.8 algorithm. Taggers are combined with the following shape variables: τ21^(β=1) + τ32^(β=1), C2^(β=1) + C3^(β=1), ΓQjet, and all of the above (denoted "shape"). The inputs for each tagger are optimized for the εsig = 0.30–0.35 bin, and the performance is normalized to the performance using the optimized tagger inputs at each εsig

In all of q/g, W and top tagging, we have observed that the tagging performance improves with increasing pT . However, whereas for q/g and top tagging the performance improves with decreasing R (for the range of R considered here), the dependence on R for W tagging is more complex, with a peak performance at R = 0.8 for each pT bin considered.

Our analyses were performed with ideal detector and pile-up conditions in order to most clearly elucidate the underlying physical scaling with pT and R. At higher boosts, detector resolution effects will become more important, and with the higher pile-up expected at Run II of the LHC, pile-up mitigation will be crucial for future jet substructure studies. Future studies will be needed to determine which of the observables we have studied are most robust against pile-up and detector effects, and our analyses suggest particularly useful combinations of observables to consider in such studies.

At the new energy frontier of Run II of the LHC, boosted jet substructure techniques will be more central to our searches for new physics than ever before. By achieving a deeper understanding of the underlying structure of quark, gluon, W and top-initiated jets, as well as the relations between observables sensitive to their respective structures, it is hoped that more sophisticated analyses can be performed that will maximally extend the reach for new physics.

Acknowledgments We thank the Department of Physics at the University of Arizona for hosting and providing support for the BOOST 2013 workshop, and the US Department of Energy for their support of the workshop. We especially thank Vivian Knight (University of Arizona) for her help with the organization of the workshop. We also thank Prof. J. Boelts of the University of Arizona School of Art VisCom program and his Fall 2012 ART 465 class for organizing the design competition for the workshop poster. In particular, we thank the winner of the competition, Ms. Hallie Bolonkin, for creating the final design.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Funded by SCOAP3.

References

1. Boost, SLAC National Accelerator Laboratory, 9–10 July 2009 (2009). http://www-conf.slac.stanford.edu/Boost2009

2. Boost, University of Oxford, 22–25 June 2010 (2010). http://www.physics.ox.ac.uk/boost2010

3. Boost, Princeton University, 22–26 May 2011 (2011). https://indico.cern.ch/event/138809/

4. Boost, IFIC Valencia, 23–27 July 2012 (2012). http://ific.uv.es/boost2012

5. Boost, University of Arizona, 12–16 August 2013 (2013). https://indico.cern.ch/event/215704/

6. Boost, University College London, 18–22 August 2014 (2014). http://www.hep.ucl.ac.uk/boost2014/

7. A. Abdesselam, E.B. Kuutmann, U. Bitenc, G. Brooijmans, J. Butterworth, et al., Eur. Phys. J. C 71, 1661 (2011). doi:10.1140/epjc/s10052-011-1661-y

8. A. Altheimer, S. Arora, L. Asquith, G. Brooijmans, J. Butterworth, et al., J. Phys. G 39, 063001 (2012). doi:10.1088/0954-3899/39/6/063001

9. A. Altheimer, A. Arce, L. Asquith, J. Backus Mayes, E. Bergeaas Kuutmann, et al., Eur. Phys. J. C 74(3), 2792 (2014). doi:10.1140/epjc/s10052-014-2792-8

10. T. Plehn, M. Spannowsky, M. Takeuchi, D. Zerwas, JHEP 1010, 078 (2010). doi:10.1007/JHEP10(2010)078

11. D.E. Kaplan, K. Rehermann, M.D. Schwartz, B. Tweedie, Phys. Rev. Lett. 101, 142001 (2008). doi:10.1103/PhysRevLett.101.142001

12. J. Alwall, M. Herquet, F. Maltoni, O. Mattelaer, T. Stelzer, JHEP 1106, 128 (2011). doi:10.1007/JHEP06(2011)128

13. Y. Gao, A.V. Gritsan, Z. Guo, K. Melnikov, M. Schulze, et al., Phys. Rev. D 81, 075022 (2010). doi:10.1103/PhysRevD.81.075022

14. S. Bolognesi, Y. Gao, A.V. Gritsan, K. Melnikov, M. Schulze, et al., Phys. Rev. D 86, 095031 (2012). doi:10.1103/PhysRevD.86.095031

15. I. Anderson, S. Bolognesi, F. Caola, Y. Gao, A.V. Gritsan, et al., Phys. Rev. D 89, 035007 (2014). doi:10.1103/PhysRevD.89.035007

16. J. Pumplin, D. Stump, J. Huston, H. Lai, P.M. Nadolsky, et al., JHEP 0207, 012 (2002). doi:10.1088/1126-6708/2002/07/012

17. T. Sjostrand, S. Mrenna, P.Z. Skands, Comput. Phys. Commun. 178, 852 (2008). doi:10.1016/j.cpc.2008.01.036

18. A. Buckley, J. Butterworth, S. Gieseke, D. Grellscheid, S. Hoche, et al., Phys. Rept. 504, 145 (2011). doi:10.1016/j.physrep.2011.03.005

19. T. Gleisberg, S. Hoeche, F. Krauss, M. Schonherr, S. Schumann, et al., JHEP 0902, 007 (2009). doi:10.1088/1126-6708/2009/02/007

20. S. Schumann, F. Krauss, JHEP 0803, 038 (2008). doi:10.1088/1126-6708/2008/03/038

21. F. Krauss, R. Kuhn, G. Soff, JHEP 0202, 044 (2002). doi:10.1088/1126-6708/2002/02/044


22. T. Gleisberg, S. Hoeche, JHEP 0812, 039 (2008). doi:10.1088/1126-6708/2008/12/039

23. S. Hoeche, F. Krauss, S. Schumann, F. Siegert, JHEP 0905, 053 (2009). doi:10.1088/1126-6708/2009/05/053

24. M. Schonherr, F. Krauss, JHEP 0812, 018 (2008). doi:10.1088/1126-6708/2008/12/018

25. S. Bethke, et al., Phys. Lett. B 213, 235 (1988). doi:10.1016/0370-2693(88)91032-5

26. M. Cacciari, G.P. Salam, G. Soyez, JHEP 0804, 063 (2008). doi:10.1088/1126-6708/2008/04/063

27. Y.L. Dokshitzer, G. Leder, S. Moretti, B. Webber, JHEP 9708, 001 (1997). doi:10.1088/1126-6708/1997/08/001

28. M. Wobisch, T. Wengler, in Proceedings of the Monte Carlo Generators for HERA Physics Workshop, Hamburg, ed. by A. Doyle (1998)

29. S. Catani, Y.L. Dokshitzer, M. Seymour, B. Webber, Nucl. Phys. B 406, 187 (1993). doi:10.1016/0550-3213(93)90166-M

30. S.D. Ellis, D.E. Soper, Phys. Rev. D 48, 3160 (1993). doi:10.1103/PhysRevD.48.3160

31. S.D. Ellis, A. Hornig, T.S. Roy, D. Krohn, M.D. Schwartz, Phys. Rev. Lett. 108, 182003 (2012). doi:10.1103/PhysRevLett.108.182003

32. S.D. Ellis, A. Hornig, D. Krohn, T.S. Roy, JHEP 1501, 022 (2015). doi:10.1007/JHEP01(2015)022

33. S.D. Ellis, C.K. Vermilion, J.R. Walsh, Phys. Rev. D 81, 094023 (2010). doi:10.1103/PhysRevD.81.094023

34. D. Krohn, J. Thaler, L.T. Wang, JHEP 1002, 084 (2010). doi:10.1007/JHEP02(2010)084

35. J.M. Butterworth, A.R. Davison, M. Rubin, G.P. Salam, Phys. Rev. Lett. 100, 242001 (2008). doi:10.1103/PhysRevLett.100.242001

36. A.J. Larkoski, S. Marzani, G. Soyez, J. Thaler, JHEP 1405, 146 (2014). doi:10.1007/JHEP05(2014)146

37. M. Dasgupta, A. Fregoso, S. Marzani, G.P. Salam, JHEP 1309, 029 (2013). doi:10.1007/JHEP09(2013)029

38. V. Khachatryan, et al., JHEP 1408, 173 (2014). doi:10.1007/JHEP08(2014)173

39. G. Aad, et al., New J. Phys. 16(11), 113013 (2014). doi:10.1088/1367-2630/16/11/113013

40. Performance of Boosted W Boson Identification with the ATLAS Detector. Tech. Rep. ATL-PHYS-PUB-2014-004, CERN, Geneva (2014)

41. J. Thaler, K. Van Tilburg, JHEP 1103, 015 (2011). doi:10.1007/JHEP03(2011)015

42. A.J. Larkoski, D. Neill, J. Thaler, JHEP 1404, 017 (2014). doi:10.1007/JHEP04(2014)017

43. A.J. Larkoski, J. Thaler, JHEP 1309, 137 (2013). doi:10.1007/JHEP09(2013)137

44. A.J. Larkoski, G.P. Salam, J. Thaler, JHEP 1306, 108 (2013). doi:10.1007/JHEP06(2013)108

45. S. Chatrchyan, et al., JHEP 1204, 036 (2012). doi:10.1007/JHEP04(2012)036

46. A.J. Larkoski, J. Thaler, W.J. Waalewijn, JHEP 1411, 129 (2014). doi:10.1007/JHEP11(2014)129

47. A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E. von Toerne, H. Voss, PoS ACAT, 040 (2007). An example of the BDT settings used in these studies is as follows: NTrees=1000; BoostType=Grad; Shrinkage=0.1; UseBaggedGrad=F; nCuts=10000; MaxDepth=3; UseYesNoLeaf=F; nEventsMin=200

48. G. Aad, et al., Eur. Phys. J. C 74(8), 3023 (2014). doi:10.1140/epjc/s10052-014-3023-z

49. J. Gallicchio, M.D. Schwartz, JHEP 1304, 090 (2013). doi:10.1007/JHEP04(2013)090

50. A.J. Larkoski, I. Moult, D. Neill, JHEP 1409, 046 (2014). doi:10.1007/JHEP09(2014)046

51. M. Procura, W.J. Waalewijn, L. Zeune, JHEP 1502, 117 (2015). doi:10.1007/JHEP02(2015)117

52. J. Gallicchio, M.D. Schwartz, Phys. Rev. Lett. 107, 172001 (2011). doi:10.1103/PhysRevLett.107.172001

53. C. Collaboration (2013)

54. H.n. Li, Z. Li, C.P. Yuan, Phys. Rev. D 87, 074025 (2013). doi:10.1103/PhysRevD.87.074025

55. M. Dasgupta, K. Khelifa-Kerfa, S. Marzani, M. Spannowsky, JHEP 1210, 126 (2012). doi:10.1007/JHEP10(2012)126

56. M. Dasgupta, A. Fregoso, S. Marzani, A. Powling, Eur. Phys. J. C 73(11), 2623 (2013). doi:10.1140/epjc/s10052-013-2623-3

57. Y.T. Chien, R. Kelley, M.D. Schwartz, H.X. Zhu, Phys. Rev. D 87(1), 014010 (2013). doi:10.1103/PhysRevD.87.014010

58. T.T. Jouttenus, I.W. Stewart, F.J. Tackmann, W.J. Waalewijn, Phys. Rev. D 88(5), 054031 (2013). doi:10.1103/PhysRevD.88.054031

59. Z.L. Liu, C.S. Li, J. Wang, Y. Wang, JHEP 1504, 005 (2015). doi:10.1007/JHEP04(2015)005

60. S.D. Ellis, C.K. Vermilion, J.R. Walsh, Phys. Rev. D 80, 051501 (2009). doi:10.1103/PhysRevD.80.051501

61. Y. Cui, Z. Han, M.D. Schwartz, Phys. Rev. D 83, 074023 (2011). doi:10.1103/PhysRevD.83.074023

62. D.E. Soper, M. Spannowsky, JHEP 1008, 029 (2010). doi:10.1007/JHEP08(2010)029

63. C. Anders, C. Bernaciak, G. Kasieczka, T. Plehn, T. Schell, Phys. Rev. D 89, 074047 (2014). doi:10.1103/PhysRevD.89.074047

64. G. Kasieczka, T. Plehn, T. Schell, T. Strebler, G.P. Salam, JHEP 1506, 203 (2015). doi:10.1007/JHEP06(2015)203

65. S. Schaetzel, M. Spannowsky, Phys. Rev. D 89(1), 014007 (2014). doi:10.1103/PhysRevD.89.014007


SIF and Springer-Verlag Berlin Heidelberg 2015

Abstract


Over the past decade, a large number of jet substructure observables have been proposed in the literature, and explored at the LHC experiments. Such observables attempt to utilize the internal structure of jets in order to distinguish those initiated by quarks, gluons, or by boosted heavy objects, such as top quarks and W bosons. This report, originating from and motivated by the BOOST2013 workshop, presents original particle-level studies that aim to improve our understanding of the relationships between jet substructure observables, their complementarity, and their dependence on the underlying jet properties, particularly the jet radius and jet transverse momentum. This is explored in the context of quark/gluon discrimination, boosted W boson tagging and boosted top quark tagging.

Details

Title

Towards an understanding of the correlations in jet substructure

Author

Adams, D; Arce, A; Asquith, L; Backovic, M; Barillari, T; Berta, P; Bertolini, D; Buckley, A; Butterworth, J; Camacho Toro, R C; Caudron, J; Chien, Y-t; Cogan, J; Cooper, B; Curtin, D; Debenedetti, C; Dolen, J; Eklund, M; El Hedri, S; Ellis, S D; Embry, T; Ferencek, D; Ferrando, J; Fleischmann, S; Freytsis, M; Giulini, M; Han, Z; Hare, D; Harris, P; Hinzmann, A; Hoing, R; Hornig, A; Jankowiak, M; Johns, K; Kasieczka, G; Kogler, R; Lampl, W; Larkoski, A J; Lee, C; Leone, R; Loch, P; Lopez Mateos, D; Lou, H K; Low, M; Maksimovic, P; Marchesini, I; Marzani, S; Masetti, L; Mccarthy, R; Menke, S; Miller, D W; Mishra, K; Nachman, B; Nef, P; O'grady, F T; Ovcharova, A; Picazio, A; Pollard, C; Potter-landua, B; Potter, C; Rappoccio, S; Rojo, J; Rutherfoord, J; Salam, G P; Schabinger, R M; Schwartzman, A; Schwartz, M D; Shuve, B; Sinervo, P; Soper, D; Sosa Corral, D E; Spannowsky, M; Strauss, E; Swiatlowski, M; Thaler, J; Thomas, C; Thompson, E; Tran, N V; Tseng, J; Usai, E; Valery, L; Veatch, J; Vos, M; Waalewijn, W; Wacker, J; Young, C

Pages

1-52

Publication year

2015

Publication date

Sep 2015

Publisher

Springer Nature B.V.

ISSN

14346044

e-ISSN

14346052

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1140/epjc/s10052-015-3587-2

ProQuest document ID

1710610649
