MMHT PDFs: updates and outlook

We present the latest results of studies within the MMHT PDF framework. We discuss the impact of the most recent ATLAS 7 TeV jet data, demonstrating that while a good fit can be achieved for individual jet rapidity bins, it is not possible to achieve a good description of the data when all bins are fitted simultaneously. We examine the role that the experimental correlated systematic uncertainties play in this, and demonstrate that by simply decorrelating no more than two sources of error between rapidity bins a remarkably improved description of the data can be achieved. We then study the impact of NNLO corrections, showing that they produce a mild deterioration in the fit quality. We also present the impact of including new LHC $W$, $Z$, $W+c$ and $t\overline{t}$ data in a fit based on the MMHT14 PDF set, showing in particular a marked reduction in the $s+\overline{s}$ uncertainty. Finally, we discuss the latest work towards including the photon PDF within the MMHT framework.


Introduction
The MMHT14 parton distribution function (PDF) set [1] is the successor to the MSTW08 [2] PDFs. It combines a range of theoretical updates with new data, including for the first time measurements from the LHC. Subsequent studies of the $\alpha_S$ determination [3] and of the heavy quark mass dependence [4] were performed. More recently, the impact of the final combined HERA I+II data set [5] was examined [6]. There, the MMHT14 PDFs were found to give a good description of the HERA data, with the central values and uncertainties changing relatively little when these data were included in the fit. It was therefore decided not to release a new set at that point, but rather to wait until theoretical developments such as the full NNLO calculation of the jet production cross section were complete, and a more precise and varied range of LHC data became available. Recently there has also been great progress in the determination of the photon PDF [7,8,9], with in particular the study of [9] demonstrating that this object can be determined with percent-level precision. However, these findings have yet to be included in the context of a global PDF fit.
In these proceedings we report on work in all of the directions described above. Namely, we present a first look at the impact of LHC jet data at NNLO, as well as the impact of new LHC data on $W$, $Z$, $W+c$ and $t\overline{t}$ production on the PDFs, and the latest work towards including a precisely determined photon PDF within the MMHT framework.

The impact of LHC jet data
Jet collider data play an important role in constraining the gluon PDF at higher $x$, and indeed in the past have provided the only reasonable direct constraint in this region (LHC measurements such as $t\overline{t}$ production, $Z$ and $W$ boson $p_\perp$ distributions and isolated photon production will also play an important role in the future). However, a full NNLO calculation of jet production has until recently not been available. For this reason, in the MMHT14 set Tevatron jet data were included in the NNLO fit using the approximate threshold corrections of [10], with the argument made that the difference between this and the full NNLO result should be under control, and in particular smaller than the experimental systematic uncertainties. Such a conclusion does not however follow in general at the LHC: the larger $\sqrt{s}$ implies that much of the data lie very far from threshold, while those data that lie closer to threshold in fact probe a kinematically very similar region to the existing Tevatron data. LHC jet data were therefore omitted from the fit, although a number of exploratory studies with different toy models for the NNLO K-factors were performed in [1], and the impact on the gluon PDF was found to be relatively minor. However, in [11] the first calculation of fully differential jet production at NNLO was presented, allowing LHC jet data to be correctly included in an NNLO PDF fit for the first time. In this study NNLO K-factors are presented corresponding to the ATLAS 7 TeV measurement [12] with jet radius $R = 0.4$, and therefore in these proceedings we consider only this data set.

NLO comparison and decorrelation study
We begin by considering the prediction and fit at NLO, before including the NNLO corrections. As the baseline PDF we use the MMHT14 set including the HERA I+II combined data [5], that is, as presented in [6]. The predicted and fit data/theory for the $0.5 < |y_j| < 1.0$ and $1.0 < |y_j| < 1.5$ jet rapidity bins are shown in Fig. 1, with the shifts due to the correlated systematic uncertainties included. The description of the data is visibly poor, and does not improve greatly with refitting. In particular, the $\chi^2$ for the description is 413, decreasing to 400 after refitting, for 140 data points. From Fig. 1 we can see that a significant contributing factor to this is an essentially systematic offset in the data/theory in the two neighbouring rapidity bins, but in opposite directions. As these bins probe PDFs of the same flavour in very similar $x$ and $Q^2$ regions, little improvement is possible (or observed) by refitting to these data.

PDF4LHC -CERN -September 2016
Fig. 1. Comparison of NLO prediction and fit to ATLAS jet data [12] for two jet rapidity bins. Data/theory is plotted, with the data already shifted by the systematic uncertainties in order to achieve the best description. The displayed errors are purely statistical.
The cause of this appears to lie with the shifts allowed by the correlated systematic uncertainties. The ATLAS data contain a large number of individual correlated errors which are generally completely dominant over the (small) statistical errors; for the 'weaker' assumption about error correlations defined in [12], which we take here, there are 71 individual sources of systematic error. If we simply assume that all of these uncertainties are completely decorrelated between the six rapidity bins (while remaining fully correlated within each bin), a universally good description is found: in this case the extra freedom allows the data to shift in order to achieve a reasonable data/theory description. This is however clearly a hugely over-conservative assumption. To be more precise, we examine the size of the shifts $r_k$ for each source of systematic uncertainty by which the theory (or equivalently, data) points are allowed to move, as defined in the $\chi^2$

$$\chi^2 = \sum_{i=1}^{N_{\rm pts}} \left( \frac{D_i + \sum_k r_k\, \sigma^{\rm corr}_{k,i} - T_i}{\sigma^{\rm uncorr}_i} \right)^2 + \sum_k r_k^2 , \tag{1}$$

where $D_i$ is the $i$th data point, $T_i$ is the theory prediction and $\sigma^{\rm uncorr}_i$ ($\sigma^{\rm corr}_{k,i}$) are the uncorrelated (correlated) errors. In particular, we evaluate the shifts for each of the first four rapidity bins (from 0 to 2.0 in steps of 0.5) individually; including the last two rapidity bins, where the data tend to be less precise, does not affect the conclusions that follow. Any tensions between the different bins may then show up through significantly different $r_k$ values being preferred in the different rapidity bins in order to achieve good individual fits. In Fig. 2 we show, for each source of systematic error, the average of the squared shift differences $(r_i - r_j)^2$ between pairs of bins $i$, $j$ among the four bins. It is clear that for a small subset of the shifts this difference is significantly larger than zero, indicating a large degree of tension.
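For the $\chi^2$ with correlated shifts described above, which is quadratic in the $r_k$, the minimisation over the shifts can be done analytically: setting $\partial\chi^2/\partial r_k = 0$ gives the linear system $\sum_l \big(\delta_{kl} + \sum_i \sigma^{\rm corr}_{k,i}\sigma^{\rm corr}_{l,i}/(\sigma^{\rm uncorr}_i)^2\big)\, r_l = \sum_i \sigma^{\rm corr}_{k,i}(T_i - D_i)/(\sigma^{\rm uncorr}_i)^2$. The following is a minimal Python/NumPy sketch of that step, not the MMHT fitting code; the function name `chi2_with_shifts` and the toy inputs are hypothetical.

```python
import numpy as np

def chi2_with_shifts(D, T, sig_uncorr, sig_corr):
    """Minimise chi^2 = sum_i ((D_i + sum_k r_k sig_corr_ki - T_i) / sig_uncorr_i)^2 + sum_k r_k^2
    over the nuisance shifts r_k.

    D, T       : data and theory, shape (N,)
    sig_uncorr : uncorrelated errors, shape (N,)
    sig_corr   : correlated errors sigma^corr_{k,i}, shape (K, N)
    Returns (chi2_min, r_opt).
    """
    D = np.asarray(D, dtype=float)
    T = np.asarray(T, dtype=float)
    s = np.asarray(sig_uncorr, dtype=float)
    beta = np.atleast_2d(np.asarray(sig_corr, dtype=float))
    w = 1.0 / s**2  # inverse uncorrelated variances

    # d chi^2 / d r_k = 0  =>  (1 + beta W beta^T) r = beta W (T - D)
    A = np.eye(beta.shape[0]) + (beta * w) @ beta.T
    b = (beta * w) @ (T - D)
    r = np.linalg.solve(A, b)

    # Shifted residuals plus the Gaussian penalty on each shift
    resid = (D + beta.T @ r - T) / s
    return float(resid @ resid + r @ r), r

# Toy example (hypothetical numbers): three points sitting 10% above theory,
# with a fully correlated 10% systematic and 1% uncorrelated errors.
chi2_min, r = chi2_with_shifts([1.1, 1.1, 1.1], [1.0, 1.0, 1.0],
                               [0.01, 0.01, 0.01], [[0.1, 0.1, 0.1]])
print(chi2_min, r)  # chi2 close to 1 with r close to -1, versus 300 for r = 0
```

A single shift of about one standard deviation absorbs the common offset at a cost of roughly one unit of $\chi^2$, which is why large correlated errors can mask, or in the case of tensions between bins fail to mask, data/theory differences.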
The three shifts jes21, jes45 and jes62, as defined in [13], which correspond [14] to the multi-jet balance asymmetry, an in-situ statistical uncertainty and the jet energy scale for close-by jets, respectively, show particularly large differences. We therefore investigate the impact of decorrelating these systematic uncertainties alone between rapidity bins. The result for these three systematic uncertainties, as well as for combinations of them, is shown in Table 1, and is found to be dramatic. Simply decorrelating jes21, for example, leads to a reduction of 180 points in $\chi^2$, giving almost a factor of 2 decrease in the $\chi^2/N_{\rm pts}$, from 2.85 to 1.58; the result for the other two uncertainties is also significant, although not as large. Decorrelating jes62 in addition gives a $\chi^2/N_{\rm pts}$ of 1.27. The same data/theory comparisons as in Fig. 1, but including this decorrelation of jes21 and jes62, are shown in Fig. 3 and are visibly improved, with the additional freedom allowing the data/theory to shift in the different rapidity bins and achieve a good overall description. The correlation between systematic errors should clearly be determined by physics considerations and not simply by the possibility of improving the theory description of the data. Nonetheless this is an interesting finding which may hopefully guide a future re-analysis of the ATLAS systematic uncertainties, given the apparent tensions that are present. For current purposes it will be useful to use this choice of error decorrelation when considering the impact of NNLO corrections in the following section.
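Decorrelating a source between rapidity bins amounts to replacing its single shift $r_k$ by one independent shift per bin, each acting only on the points of that bin. A minimal sketch of this bookkeeping is given below, assuming the standard $\chi^2$ with a Gaussian penalty $\sum_k r_k^2$ on the shifts and purely illustrative two-bin data with offsets of opposite sign, loosely mimicking Fig. 1; the function names are hypothetical. Note that the extra freedom does not in general guarantee a lower $\chi^2$, since each new shift carries its own penalty, but for opposite offsets as here it helps strongly.

```python
import numpy as np

def min_chi2(D, T, s, beta):
    """chi^2 with correlated shifts r_k (Gaussian penalty sum_k r_k^2), minimised analytically."""
    D, T, s = (np.asarray(a, dtype=float) for a in (D, T, s))
    beta = np.asarray(beta, dtype=float)
    w = 1.0 / s**2
    A = np.eye(beta.shape[0]) + (beta * w) @ beta.T
    r = np.linalg.solve(A, (beta * w) @ (T - D))
    resid = (D + beta.T @ r - T) / s
    return float(resid @ resid + r @ r)

def decorrelate(beta, bin_index, k):
    """Replace source k by one copy per bin, each non-zero only in its own bin.

    beta      : correlated errors, shape (K, N)
    bin_index : rapidity-bin label of each data point, shape (N,)
    """
    bin_index = np.asarray(bin_index)
    others = np.delete(beta, k, axis=0)
    copies = [np.where(bin_index == b, beta[k], 0.0) for b in np.unique(bin_index)]
    return np.vstack([others] + copies)

# Two bins of two points each; data sit 10% above theory in bin 0 and 10% below in bin 1.
D = np.array([1.1, 1.1, 0.9, 0.9])
T = np.ones(4)
s = np.full(4, 0.01)
beta = np.array([[0.1, 0.1, 0.1, 0.1]])  # one 10% source, fully correlated across bins
bins = np.array([0, 0, 1, 1])

full = min_chi2(D, T, s, beta)
decorr = min_chi2(D, T, s, decorrelate(beta, bins, k=0))
print(full, decorr)  # about 400 versus about 2
```

With a fully correlated source the two opposite offsets pull the shift in opposite directions and cancel, which is the pattern seen between neighbouring rapidity bins above.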

NNLO comparison
We now consider the impact of including the NNLO corrections calculated in [11] on the data description. The results are shown in Table 2 for the default treatment of the correlated systematic errors and with the error decorrelation defined in the preceding section. In both cases there is a significant, although not dramatic, deterioration in the $\chi^2$ both with and without the ATLAS data included in the fit. The source of this effect can be seen most clearly if we consider the data/theory comparison prior to including the shifts due to the correlated systematic errors. This is shown in Fig. 4, and there is clearly a trend for the NNLO corrections to shift the theory away from the data at lower jet $p_\perp$; such an effect is also visible in the results of [11]. While the final $\chi^2$ will depend on the precise way in which the systematic uncertainties allow the data/theory to shift, this will in general lead to some deterioration in the fit quality. This is indeed evident in Table 2, and moreover the deterioration is seen to be roughly independent of the precise treatment of the systematic error correlation. The effect of including the ATLAS data on the gluon PDF is shown in Fig. 5, and is seen to lead to a somewhat softer gluon at higher $x$, lying on the edge of the PDF uncertainty band. The impact is qualitatively similar, although a little milder, for the decorrelated error treatment. A similar, though smaller, effect is seen at NLO (not displayed here).
Fig. 4. NLO and NNLO predictions compared to ATLAS jet data [12] for two jet rapidity bins. Data/theory is plotted, without including the shifts due to the systematic uncertainties. Errors are the systematic and statistical errors added in quadrature.

                    Full corr.      jes21,62 decorr.
$\chi^2$, NLO       (413) 400       (180) 178
$\chi^2$, NNLO      (443) 427       (211) 204

Table 2. $\chi^2$ for the (description) fit to ATLAS jet data [12], with the default systematic error treatment and with errors jes21,62, defined in the text, decorrelated between jet rapidity bins.
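As a simple cross-check of the statement that the NNLO deterioration is roughly independent of the error treatment, the fit values in Table 2 can be differenced directly. This short Python snippet only restates that arithmetic for the 140 data points; the dictionary layout and function name are illustrative choices.

```python
N_PTS = 140  # ATLAS 7 TeV jet data, R = 0.4, six rapidity bins

# Fit (not description) chi^2 values from Table 2.
CHI2_FIT = {
    "full corr.": {"NLO": 400, "NNLO": 427},
    "jes21,62 decorr.": {"NLO": 178, "NNLO": 204},
}

def nnlo_deterioration(table, n_pts):
    """Return {treatment: (delta_chi2, delta_chi2_per_point)} for NNLO relative to NLO."""
    return {
        name: (v["NNLO"] - v["NLO"], (v["NNLO"] - v["NLO"]) / n_pts)
        for name, v in table.items()
    }

for name, (d, d_per) in nnlo_deterioration(CHI2_FIT, N_PTS).items():
    print(f"{name:>18}: delta chi2 = {d}, per point = {d_per:.2f}")
```

Both treatments give an increase of 26 to 27 units, about 0.19 per point.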
Fig. 5. Impact on the gluon PDF of the ATLAS jet data, for the default systematic error treatment and with errors jes21,62, defined in the text, decorrelated between jet rapidity bins.

In the future, it will be important to consider a wider range of jet radii $R$, where available. For example, the ATLAS data are presented for $R = 0.6$ as well as the case of $R = 0.4$ considered here, while the CMS measurement [15] of inclusive jet production at 7 TeV takes $R = 0.7$. At the time of these proceedings NNLO K-factors corresponding to these data were not publicly available; however, there is some evidence that a larger value of $R$ leads to more stable perturbative results [16]. In addition, it is worth pointing out that the NLO description of these CMS data, which are included in the MMHT14 fit, is very good, with a $\chi^2$ close to 1 per point [1]. It will therefore be very informative to consider the impact of NNLO corrections on the comparison to the CMS data. A final important factor is the choice of factorization and renormalization scales. For the comparison to the ATLAS data this is given by the $p_\perp$ of the leading jet in each event. However, an alternative choice is to treat the data inclusively, taking the $p_\perp$ of each jet in the event as the scale. This is observed to have quite a large impact on the overall result [16]. Furthermore, the NLO comparison to the CMS jet data uses such a choice. A complete investigation of all of the above factors will therefore be essential before a full assessment of the impact of NNLO corrections on the comparison to jet data can be made.
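The difference between the two scale choices lies only in which $p_\perp$ is attached to each jet entering the inclusive distribution. The toy Python function below (hypothetical name `scale_per_jet_entry`, with $\mu_F = \mu_R = \mu$ assumed) makes this explicit for a single event; it is a bookkeeping illustration, not part of any NNLO code.

```python
def scale_per_jet_entry(jet_pts, choice):
    """Return (jet pT, scale mu) for every jet filling the inclusive-jet histogram.

    choice = "leading": all jets in the event use the leading-jet pT as mu
    choice = "per_jet": each jet uses its own pT as mu
    """
    if choice == "leading":
        mu_lead = max(jet_pts)
        return [(pt, mu_lead) for pt in jet_pts]
    if choice == "per_jet":
        return [(pt, pt) for pt in jet_pts]
    raise ValueError(f"unknown scale choice: {choice!r}")

# A three-jet event (pT in GeV, illustrative values)
event = [420.0, 310.0, 95.0]
print(scale_per_jet_entry(event, "leading"))  # [(420.0, 420.0), (310.0, 420.0), (95.0, 420.0)]
print(scale_per_jet_entry(event, "per_jet"))  # [(420.0, 420.0), (310.0, 310.0), (95.0, 95.0)]
```

For subleading jets the leading-jet choice always gives a scale at least as large as the jet's own $p_\perp$, so the two prescriptions differ most where subleading jets populate the distribution.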

Inclusion of new LHC data
In addition to the final combined HERA I+II data set [5], much new LHC and Tevatron data have become available since the release of the MMHT14 PDF set. We have included a range of these in what we label 'MMHT (2016 fit)', an unofficial set that will not be made publicly available but allows the impact of these data on the PDFs to be judged, and paves the way to the public release of a new set. Included in this are the latest $t\overline{t}$ total cross section data, LHCb data [17,18,19] on $Z$ boson production, CMS data on $W$ boson production [20] and on $W$ boson production in association with a charm quark [21], and an updated D0 measurement of the $W \to e\nu$ asymmetry [22]. In addition, a comparison and fit to the CMS double differential Drell-Yan measurement at 8 TeV [23] is attempted; however, there are some issues in the comparison that we are currently attempting to resolve. All cross sections are calculated at NLO using MCFM [24] in combination with APPLgrid [25], with NNLO K-factors calculated using top++ [26] for the $t\overline{t}$ case and FEWZ [27] for the $W$ and $Z$ case. For $W+c$ production the NNLO calculation is not currently available, so we simply use the NLO calculation in the NNLO fit, as the size of these corrections is expected to be smaller than the experimental uncertainties in the data we compare to. The quality of the data description with and without the new data included in the fit at NLO and NNLO is shown in Table 3. The description is generally observed to be good, with some mild improvement after refitting. The one exception to this is the CMS $W$ boson production data [20], where a considerable improvement with refitting is observed. In addition, the data description is seen to be somewhat better at NNLO than at NLO. The best-fit strong coupling $\alpha_S(M_Z^2)$ is found to increase from 0.1172 to about 0.118 at NNLO, while at NLO it remains stable at 0.12. The comparisons to the CMS $W$ boson and $W+c$ production data are shown in Fig. 6, and the improvement in the description with refitting for the former case is clear.
As in the case of MMHT14, the 'MMHT (2016 fit)' error set has 25 eigenvectors, corresponding to 50 free directions. We find that 14 of these directions are constrained by the new LHC data, according to the dynamical tolerance technique described in [2]. Results for the two most affected PDF combinations, the strange sum $s+\overline{s}$ and the valence quark difference $u_V - d_V$, are shown in Fig. 7. In the former case a significant reduction in the uncertainty is observed, with a mild increase in the central value, due in large part to the CMS $W+c$ data, which are strongly sensitive to this PDF combination. The shape of the valence quark difference changes quite dramatically, with a reduction in the uncertainty at the percent level seen at low and intermediate $x$; closer inspection reveals that the dominant change is in fact in the up quark valence distribution. This is mainly driven by the CMS $W$ boson production data, which are sensitive to this quark flavour combination, with some impact coming from the combined HERA data as well. There are some smaller changes in the light sea and gluon PDFs, largely driven by the new HERA combined data, which we do not show here for the sake of brevity.
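Uncertainty bands such as those of Fig. 7 are built from the 25 eigenvector pairs of the error set. A minimal Python sketch is given below, assuming the asymmetric Hessian prescription used for the MSTW08 sets [2]; the function name and toy numbers are hypothetical, and the dynamical-tolerance determination of the eigenvector step sizes is not reproduced here.

```python
import numpy as np

def hessian_asymmetric_errors(f0, f_plus, f_minus):
    """Asymmetric Hessian uncertainty on an observable F (assumed MSTW08-style prescription).

    f0      : F evaluated with the central PDF member
    f_plus  : F for the '+' member of each eigenvector, shape (n_eig,)
    f_minus : F for the '-' member of each eigenvector, shape (n_eig,)
    Returns (delta_plus, delta_minus), both non-negative.
    """
    dp = np.asarray(f_plus, dtype=float) - f0
    dm = np.asarray(f_minus, dtype=float) - f0
    # Each eigenvector contributes its largest upward (downward) excursion, or zero.
    up = np.maximum(np.maximum(dp, dm), 0.0)
    down = np.maximum(np.maximum(-dp, -dm), 0.0)
    return float(np.sqrt(np.sum(up**2))), float(np.sqrt(np.sum(down**2)))

# Toy numbers: only two of the 25 eigenvectors move F, the rest leave it unchanged.
n_eig = 25
f_plus = np.full(n_eig, 1.0)
f_minus = np.full(n_eig, 1.0)
f_plus[0], f_minus[0] = 1.1, 0.9
f_plus[1], f_minus[1] = 1.0, 1.2
print(hessian_asymmetric_errors(1.0, f_plus, f_minus))  # about (0.224, 0.1)
```

An eigenvector whose two members both move $F$ in the same direction (as the second one here) contributes only to one side of the band, which is why such bands can be visibly asymmetric.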

Towards MMHTQED
While PDFs are more commonly associated with the quarks and gluons within the proton, it is also possible for photon-initiated processes to occur in proton collisions, with a corresponding photon PDF introduced. This is becoming increasingly relevant at the LHC, where NNLO QCD precision is now the standard for a large number of processes. Indeed, as roughly speaking $\alpha(M_Z^2) \sim \alpha_S^2(M_Z^2)$, if we are to quote NNLO QCD accuracy it is crucial to consider the possible contribution from NLO electroweak corrections; photon-initiated processes are one irreducible part of these. Earlier efforts to describe the photon PDF fall into two categories: either model-dependent attempts based on a simple ansatz for the radiation of photons from quarks, as in the MRST2004QED [29] and more recent CT14QED [30] sets, or the agnostic treatment of the NNPDFQED set [31], which freely parameterises the photon in the same way as the quarks and gluon, with constraints from DIS and LHC $W$ and $Z$ data included. In the latter case this leads to significant uncertainties on the photon, due to the relatively small impact photon-initiated contributions have on such data and hence their limited constraining power. One particular issue that has arisen from the use of this set is the appearance of very large uncertainties in photon-initiated cross sections at high mass, with a central value that can be larger than that of the conventional channels. Such an effect has for example been discussed in the case of Drell-Yan production in [32,33,34], $WW$ production in [34] and $t\overline{t}$ production in [35].
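The rough equivalence $\alpha(M_Z^2) \sim \alpha_S^2(M_Z^2)$ quoted above can be checked numerically. The snippet below assumes $\alpha(M_Z^2) \approx 1/128$ and takes $\alpha_S(M_Z^2) = 0.118$; both are approximate inputs, and the point is only that the two quantities are of the same order.

```python
ALPHA_EM_MZ = 1.0 / 128.0  # approximate QED coupling at the Z mass (assumed input)
ALPHA_S_MZ = 0.118         # strong coupling at the Z mass

alpha_s_sq = ALPHA_S_MZ**2
ratio = ALPHA_EM_MZ / alpha_s_sq
print(f"alpha = {ALPHA_EM_MZ:.4f}, alpha_S^2 = {alpha_s_sq:.4f}, ratio = {ratio:.2f}")
# alpha = 0.0078, alpha_S^2 = 0.0139, ratio = 0.56
```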
More recently, there has been great progress in the determination of the photon PDF, based on the crucial observations that the dominant contribution to the photon comes from the well understood elastic $p \to p\gamma$ emission process (see [8] for discussion), and that more generally the photon can be related directly to the proton structure functions probed in $ep$ scattering, which contain both elastic and inelastic contributions, the latter leading into the DIS region as $Q^2$ is increased. This connection is made precise in [9], where it is shown that the resulting 'LUXqed' photon PDF is generally known with percent-level precision, given the available structure function data. In particular, they show that the photon can be written as

$$x\gamma(x,\mu^2) = \frac{1}{2\pi\alpha(\mu^2)} \int_x^1 \frac{dz}{z} \left\{ \int_{\frac{x^2 m_p^2}{1-z}}^{\frac{\mu^2}{1-z}} \frac{dQ^2}{Q^2}\, \alpha^2(Q^2) \left[ \left( z\, p_{\gamma q}(z) + \frac{2x^2 m_p^2}{Q^2} \right) F_2(x/z,Q^2) - z^2 F_L(x/z,Q^2) \right] - \alpha^2(\mu^2)\, z^2 F_2(x/z,\mu^2) \right\} , \tag{2}$$

where $F_{2,L}$ are the usual proton structure functions, $m_p$ is the proton mass, and $p_{\gamma q}(z)$ is the LO $\gamma q$ splitting function. While precise, this form relies upon the approximation that the quarks and gluons are independent of the photon, i.e. it omits the impact of the $\gamma \to q\overline{q}$ splitting on the quarks and gluons themselves. While this approximation is generally a good one, with corrections being higher order in $\alpha$, it leads for example to some violation of the momentum sum rule, due to the asymmetry in the treatment of the quarks/gluons and the photon. In [9] this is corrected for by absorbing all of the momentum violation into the gluon PDF, but more generally a full treatment of the coupled DGLAP evolution between the photon and the QCD partons, with the input photon PDF at a scale $Q_0$ determined using the same physics input as LUXqed, may be preferable.
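At the level of momentum integrals, absorbing the momentum-sum-rule violation into the gluon, as done in [9], is simple bookkeeping: whatever excess momentum the photon carries is subtracted from the gluon's momentum fraction. The sketch below is only that bookkeeping, with hypothetical momentum fractions and a hypothetical function name; how the $x$ shape of the gluon is adjusted is not addressed.

```python
def absorb_violation_into_gluon(momentum_fractions):
    """Return a copy in which the gluon absorbs any momentum-sum-rule violation.

    momentum_fractions: dict mapping parton -> integral of x f(x) dx at a fixed scale
    """
    excess = sum(momentum_fractions.values()) - 1.0
    corrected = dict(momentum_fractions)
    corrected["g"] -= excess
    return corrected

# Hypothetical numbers: quarks and gluon already sum to 1, photon added on top.
before = {"quarks": 0.53, "g": 0.47, "photon": 0.004}
after = absorb_violation_into_gluon(before)
print(round(sum(before.values()), 6), round(sum(after.values()), 6))  # 1.004 1.0
```

A fully coupled DGLAP evolution avoids the need for such a correction, since momentum is then conserved between the photon and the QCD partons by construction.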
Work towards including the photon PDF within the MMHT framework is ongoing. In particular, we separate the $Q^2$ integral in (2) into a $Q^2 < Q_0^2 = 1\,{\rm GeV}^2$ region, which determines the input photon $\gamma(x, Q_0^2)$, and a $Q^2 > Q_0^2$ region where a suitably modified form of the fully coupled DGLAP evolution is performed within the MMHT framework (work to include the $O(\alpha\alpha_S)$ corrections to the evolution is currently being finalised). This will allow the photon to be included simply and consistently in future PDF updates. An additional advantage is that the photon PDF of the neutron can be included, with a suitable model of isospin violation applied at the input scale. Finally, it would in principle be possible to include uncertainties on the input due, for example, to the structure functions entering (2), and to assess the impact of LHC data, for example on high-mass Drell-Yan production (see [36] for a recent study). The constraining power of such data is unlikely to be competitive, but this will provide a good consistency check.
A first result for the $\gamma\gamma$ luminosity at 13 TeV is shown in Fig. 8, with the LUXqed set included for comparison. Broadly speaking, very close agreement is seen between the two sets, as expected given that the input in the $Q^2 < Q_0^2$ region is essentially identical; a more precise comparison is ongoing. Also shown is the NNPDF3.0 prediction, including the corresponding 68% confidence level uncertainties. These are seen to be very large at higher mass, with the central value being quite high, consistent with the findings of [32,33,34,35]. However, for the updated results for the photon PDF the uncertainties are generally smaller than the line width in the plot, with the central value lying towards the lower end of the NNPDF band. We can therefore safely say that there is no room for large photon-initiated contributions with sizeable uncertainties at high mass. Rather, we are now in the era of precision photon PDF phenomenology.
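For orientation, a photon-photon luminosity such as that of Fig. 8 is a convolution of the photon PDF with itself. The sketch below assumes the common normalisation $\partial\mathcal{L}_{\gamma\gamma}/\partial M^2 = \frac{1}{s}\int_\tau^1 \frac{dx}{x}\,\gamma(x)\,\gamma(\tau/x)$ with $\tau = M^2/s$, at a fixed factorization scale, and uses a hypothetical toy photon PDF in place of the MMHT or LUXqed sets; the conventions used in Fig. 8 may differ.

```python
import numpy as np

def toy_photon_pdf(x):
    """Hypothetical photon PDF gamma(x) at a fixed scale: x*gamma = 0.01 x^-0.1 (1 - x)^6."""
    return 0.01 * x**-0.1 * (1.0 - x) ** 6 / x

def gamma_gamma_lumi(M, sqrt_s, pdf=toy_photon_pdf, n=4000):
    """dL/dM^2 = (1/s) * int_tau^1 dx/x gamma(x) gamma(tau/x), tau = M^2/s, in GeV^-2.

    The integral is done with the trapezoidal rule in t = ln x, where dx/x = dt.
    """
    s = sqrt_s**2
    tau = M**2 / s
    t = np.linspace(np.log(tau), 0.0, n)
    x = np.exp(t)
    y = pdf(x) * pdf(tau / x)
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t))
    return integral / s

for M in (100.0, 500.0, 1000.0, 2000.0):
    print(f"M = {M:6.0f} GeV: dL/dM^2 = {gamma_gamma_lumi(M, 13000.0):.3e} GeV^-2")
```

Passing a callable that returns an actual photon PDF at the relevant scale in place of `toy_photon_pdf` would turn this into the physical luminosity.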

Fig. 2. Average of the squared systematic shift differences $(r_i - r_j)^2$ between the first four rapidity bins of the ATLAS 7 TeV jet data [12], for each source of systematic error.

Fig. 6. Comparison to CMS $W$ boson [20] and $W+c$ production [21] data at NNLO, before and after including the data in the fit. In the former case the $W$ asymmetry is shown for clarity, although it is the individual $W^\pm$ data that are fitted.

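Table 1. $\chi^2$ per number of data points for the fit to ATLAS jet data [12], with the default systematic error treatment ('full') and with certain errors, defined in the text, decorrelated between jet rapidity bins.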

Table 3. $\chi^2$ at NLO and NNLO for the prediction (fit) to the new LHC and Tevatron data included in the 'MMHT (2016 fit)'. Also shown is the total number of points without (with) the new data included.