Evolution in the orbital structure of quiescent galaxies from MAGPI, LEGA-C and SAMI surveys: direct evidence for merger-driven growth over the last 7 Gyr

We present the ﬁrst study of spatially integrated higher-order stellar kinematics over cosmic time. We use deep rest-frame optical spectroscopy of quiescent galaxies at redshifts z = 0 . 05 , 0.3 and 0.8 from the SAMI, MAGPI and LEGA-C surveys to measure the excess kurtosis h 4 of the stellar velocity distribution, the latter parametrised as a Gauss-Hermite series. Conservatively using a redshift-independent cut in stellar mass ( M (cid:63) = 10 11 M (cid:12) ), and matching the stellar-mass distributions of our samples, we ﬁnd 7 σ evidence of h 4 increasing with cosmic time, from a median value of 0 . 019 ± 0 . 002 at z = 0 . 8 to 0 . 059 ± 0 . 004 at z = 0 . 06 . Alternatively, we use a physically motivated sample selection, based on the mass distribution of the progenitors of local quiescent galaxies as inferred from numerical simulations; in this case, we ﬁnd 10 σ evidence. This evolution suggests that, over the last 7 Gyr, there has been a gradual decrease in the rotation-to-dispersion ratio and an increase in the radial anisotropy of the stellar velocity distribution, qualitatively consistent with accretion of gas-poor satellites. These ﬁndings demonstrate that massive galaxies continue to accrete mass and increase their dispersion support after becoming quiescent.


INTRODUCTION
The most massive galaxies in the Universe are thought to form in two phases.The first stage is dominated by dissipa-E-mail: francesco.deugenio@gmail.comtive gas accretion and in-situ star formation.In the second stage, after cosmic noon (Madau & Dickinson 2014;Förster Schreiber & Wuyts 2020), massive galaxies typically become quiescent, but continue to grow in both mass and size through accretion of low-mass gas-poor satellites (Bezanson et al. 2009;Naab et al. 2009;Oser et al. 2010Oser et al. , 2012;;Stockmann et al. 2021).
This theoretical picture was drawn to explain the observed changes in the average properties of quiescent galaxies across cosmic time.Observing campaigns of the nearby Universe have revealed that local massive quiescent galaxies are dispersion dominated (Davies et al. 1983;Emsellem et al. 2011) and intrinsically round or triaxial (Lambas et al. 1992;Vincent & Ryden 2005;Weijmans et al. 2014;Foster et al. 2017;Li et al. 2018); they have a large fraction of dynamically warm and hot orbits (Zhu et al. 2018;Santucci et al. 2022), and flat or even 'u'-shaped stellar-age radial profiles (Zibetti et al. 2020).In contrast, massive quiescent galaxies in the early Universe were smaller (Daddi et al. 2005, Trujillo et al. 2007, which suggests size evolution), and intrinsically flatter (van der Wel et al. 2011, Chang et al. 2013, suggesting a higher rotationto-dispersion ratio compared to local quiescent galaxies, as confirmed by e.g.Newman et al. 2015).
However, connecting primordial to local galaxies is complicated by progenitor bias (van Dokkum & Franx 2001): the progenitors of some of the local quiescent galaxies were not already quiescent several billion years ago.This means that, in principle, the physical differences between local and distant quiescent galaxies could be due entirely to changing demographics.
Indeed, there is now overwhelming evidence for inside-out growth of star-forming galaxies, both in the local Universe (where we can even measure the instantaneous size-growth rate, Pezzulli et al. 2015) and at all epochs until cosmic noon (Robotham et al. 2022;Wang et al. 2019;Nelson et al. 2016;Paulino-Afonso et al. 2017;Suzuki et al. 2019).At any given moment in the history of the Universe, star-forming galaxies are on average larger than quiescent galaxies of the same stellar mass (e.g.van der Wel et al. 2014;Mowla et al. 2019); for this reason, also newly quiescent galaxies have larger average size compared to the extant quiescent population of the same mass (Newman et al. 2012;Carollo et al. 2013;Wu et al. 2018), potentially explaining the increase in the average physical size of the quiescent galaxy population over cosmic time. 1 To account for the effect of this progenitor bias on the size evolution of quiescent galaxies, one must keep track of the properties of both quiescent and star-forming galaxies.
However, large photometric and grism-spectroscopy surveys such as CANDELS (Grogin et al. 2011;Koekemoer et al. 2011) and 3D-HST (Brammer et al. 2012) have unequivocally shown that demographic changes cannot explain, alone, the observed size difference between early and contemporary quiescent galaxies.The comoving volume density of compact quiescent galaxies decreases with cosmic time, which requires physical growth of individual galaxies after they became qui-1 In principle, star-forming galaxies could also experience sudden structural changes just before becoming quiescent, for example as a result of a final, central starburst (Chen et al. 2019;D'Eugenio et al. 2020, e.g.), but this 'rapid path to quiescence' (Wu et al. 2018) does not necessarily lead to changes in size (cf.D 'Eugenio et al. 2020, Wu et al. 2020, and Setton et al. 2020, 2022).Besides, it is a rare evolutionary path in the local Universe (Rowlands et al. 2018) and, even around cosmic noon, when it is most common, it seem to explain only half the observed growth in the comoving number density of the quiescent population (Belli et al. 2015).
It is to explain this inferred size growth that minor dry mergers were first invoked (Bezanson et al. 2009;Naab et al. 2009).Alternative explanations do not account for all the observations.Star-formation episodes after quiescence (rejuvenation) can be ruled out (as main mechanism) based on direct measurements of the star-formation history of quiescent galaxies (Chauke et al. 2019; besides, this mechanism is not consistent with the observed changes in shape, e.g.van der Wel et al. 2011).While major dry mergers could in principle explain the observed evolution, their predicted rate (Oser et al. 2012;Nipoti et al. 2012;Sweet et al. 2017) appears insufficient to account for the magnitude of the observed changes, because of the linear relation between mass and size growth in major mergers (Naab et al. 2009).In addition, the small number of major dry mergers may not reproduce the intrinsic shape and the spatially resolved stellar kinematics of the most massive galaxies, because the orbital angular momentum of the progenitors is locked within the stellar orbits of the remnant (Bois et al. 2011, but see e.g.Taranu et al. 2013 andLagos et al. 2022 for a contrasting view).
In contrast, minor dry mergers are consistent with all the observed changes.They explain the observed change in shape and the loss of angular momentum, while the low relative mass of the satellites (mass ratio 6:1 and higher, Naab et al. 2014), explains the steep radial-to-mass growth rate (Bezanson et al. 2009;Naab et al. 2009).The conservation of orbital energy between the accreted satellite and its stars means that most stars are dispersed along radially anisotropic orbits, changing the light distribution and stellar population content more at large radii than at small radii, as required by observations of weak evolution in the central surface mass density (e.g.Bezanson et al. 2009).
Unfortunately, by definition, minor mergers are hard to constrain observationally beyond the local Universe.In particular, the uncertainty on the timescale over which a merger is recognisable translates into large uncertainties on the merger rate (Newman et al. 2012).However, a sufficient number of minor mergers will have a visible impact on the stellar kinematics of the accreting galaxy.In particular, if accreted stars are dispersed about the orbit of the satellite, we expect them to move along elongated orbits, which is different from the results of both star-formation and major dry mergers (Bois et al. 2011).By comparing the velocity distribution of massive quiescent galaxies across cosmic time, we can test another prediction of the minor-dry-merger hypothesis.
While integral field spectroscopy surveys enable us to accurately model the stellar orbital distribution of local galaxies (Cappellari et al. 2007;Zhu et al. 2018) and to compare the detailed properties of their spatially resolved velocity distributions to simulations (van de Sande et al. 2017Sande et al. , 2019)), this type of observation remains out of reach beyond the local Universe, where large samples with high-quality measurements are limited to integrated spectra.Fortunately, even spatially integrated measurements preserve some information about the assembly history of galaxies.There are some caveats to this statement: following a major merger, the distribution function does relax thus leading to loss of information (e.g.Lynden-Bell 1967).However, in general, relative to an isotropic stellar system, an over-abundance of radial orbits is reflected in the shape of the stellar velocity distribution, caus-ing it to deviate from a Gaussian and become more peaked with more prominent wings (leptokurtic); conversely, an overabundance of circular orbits reflects a less-peaked (platykurtic) velocity distribution (van der Marel & Franx 1993).So to test the hypothesis that the observed evolution in the kinematics and size of massive, quiescent galaxies is due to the cumulative effect of many minor dry mergers, we need highquality integrated spectra for a statistical sample of galaxies spanning a sizeable fraction of the history of the Universe.
Until recently, the necessary combination of large sample size and high-quality integrated spectra did not exist, but the advent of large, absorption-line spectroscopy surveys of the early Universe changed this state of affairs.
In this work, we leverage the extraordinary quality, sample size and large look-back time of the LEGA-C and MAGPI data, complemented with local observations from the SAMI Galaxy Survey, to investigate the cosmic evolution of the excess kurtosis (parametrised by h4, van der Marel & Franx 1993;Gerhard 1993) as a direct tracer of the assembly history of galaxies.After introducing the data and sample in § 2, we show that h4 increases with cosmic time ( § 4) and discuss the implications of our findings on the size growth of quiescent galaxies ( § 5).A summary of our findings is provided in § 6.

The SAMI Galaxy Survey
The SAMI Galaxy Survey (hereafter simply: SAMI) is a large, integral-field optical-spectroscopy survey of local galaxies.It spans a range of redshifts 0.04 < z < 0.095, a stellar mass range 10 7 < M < 10 12 M , all morphological types and environments (from isolated galaxies to eight clusters, local environment density 0.1 < Σ5 < 100 Mpc −2 , Bryant et al. 2015, Owers et al. 2017;see Brough et al. 2017 for the definition of Σ5).SAMI observations were obtained at the 3.9-m Anglo-Australian Telescope, using the Sydney-AAO Multi-object Integral field spectroscopy instrument (hereafter, the SAMI instrument; Croom et al. 2012).The SAMI instrument has 13 integral field units (IFUs), consisting of a fused-fibre bundle (hexabundle; Bland-Hawthorn et al. 2011;Bryant et al. 2014) of 61 individual fibres of 1.6-arcsec diameter, giving a total IFU diameter of 15 arcsec.The 13 IFUs are deployable inside a 1-degree field of view, and are complemented by 26 individual sky fibres.The fibres are fed to the double-beam AAOmega spectrograph (Sharp et al. 2006); the blue arm was configured with the 570V grating at 3750-5750 Å (R = 1812, σ = 70.3km s −1 ) and the red arm was configured with the R1000 grating at 6300-7400 Å (R = 4263, σ = 29.9km s −1 van de Sande et al. 2017).Each galaxy was exposed for approximately 3.5 hours, stacking seven 0.5-h dithered exposures (Sharp et al. 2015).The median seeing full width at half maximum (FWHM) of SAMI is 2.06 ± 0.40 arcsec.The data reduction process is outlined in Sharp et al. (2015) and Allen et al. (2015).Ensuing improvements are described in the public data release papers (Green et al. 2018;Scott et al. 2018).Here we use 3068 unique datacubes from the third and final public data release (Data Release 3, hereafter: DR3 Croom et al. 2021).

MAGPI
The Middle Ages Galaxy Properties with Integral field spectroscopy survey (hereafter, MAGPI Foster et al. 2021) is a Large Program with the Multi-Unit Spectroscopic Explorer (MUSE, Bacon et al. 2010) on the 8-m European Southern Observatory (ESO) Very Large Telescope (VLT).MAGPI aims to study spatially resolved galaxy properties in the uncharted cosmic 'Middle Ages' at z ≈ 0.3, between the epoch of 'classic' local surveys (like SAMI) and higher-redshift studies (like LEGA-C).MUSE was configured in the large-field mode (1 × 1-arcmin 2 field of view), aided by Ground Layer Adaptive Optics GALACSI (Arsenault et al. 2008;Ströbele et al. 2012) to achieve a spatial resolution with median FWHM of 0.6-0.8arcsec (comparable, in physical units, to the spatial resolution of SAMI).MAGPI spectra cover the approximate rest-frame wavelength range 3600 < λ < 7200 Å, with a median spectral resolution FWHM of 1.25 Å (inside one effective radius, the FWHM varies by 3 per cent).
The sample consists of 60 central galaxies, drawn from the Galaxy and Mass Assembly survey (GAMA; Driver et al. 2011;Liske et al. 2015;Baldry et al. 2018) and from two legacy programs, targeting clusters Abell 370 (Program ID 096.A-0710; PI: Bauer) and Abell 2744 (Program IDs: 095.A-0181 and 096.A-0496; PI: Richard).In addition to the central galaxies, MAGPI will concurrently observe one hundred satellite galaxies in the target redshift range, plus any background galaxy inside the field of view.
The survey is in progress, but MAGPI has already obtained data for fifteen fields, which we use in this work.An overview of the observations and data reduction is provided in the survey paper (Foster et al. 2021), while a complete description of the data reduction pipeline (based on the MUSE pipeline, Weilbacher et al. 2020 and on the Zurich Atmosphere Purge sky-subtraction software, Soto et al. 2016), will be provided in an upcoming paper (Mendel et al., in prep.).Each MAGPI cube is segmented into 'minicubes', centred on individual galaxy detections.

LEGA-C
Our redshift baseline is completed by the Large Early Galaxy Astrophysics Census, a large, deep optical-spectroscopy survey of galaxies between 0.6 < z < 1.0 (van der Wel et al. 2016).The LEGA-C sample consists of 3000 primary galaxies, Ks-band selected from the UltraVISTA catalogue (Muzzin et al. 2013).
Observations were carried at the ESO VLT using the VI-MOS spectrograph (Le Fèvre et al. 2003) in its multi-object configuration, with mask-cut slits of length ≥ 8 arcsec and width 1 arcsec; all slits from the main survey were oriented in the North-South direction, so they were aligned randomly relative to the major axes of the target galaxies.The seeing median FWHM, measured from a Moffat fit to the slit data, is 0.75 arcsec (van Houdt et al. 2021).The typical observedframe spectral interval spans 6300 < λ < 8800 Å(the exact range depends on the slit position inside the mask).The spectral resolution is R = 2500 (but the effective spectral resolution is R = 3500, due to the LEGA-C targets underfilling the slit; Straatman et al. 2018).Each mask was exposed for 20 h, reaching a continuum signal-to-noise ratio S/N ≈20 Å −1 .Thanks to the depth of these observations, most targets have successful kinematics measurements (93 per cent), resulting in a mass-completeness limit of 10 10.5 M (van der Wel et al. 2021).
In this work, we use the 1-d LEGA-C spectra from the third public data release (DR3, van der Wel et al. 2021).These were obtained from optimal extraction (Horne 1986) of the 2-d spectra.The large physical width of the LEGA-C slits (7.5 kpc at z = 0.8) means that the 1-d spectra sample a representative fraction of the targets' light (the ratio between the slit width and the circularised galaxy diameter is 1.0±0.5 for our redshift-evolution sample, see § 3 for the sample selection).To measure h4, we use the method outlined in § 2.3 and described in D' Eugenio et al. (2023, hereafter: DE23).We set the observed-frame spectral FWHM to a wavelengthindependent value of 2.12 Å (corresponding to 86 km s −1 , van der Wel et al. 2021).Even though we use emission-line subtracted spectra (Bezanson et al. 2018), the precision and accuracy of the subtraction do not affect our measured kinematics (DE23).
LEGA-C is the highest-redshift survey we use, so it has the least spatial information and narrowest wavelength range.For this reason, in order to draw a fair comparison with the other two datasets, we match the quality of the SAMI and MAGPI surveys to reproduce the observing setup of LEGA-C ( § 2.2).The impact of the different observing setup is discussed in § 4.2.

Data homogenisation
To match the high-redshift slit spectroscopy of LEGA-C, we artificially 'redshift' the SAMI and MAGPI galaxies to z=0.78.This is done in two steps: i) blur the point-spread function to match the LEGA-C seeing, and ii) extract the spectrum within a LEGA-C-like slit.In the first step, we convolve the datacubes with a Gaussian kernel; the Gaussian FWHM is calculated for each galaxy as the difference in quadrature between the median LEGA-C seeing FWHM of 5.6 kpc (at z=0.78) and the observed SAMI or MAGPI FWHM (for SAMI, this value is obtained from the SAMI DR3 catalogue; for MAGPI, the values are retrieved from the processed datacubes).Note that we also match the three surveys in wavelength, by removing any data outside the restframe interval 3600-5300 Å.The matching procedure is illustrated in the right column of Fig. 1, where the top, middle and bottom rows show data from SAMI, MAGPI and LEGA-C, respectively.For MAGPI and SAMI, the images (panels c and f) are reconstructed from the datacubes, shown at the original spatial resolution.The bottom quadrant of each panel shows the result of the convolution to match the LEGA-C seeing.For LEGA-C, the image is the HST F814W photometry (panel i); the bottom quadrant has been convolved with the ground-based LEGA-C seeing, to illustrate the VLT/VIMOS view of the target.For each galaxy, we calculate the noise spectrum by applying the square of the kernel to the variance datacube, but we ignore spatial correlations (we estimate the effect spatial correlation on the noise by rescaling the noise spectrum after the first fit; see § 2.3).After this procedure, the datacube matches the average spatial resolution of LEGA-C.
The second step consists of creating the 2-d slit spectrum; we convolve the datacube with a slit of width 7.5 kpc (corresponding to 1 arcsec at z=0.78) and length 75 kpc (white dashed rectangle in right column of Fig. 1; in practice, the slit always exceeds the size of the SAMI IFU and of most MAGPI minicubes).The slits are placed on the centre of the galaxy2 and are oriented in the North-South direction (i.e., randomly compared to the position angle of the target).
Next we simulate the degeneracy between Doppler shift and spatial offset along the dispersion direction.For each IFU spatial pixel (spaxel), we calculate its spatial offset from the centre of the mock slit, in units of (equivalent) LEGA-C pixels; because each LEGA-C spaxel consists of five detector pixels, we calculate this offset for up to ±2.5 pixels.For LEGA-C, each detector pixel corresponds to 0.6 Å, so the wavelength shift is in the range ±1.5 Å.We apply this shift by re-binning the spectrum of each IFU spaxel to the new wavelength grid.
For each spectral pixel, the flux in each slit spaxel is calculated as a linear sum of the flux of all SAMI spaxels that intersect the slit spaxel, weighted only by the overlapping area fraction.After calculating the spectrum for each slit spaxel, we optimally extract the resulting 2-d spectrum and obtain the final 1-d spectrum (Horne 1986).This latter step is in principle different from the procedure adopted in van der Wel et al. (2021), who use HST photometry with high spatial resolution to guide the extraction.But in practice the S/N of SAMI and MAGPI is so high that using a prior from photometry is not required, and we can use the observed slit profile for the extraction.Each spectrum is cut between 3600-5300 Å, to match the typical rest-frame wavelength range of LEGA-C.For the MUSE 1-d spectra, we also mask the wavelength region affected by the GALACSI laser (large grey shaded area in the middle row of Fig. 1).
One potential concern is to quantify what random/systematic errors in the h4 measurements are introduced in the process of degrading the SAMI data to match LEGA-C.To address this concern we use 1-d spectra from an elliptical aperture with semi-major axis Re (see § 2.4).These spectra provide a 'baseline' h4 measurement before the SAMI data are matched to LEGA-C, thus enabling us to assess the impact of the the LEGA-C observing setup on h4.For these spectra, we use the full wavelength range of SAMI; note that Comparison between three galaxies, randomly chosen from the SAMI, MAGPI and LEGA-C samples.For each galaxy, we show the data (dark grey) and best-fit spectra (red), alongside the relative residuals (black dots).Vertical lines/regions are masked because of possible emission lines (regardless of whether lines were actually detected) or because of instrument setup (e.g. the GALACSI laser band for MAGPI, panels d and e).The inset figures show the galaxy images (from the datacube for SAMI and MAGPI, panels c and f, from HST F814W for LEGA-C, panel i).In each of the three images (panels c, f and i), we indicate the galaxy effective radius with a solid black circle, and the applied slit with a dashed white line.The lowest quadrant of each image shows the data convolved to the ground-based spatial resolution of LEGA-C.before measuring the kinematics, we convolve the red arm to the spectral resolution of the blue arm, using the appropriate Gaussian kernel (van de Sande et al. 2017); the gap between the blue-and red-arm spectra is masked.A comparison between the default h4 and this baseline value is provided in § 4.2.

Integrated higher-order kinematics
To measure h4, we follow the procedure outlined in DE23.Our measurements are based on 1-d spectra spanning restframe B-and g-band; from these data, we infer the LOSVD using the penalised pixel fitting algorithm ppxf (Cappellari 2017(Cappellari , 2022)).We configured ppxf to model the line-ofsight velocity distribution (LOSVD) as a 4 th -order Gauss-Hermite series (van der Marel & Franx 1993;Gerhard 1993).ppxf then models the spectra using a linear combination of simple stellar population (SSP) spectra from the MILES library (Vazdekis et al. 2010(Vazdekis et al. , 2015)), using BaSTI isochrones (Pietrinferni et al. 2004(Pietrinferni et al. , 2006) ) and solar [α/Fe].As alternatives to MILES SSP library, we also use the MILES stellar library (Falcón-Barroso et al. 2011), the IndoUS stellar library (Valdes et al. 2004), and the C3K library (Conroy & van Dokkum 2012) with MIST isochrones (Dotter 2016;Choi et al. 2016).We concluded that the choice of library does not affect our results (see § 4.2).However, as discussed in DE23, we adopt the MILES SSP library as default because it provides the highest fidelity in reproducing the observed spectra (in agreement with other authors, e.g.van de Sande et al. 2017;Maseda et al. 2021).In our setup, ppxf returns the first (non-trivial) four moments of the LOSVD: mean velocity V , velocity dispersion σ, h3 (a measure of skewness) and h4 (measuring excess kurtosis).Note that, throughout this article, we only use h4 and ignore the other three measurements, including σ.In particular, when we use the second moment σap, its values are derived from other sources, which model the LOSVD as a Gaussian (see § 2.4).
In principle, because the focus of this work are quiescent galaxies, our spectra should be free from emission lines.In practice, however, even quiescent galaxies can display strong emission lines, particularly at higher redshift (Maseda et al. 2021).Moreover, we will consider a subset of higher-redshift star-forming galaxies as substitutes for the progenitors of local quiescent galaxies.For these reasons, we mask the wavelength regions where gas emission lines may arise, regardless of whether any emission was actually detected.This ensures a homogeneous treatment of all galaxies, but we note that simultaneous fitting of the emission lines or σ-clipping-based rejection do not change our conclusions (DE23).
A key feature of ppxf is penalisation against non-Gaussian LOSVDs, to recover reliable solutions in low-S/N data.The amount of penalisation is determined by the keyword bias, which we set to its default value.As explained in DE23, this choice does not affect our measurements of h4, because of the high S/N of our spectra; the average S/N of our sample is even higher than of DE23, so the effect of bias on the h4 measurements must be smaller.
During the data homogenisation process, the seeing matching and slit convolution introduce correlations between the pixels, but we do not track this in the noise spectrum.To compensate for correlated noise, we repeat each fit twice.In the first iteration, we use uniform weights for all (valid) pixels.This step enables us to measure an empirical S/N by estimating the root mean square of the residuals in a moving window (see Looser et al., in prep.).After this fit, we rescale the noise spectrum so that the reduced χ 2 is unity.In the second fit, we use this rescaled noise spectrum and 3-σ clipping to remove any outliers.The impact of some of our choices on the results is discussed in § 4.2.
As for the measurement uncertainties, we use the default values from ppxf, which we checked against Monte-Carloderived uncertainties as explained in DE23.
Example ppxf fits are shown in Fig. 1: in panels a (SAMI 184689), d (MAGPI 1206192190) and g (LEGA-C 92258 M12).These three galaxies are randomly selected from the mass-matched sample (defined in § 3.2).In each of the three panels, the grey line is the galaxy 1-d slit spectrum and the red line is the best-fit ppxf spectrum, with h4 reported in the bottom right corner.The vertical grey lines/regions are spectral pixels/intervals that have been masked.Panels b, e and h show the relative residuals.

Age-dependent bias
In DE23, we have qualitatively shown that h4 information is 'distributed' in both strong absorption lines as well as less prominent features.It is a well known fact of stellar evolution that number and prominence of optical features in an SSP spectrum is a strong function of the SSP age.This raises the question of how the light-weighted age of a galaxy affects the fidelity of our h4 measurements.This question is particularly important for our work, because we aim at comparing h4 across different cosmic epochs, when quiescent galaxies have systematically different stellar population ages.
To understand how stellar population light-weighted age affects our ability to measure h4, we create two sets of mock spectra.These correspond to a quiescent galaxy with formation redshift z = 3, observed at z = 0.73 (the redshift of LEGA-C) and at z = 0.05 (the redshift of SAMI).We model the galaxy as a SSP from the already described MILES library, adopting solar metallicity and either age equal to 4.5 Gyr (at z = 0.73) or 10.5 Gyr (at z = 0.05).For each of these two spectra, we create two models: one with h4 = 0 and one with h4 = 0.06, resulting in four model spectra.For each of these four spectra, we create one thousand random realisations by adding Gaussian noise corresponding to a S/N = 20 Å −1 .We then use ppxf to measure h4, with the same setup we used for the real data.
We define ∆ h4 as the median difference between the measured and the input value of h4, and find that, for h4 =0, ∆ h4 = −0.003± 0.001.Similarly, for h4 =0.06, we find ∆ h4 = −0.002± 0.001.These results apply to both the 4.5-Gyr-old and the 10.5-Gyr-old mock galaxies.These offsets are statistically significant, although only to the 3-or 4-σ level.The standard deviation of the ∆ h4 distributions are 0.024 and 0.019 for h4 =0 (for the 4.5-Gyr-old and the 10.5-Gyr-old mocks respectively) and 0.020 and 0.017 for h4 Effect of different measurement configurations on the reference value of h 4 for SAMI quiescent galaxies.The sand contours mark the distribution of SAMI quiescent galaxies with M ≥ 10 10.5 M (the contours enclose the 30 th , 50 th , and 90 th percentiles of the data); the black dots are the mass-matched sample (M ≥ 10 11 M ).The reference value is measured inside the elliptical aperture of semi-major axis equal to one Re, using the MILES SSP library and the full wavelength range of SAMI.Panel a shows the effect of using a slit instead of the elliptical aperture (including seeing convolution, see § 2.2); panel b shows the effect of restricting the wavelength range to the wavelength range of LEGA-C; and panel c combines both the slit setup and restricted wavelength range, thus matching our default measurements.This figure underscores the importance of our data homogenisation ( § 2.2).
=0.06 (for the 4.5-Gyr-old and the 10.5-Gyr-old mocks respectively).As expected from considerations about the depth of absorption features, at fixed S/N , the scatter is larger for the younger population.The difference however is not dramatic (only ≈ 20 per cent).What is more important, is that in all four cases, the offset is negligible compared to other sources of systematic errors (such as stellar template libraries, see DE23).Moreover, the standard deviation of each of the four ∆ h4 distributions is 2-3 times smaller than the precision threshold for our quality selection ( § 3.1).

Measurement bias
Because we aim to compare measurements between different cosmic epochs and different surveys, we need to understand how different data quality and instrument setup affect the value of h4.To address this question, we leverage IFU spectroscopy from SAMI.For each galaxy, we define a reference h4 measurement from the aperture spectrum inside the elliptical aperture of semi-major axis equal to one Re.These spectra cover the whole wavelength range of SAMI.We then compare this reference value to the default measurement, obtained from 1-d synthetic-slit spectroscopy and designed to match the observing setup of LEGA-C (as described in § 2.2).We split the data homogenisation procedure in two steps: aperture matching and wavelength matching.In Fig. 2a, we compare the reference h4 values to the h4 we measure from the slit setup (including seeing convolution); the golden contours trace the distribution of SAMI quiescent galaxies with M ≥ 10 10.5 M , the dots are quiescent SAMI galaxies from the mass-matched sample (M ≥ 10 11 M ).We find that, for both stellar-mass selections, the slit setup does not bias h4; considering only massive galaxies (black dots in Fig. 2), the median difference between this h4 and the reference value is ∆ h4 = 0.003 ± 0.001, with a root mean square (rms) of 0.010.In panel b, we compare the reference h4 to the value we measure from the same aperture, but using only the bluest wavelengths, to match the rest-frame range of LEGA-C.In this case, the picture is opposite to what we found in panel a: the offset is large but the scatter is small: the median difference is ∆ h4 = −0.017± 0.001 and the rms is 0.011.Finally, in panel c, we combine both slit aperture and wavelength matching, thus obtaining our homogenised, default h4 values.As expected, we find both a systematic offset ( ∆ h4 = −0.015± 0.002) and a large rms (0.019).

Stellar masses and ancillary data
The ancillary data we use in this work are the same as in DE23, to which we refer for a more detailed discussion.Here we provide a summary and the most important remarks.
For SAMI, stellar masses M are derived from the i-band total magnitude and a mass-to-light ratio based on g−i colour (Taylor et al. 2011).Star-formation rates (SFR) are derived from the attenuation-corrected Hα luminosity (Croom et al. 2021).For MAGPI, we use M and SFR full SED fits from prospect (Robotham et al. 2020).We then define SAMI and MAGPI quiescent galaxies as lying more than 1 dex below the star-forming sequence of Whitaker et al. (2012).For LEGA-C, we use SED fits to observed-frame BV rizY J photometry, based on the prospector software (Leja et al. 2019;Johnson et al. 2021), as explained in van der Wel et al. (2021).We compared these measurements using a common subset with M derived from magphys, and found systematic differences of 0.03 dex with a scatter of 0.07 dex, which are acceptable for the goals of this article (DE23).
In this article we consider primarily quiescent galaxies (but the progenitor-matched sample from LEGA-C also includes star-forming galaxies, see § 3.3).Quiescent galaxies are defined as lying 1.6 dex below the star-forming sequence (for SAMI and MAGPI galaxies with z < 0.41), as having Hβ equivalent width > −1 Å (for MAGPI galaxies with z > 0.41, where the MUSE spectral range does not cover Hα), or as ly-ing in the quiescent corner of the UVJ diagram (for LEGA-C).This classification follows DE23.
Galaxy sizes and shapes are derived from the best-fit Sérsic models to observed frame r−band photometry (for SAMI and MAGPI) and to HST ACS F814W photometry (for LEGA-C).In particular, galaxy sizes are the half-light semi-major axis of the model.We note that for SAMI, replacing r−band photometry with g−band photometry does not change our results (DE23).
In addition to our main selection based on M , we use two alternative methods based on aperture velocity dispersion σap, virial mass Mvir and total masses from dynamical models MJAM.For SAMI and LEGA-C, σap is taken from the literature.For SAMI, it is the value measured inside one effective radius (Croom et al. 2021).For LEGA-C, it is the value measured from the 1-d slit spectra (Bezanson et al. 2018;van der Wel et al. 2021).For MAGPI, we use our own measurements obtained with ppxf, using the IndoUS stellar library and assuming a Gaussian LOSVD, for consistency with the σap measurements of SAMI and LEGA-C.For Mvir, we use the expression of van der Wel et al. ( 2022), which uses semimajor axis half-light sizes, incorporates a correction for nonhomology (based on the Sérsic index n Cappellari et al. 2006) and adds an inclination correction.The dynamical models are based on Jeans Anisotropic Models (JAM, Cappellari 2008), adapted to slit spectroscopy as described in van Houdt et al. (2021).Note that MJAM is only available for roughly one third of the sample, because galaxies where the photometric major axis is not aligned to the slit were not modelled.

SAMPLE SELECTION
In this section, we aim to present the motivation, selection criteria and properties of the two samples we use in this work.The first sample, consisting of massive, quiescent galaxies from SAMI, MAGPI and LEGA-C, is the 'mass-matched sample', aiming at characterising the evolution of the quiescent galaxy population over cosmic time, including both demographic changes and physical evolution.The second sample is the 'progenitor-matched sample', selected from SAMI and LEGA-C to provide a plausible connection between local quiescent galaxies and their progenitors.

Quality selection
Because in this work we focus on massive, quiescent galaxies, we consider only galaxies with M > 10 11 M (which is also the completeness limit of LEGA-C van der Wel et al. 2021).Above this threshold, we have 211, 22 and 1027 galaxies from SAMI, MAGPI and LEGA-C (for LEGA-C, we consider only primary galaxies with flag use=1 Straatman et al. 2018).
Following DE23, we impose a quality cut at u(h4) < 0.05, to ensure a reliable measurement of h4.After this cut, we have 200, 22 and 692 galaxies for the three surveys; from these, we consider only quiescent galaxies, arriving to a final sample of 135, 22 and 479 targets for SAMI, MAGPI and LEGA-C, respectively.The completeness of each survey relative to the samples before any quality cut is 99, 100 and 90 per cent.
For the progenitor-matched sample, we consider LEGA-C galaxies with M > 10 10.5 M , which is below the mass cut of the main sample at M > 10 11 M .This cut is chosen to account for the increase in stellar mass between the epochs of LEGA-C and SAMI.The progenitor pool consists of both star-forming and quiescent galaxies, with lower completeness compared to the main sample.In particular, after imposing u(h4) < 0.05, the completeness for LEGA-C quiescent galaxies is 81 per cent, and for star-forming galaxies is only 28 per cent (see DE23, their fig.4).Completeness decreases with decreasing M , which means our results for the progenitormatched sample are likely a conservative estimate (see § 4.1).

The mass-matched sample from SAMI, MAGPI and LEGA-C
The first task we aim to accomplish is to compare the h4 distribution of galaxies at fixed M .This defines the 'massmatched sample' (Fig. 3).Physically, a selection at fixed M is a logical contradiction, because the stellar mass of central galaxies increased with cosmic time.However, this sample provides a conservative, 'minimum-baseline' measurement for any evolution of h4, free from the assumptions needed to connect local galaxies with progenitor-like galaxies at higher redshift (which we do in § 3.3).In particular, the mass-matched sample puts together local galaxies with over-massive (false) progenitors at higher redshift; this fact, together with the h4-M correlation (DE23), means that the mass-matched sample is biased to find h4 decreasing with cosmic time.
The three samples are matched in M (as closely as possible).We match the LEGA-C sample to the mass function of SAMI, because, among these two surveys, it is LEGA-C that has the largest volume.For MAGPI, where the effective survey volume is relatively small, we resort to taking all galaxies in the same mass range as SAMI, without matching the mass function.
For SAMI, the parent sample consists of 141 quiescent galaxies with M ≥ 10 11 M (filled grey histogram in panel a).From this sample, we remove six with u(h4) ≥ 0.05, arriving to a valid sample of 135 galaxies.This is the valid sample, which for SAMI coincides with the mass-matched sample (so in panel a the solid red empty and solid red filled histograms coincide).
For MAGPI, we have 22 massive, quiescent galaxies, all of which meet the quality selection.Given this small sample size, we do not resample this set to match the mass distribution of SAMI (so, for MAGPI too, the valid and mass-matched samples coincide, panel b).
For LEGA-C, the parent sample contains 530 quiescent galaxies with M ≥ 10 11 M , of which 479 meet the quality selection criterion (solid red empty histogram in panel c).We then weight these 479 galaxies to match the mass function of SAMI; the weights are simply the ratio between the SAMI and LEGA-C valid histograms.The resulting (weighted) mass distribution of the resulting sample is traced by the solid red filled histogram in panel c.Two galaxies outside the mass range of SAMI get weight zero, so they are effectively removed from the mass-matched LEGA-C sample.For the others, we rescale the weights so that they add to the sample size of LEGA-C (this preserves the relative weight of different samples when we combine them).After this rescaling, the minimum and maximum weights are 0.25 and 2.36.
Mass-matched Sample : definition The dashed regions show the cut at M > 10 11 M , the filled grey histograms are the parent samples, consisting of all quiescent (Q) galaxies above the mass cut.The solid red empty histograms are the valid samples, consisting of all quiescent galaxies above the mass and quality selection cuts.The filled red histogram is the mass-matched sample; for SAMI and MAGPI, this coincides with the valid sample; for LEGA-C, it consists of a subset of the valid sample, chosen to reproduce the mass distribution of the SAMI valid sample.

The progenitor-matched sample from SAMI and LEGA-C
While the mass-matched sample addresses the issue of massrelated bias, we know that galaxies evolve in both mass and size, even after becoming quiescent (e.g.Taylor et al. 2010;van der Wel et al. 2014).To address the effect of this evolution, we use data from the publicly available IllustrisTNG simulations to inform the mass distribution of the progenitors of the SAMI sample (Marinacci et al. 2018, Naiman et al. 2018, Nelson et al. 2018, Springel et al. 2018and Pillepich et al. 2018).We then use the progenitors' mass distribution to select progenitor-like galaxies from LEGA-C.We do not attempt any match for MAGPI, so this survey is not part of the 'progenitor-matched sample'.Following Rodriguez-Gomez et al. (2015), we define a galaxy as a subset of the simulation volume identified by and their possible z = 0.7 progenitors drawn from LEGA-C, including both star-forming galaxies (dashed blue histogram) and quiescent galaxies (red filled histogram).The mass distribution of the progenitors was derived from IllustrisTNG, but we note that, in the simulation, the fraction of quiescent progenitors is only 30 per cent.
the subfind algorithm (Springel et al. 2001;Dolag et al. 2009) and use the combined mass of all stellar particles as the galaxy stellar mass.While this is not immediately comparable to observed stellar masses, even truncating at a galaxydependent aperture may introduce unwanted bias (de Graaff et al. 2022).The effect of this aperture bias is secondary compared to the inclusion of star-forming galaxies in the progenitor sample.We consider data from the run TNG100-1, then take all quiescent galaxies from snapshot 94 (z = 0.06, matching SAMI) and trace their 'main-branch' progenitors to snapshot 58 (z = 0.73, matching LEGA-C.Main-branch progenitors are defined as the progenitors with the most massive history behind them; De Lucia & Blaizot 2007;Rodriguez-Gomez et al. 2015).We then randomly select quiescent galaxies from snapshot 94 matching the mass distribution of the SAMI sample (i.e., following the solid red empty histogram in Fig. 4).This matching procedure is repeated 100 times to explore all realisations of the random sampling.We then obtain the mass distribution of their snapshot-58 progenitors.Of these, most (>95 per cent) have M > 10 10.5 M .Among all random realisations, on average only 33 per cent of progenitors are already quiescent at z = 0.73, in agreement with the results of Moster et al. (2020, cf. their fig. 16).We then sample the LEGA-C galaxies to match the mass distribution of the Illus-trisTNG 'progenitors' of the SAMI quiescent galaxies.This is done separately for star-forming and quiescent galaxies and considering only LEGA-C galaxies above the adopted quality cuts ( § 3.1).The mass distribution of the LEGA-C 'progenitors' of SAMI quiescent galaxies is shown in Fig. 4, with the dashed blue and red filled histogram representing the starforming and quiescent subsets.It is clear that our sample of progenitors has the wrong ratio between star-forming and quiescent galaxies: we select 170 and 410 galaxies from each class, respectively, whereas only 33 per cent of the simulated progenitors was already quiescent at z = 0.73.This discrepancy is dictated by the availability of LEGA-C galaxies.To correct for it, we apply weights in bins of M whenever calculating the median h4 of the progenitor sample, or the Pearson correlation coefficient for the evolution of h4 ( § 4).
A crucial caveat of the progenitor-matched sample is that selecting progenitors based solely on stellar mass does not completely remove progenitor bias.To prove this, we randomly select z = 0.73 IllustrisTNG galaxies with the same criterion used to select the LEGA-C progenitor-matched sample; i.e., we select galaxies from snapshot 54 following the same distribution of stellar mass as the progenitors of the z = 0 IllustrisTNG galaxies (which were in turn selected to match the SAMI stellar mass distribution).Looking at the descendants of these simulated galaxies, we find the following.Between snapshots 58 and 94, the quiescent progenitors remain mostly quiescent (16 per cent rejuvenate).Most (69 per cent) of the star-forming progenitors become quiescent.The fraction of progenitors which underwent at least one major merger between snapshots 58 and 94 is 24 per cent (mass ratio greater or equal to 1/3; using mass ratios of 1/2 and 2/3 the fractions are 17 and 12 per cent).

COSMIC EVOLUTION OF h4
In Fig. 5  For the progenitor-matched sample, the difference is even larger (cf.panels c-f), because the difference already reported at fixed M is amplified by the combination between the h4-M correlation, and the fact that progenitor-like galaxies have necessarily lower M than their descendants.Note that h4 differs even between star-forming and quiescent progenitors, in agreement with DE23.
We now consider the redshift evolution of h4.Fig. 6a shows h4 as a function of redshift for the mass-matched sample ( § 3.2).SAMI, MAGPI and LEGA-C galaxies are represented respectively by dark red triangles, red diamonds and pink pentagons.For each of the three surveys, the median uncertainties on h4 are represented by the errorbars with the same symbol and colour as the relevant survey.The three grey errorbars mark the median z and h4 of each of the three subsamples; the smallest errorbars are the uncertainties about the median, the largest errorbars are the 16 th -84 th percentiles of the h4 distribution.Considering the population of massive, quiescent galaxies, the median h4 rises from 0.019 ± 0.002 at z = 0.82 (LEGA-C), to 0.045 ± 0.008 at z = 0.31 (MAGPI) to 0.059 ± 0.004 at z = 0.06 (SAMI; the uncertainties have been estimated by bootstrapping each sample one thousand times).It is clear that the difference between SAMI and LEGA-C is statistically significant, being al- most nine standard deviations σ away from zero.For MAGPI, the difference from SAMI is not significant (one σ), but the difference from LEGA-C is marginally significant (three σ).
The median h4 for MAGPI is intermediate between SAMI and LEGA-C, which adds more confidence to the hypothesis that h4 increases with cosmic time.We do not model the intrinsic scatter of the distributions, but a simple estimate (by subtracting in quadrature the median uncertainty from the observed rms) gives an intrinsic scatter of 0.03 for SAMI and 0.05 for LEGA-C.
To establish if there is any evolution, we use the weighted Pearson correlation coefficient ρ between z and h4.For SAMI and MAGPI, all weights are set to one.For LEGA-C, they reflect the relative importance of galaxies in different bins ( § 3).Considering all three surveys, we find ρ = −0.29 (P = 1.2 × 10 −13 , seven-σ significance).Removing MAGPI, we get ρ = −0.30and a slightly higher significance (P = 4.2 × 10 −14 ).Further removing SAMI we find ρ = −0.07 and P = 0.12, which is not significant.In fact, taken separately, none of the three subsets gives a statistically significant correlation.Even though we focus on high-mass galaxies, there is nothing special about the M cut at 10 11 M : if we set the cut at 10 10.5 M (i.e.near the completeness limit of LEGA-C, van der Wel et al. 2021), we find ρ = −0.16 and P = 1.2 × 10 −6 , which is still statistically significant (4.7 σ).
So far, these results do not account for mass growth between the look-back times of LEGA-C and SAMI.As we will see, doing so entails including galaxies with M < 10 11 M from the LEGA-C sample, but these galaxies have even lower average h4 (DE23), which makes our result stronger.
The mass-matched analysis can only show the evolution of the quiescent population as a whole, without taking into account the effect of progenitor bias.But when we consider the progenitor-matched sample (Fig. 6b), the evidence for h4 increasing with redshift is even stronger.Here the dark red triangles are the same SAMI galaxies as in panel a, but blue/pink pentagons are star-forming/quiescent LEGA-C galaxies, chosen to match the mass distribution of the progenitors of SAMI galaxies (as inferred from numerical simulations, see § 3.3).The meaning of the errorbars is the same as in panel a.The median h4 of quiescent progenitors is 0.011 ± 0.003, and for star-forming progenitors is −0.010 ± 0.006; both values are more than nine σ away from the SAMI median value.These values are different (3-σ significance), in agreement with the difference in h4 between star-forming and quiescent galaxies reported in DE23.To combine the quiescent and star-forming progenitors, we upweight the latter, to account for the fact that two thirds of the simulated progenitors are star forming, but only onethird of LEGA-C progenitors are star forming.The weighted median is −0.006 ± 0.004 which is, as expected, lower than the value inferred from the mass-matched sample.Considering all galaxies in the progenitor-matched sample, we find ρ = −0.36,P = 3.5 × 10 −23 .Considering only SAMI and quiescent progenitors, we find ρ = −0.38,P = 2.6 × 10 −20 , while considering only SAMI and star-forming progenitors we have ρ = −0.59,P = 7.1 × 10 −30 .Overall, the progenitor-matched analysis suggests that when accounting for progenitor bias, the evolution of h4 is higher and more statistically significant.

Sample selection bias
Could systematic errors in the measurement of M explain the trend between h4 and redshift?Given the correlation be- Cosmic evolution of h 4 Figure 6.Cosmic evolution of the integrated h 4 moment for massive, quiescent galaxies (M > 10 11 M ).Panel a shows SAMI (dark red triangles) compared to MAGPI (red diamonds) and to the mass-matched sample from LEGA-C (pink pentagons); median measurement uncertainties are shown in the legend.The dashed horizontal line corresponds to a Gaussian line-of-sight velocity distribution.The grey errorbars encompass the 16 th -84 th percentiles of each sample and are located at the median redshift of the sample (with a small offset for readability).The black errorbars represent the uncertainty about the median value of each sample.The weighted Pearson correlation coefficient between all points is ρ = −0.29,with a significance of over 7 σ.In panel b, SAMI (dark red triangles) is compared to progenitorlike galaxies selected from LEGA-C, including both quiescent (pink pentagons) and star-forming galaxies (blue pentagons).Accounting for progenitor bias, the evolution in h 4 is even stronger (to calculate ρ, low-mass star-forming galaxies in LEGA-C are upweighted to match the mass distribution of the SAMI progenitors, see § 3.3).
tween h4 and M (DE23), and given the heterogeneous photometric data available to measure M , this is a reasonable concern.We adopt a conservative approach, by repeating the mass-matched analysis with a cut M > 10 10.5 M for SAMI only.This cut represents the extreme hypothesis of M being systematically underestimated by a factor of 3 at z = 0 compared to the redshift z = 0.8 of LEGA-C.As expected from the h4-M correlation, the average h4 for SAMI decreases, from 0.057 ± 0.002 to 0.037 ± 0.002, yet there is still a statistically significant trend between h4 and z (12 σ), thanks in part to the increased sample size (for SAMI, from 135 to 482).We do not consider the inverse possibility where M at z=0 is systematically overestimated, because in light of the positive h4-M correlation, increasing the cut in M at z = 0 would make the relation stronger and more significant.
To avoid completely the bias inherent to measuring stellar masses across a long cosmic interval, we can use alternatives: σap and dynamical masses.In Fig. 7a we show quiescent galaxies selected to have σap > 205 km s −1 , roughly corresponding to the stellar-mass selection M > 10 11 M3 .Considering all three surveys, the correlation between h4 and z has ρ = −0.35 and P = 1.3 × 10 −20 (9-σ significance).Similarly, in panel b, we re-select the sample based on Mvir > 10 11.5 M ; in this case, the correlation coefficient is ρ = −0.32,and the probability that h4 and z are uncorrelated is P = 8.3 × 10 −11 (6-σ significance).Similar results can be obtained by swapping Mvir with MJAM, except that the sample size is smaller (see § 2.4).

Measurement bias
Could the reported redshift evolution be due to measurement bias?As we have seen in § 2.3.2,our h4 measurements depend on the data homogenisation process.However, the sign of the offset in Fig. 2c is such that, had we neglected data homogenisation, we would have found an even larger h4 for SAMI galaxies, so an even stronger redshift evolution of h4.
Overall, we stress that even though the choice of template library affects the magnitude of the redshift evolution, we find a statistically significant correlation in every case, with the lowest statistical significance for the MILES stellar library (only three σ).The different spectral libraries give systematically different values of h4, which underscores the challenge of measuring h4 in absolute terms, and may complicate comparisons with numerical simulations.
Another possible source of bias is the use of fixed-size apertures rather than apertures matching the size of each galaxy.For LEGA-C, given the slit width of 1 arcsec and a galaxy half-light semi-major axis 0.3 < Re < 1.7 arcsec, the LEGA-C half slits span a median fraction of 0.8 Re, with the 5 th -95 th percentiles of the size distribution spanning between 1.7 to 0.3 Re.The covering fraction is however larger, because the slits span the full extent of the galaxies along the spatial direction.Moreover, for SAMI, we compare the measurements inside the reconstructed LEGA-C slit to the measurements inside one Re (Fig. 2a), finding only a small bias, so we conclude that aperture effects are not determining the redshift evolution of h4.
Finally, we note that LEGA-C observations consist of a large number of stacked exposures (up to 80, van der Wel et al. 2016), whereas SAMI and MAGPI rely on up to seven and twelve exposures, respectively.In principle, stacking could degrade the LOSVD, due to sub-pixel shifts in wavelength and changes to the atmospheric seeing.However, we find some evidence of redshift evolution even within LEGA-C alone, which is unlikely to be due to stacking.From a physical perspective, the reported increase in h4 with cosmic time is qualitatively consistent with the previously reported decrease in the rotation-to-dispersion ratio (V /σ, Newman et al. 2015;Toft et al. 2017;Newman et al. 2018;Bezanson et al. 2018).The latter is very unlikely to result from stacking -if anything, any degradation of the LOSVD due to stacking is more likely to decrease V (and increase σ, Cappellari et al. 2009), which would cause a spurious increase in V /σ with cosmic time, the opposite of what has been reported using LEGA-C data (Bezanson et al. 2018).

Redshift evolution: the role of environment
We find that the average h4 of massive (M > 10 11 M ) quiescent galaxies increases with cosmic time ( § 4).Looking at the median h4 of each survey in Fig. 6a, it appears that the evolution happened almost linearly in redshift space between z = 0.8 and z = 0.05.This is somewhat in tension with simulations, which report that most of the spin down of galaxies happens after z = 0.5 (Lagos et al. 2018).Moreover, by studying the shape distribution of galaxies, Zhang et al. (2022) also find that the fraction of massive, disc-like galaxies drops faster between z = 0.15 and z = 0.45 than between z = 0.45 and z = 0.75 (cf.their fig.6).Given the small sample size of the MAGPI dataset used in this work, it is unclear whether our findings are significant or not.Taking this result at face value, a possible explanation is provided by environmental effects.Environment is known to correlate with the kinematic structure of galaxies (e.g., Dressler et al. 1987;Cappellari et al. 2011;D'Eugenio et al. 2015) and MAGPI samples uniformly in halo mass (Foster et al. 2021), thereby introducing a bias toward high-density environments.As an example, slow rotators, which are intrinsically round and dispersionsupported galaxies, are more common in high-density environments than they are in the field.This is true both in the local Universe (Cappellari et al. 2013;van de Sande et al. 2021) as well as at the look-back time of LEGA-C (Cole et al. 2020).It is therefore possible that part of the evolution (and lack thereof) of h4 is due to environment effects which we do not take into account.However, even though we do find a significant correlation between h4 and local environment (for LEGA-C, we have P = 5 × 10 −4 , using the local overdensity δ from Sobral et al. 2022, measured as described in Darvish et al. 2014, Darvish et al. 2016and Darvish et al. 2017), this correlation is likely due to the mass-environment correlation.We can separate the environment and mass dependence of h4 using partial correlation coefficients (PCCs; see e.g., Bait et al. 2017;Bluck et al. 2019;Baker et al. 2022).The PCC ρ(x, y, |z) measures the correlation coefficient of the two random variables x and y while controlling for the third variable z.We find ρ(h4, M , |δ) = 0.25, with P = 2.3 × 10 −7 , while ρ(h4, δ, |M ) = 0.08, with P = 0.1.Another complication is due to the fact that MAGPI galaxies tend to be centrals, which may be even more likely to have different assembly histories than satellites.A larger sample may clarify whether environment plays any role.

Redshift evolution: beating progenitor bias
We have seen that the population of massive quiescent galaxies has on average higher h4 at z = 0.06 than it had at z = 0.8.Physically, this means that local quiescent galaxies have less rotation support and higher radial anisotropy than quiescent galaxies 7 Gyr ago (DE23).To disentangle radial anisotropy from rotation support, we follow the approach presented in DE23.If we consider only round galaxies (observed axis ratio q ≥ 0.8), we still find evidence for h4 evolution (though only 3.5 σ).Alternatively, we consider only galaxies with low rotation-to-dispersion ratio.Given the inhomogeneous nature of the (V /σ)e measurements for our data (DE23), we consider either galaxies with (V /σ)e < 0.5, or galaxies with (V /σ)e less than the 10 th percentile of each sample; we obtain statistically significant results in both cases (5 σ).This underscores that the observed differences in integrated h4 reflect not only differences in (V /σ)e (reported in Bezanson et al. 2018), but also differences in spatially resolved h4, which is a proxy for radial anisotropy.Taking our h4 measurements at face value, and assuming an isothermal potential (Cappellari et al. 2015;Poci et al. 2017;Derkenne et al. 2021), the increase from h4 ≈ 0.02 at z = 0.8 to h4 ≈ 0.06 at z = 0.06 means that the radial orbital anisotropy increases from β ≈ 0.1 to β ≈ 0.4 (see Gerhard 1993, their fig. 8). Theoretical expectations, (van der Marel & Franx 1993;Gerhard 1993), as well as empirical correlations with (V /σ)e and q (DE23), suggest that local quiescent galaxies must have rounder intrinsic shapes and lower rotation-to-dispersion ratios compared to higher-redshift quiescent galaxies.For this reason, the larger h4 in low-redshift galaxies is qualitatively consistent with both the observed differences in shape (van der Wel et al. 2011) and in rotation support (Bezanson et al. 2018).
The fact that, at any given redshift, star-forming galaxies have lower h4, larger (V /σ)e and flatter shapes than coeval quiescent galaxies means that the changes in the quiescent population cannot be explained by changing demographics.This is because newly quiescent galaxies, which come necessarily from the star-forming population, would need to dramatically alter their orbital structure just before becoming quiescent.Although in principle possible, this scenario seems unlikely, because -in the cosmic epochs we are studyingmost newly quiescent galaxies are expected to become quiescent without substantial structural change (Wu et al. 2018).When we attempt to account for progenitor bias (Fig. 6b), we find an even stronger redshift evolution, so the changes in the quiescent population over the last 7 Gyr occur despite the constant influx of newly quiescent galaxies, not because of it.This is very different from the evolution of e.g.galaxy size, where the light-weighted size of quiescent galaxies increases because of both demographics changes and physical processes in individual galaxies (e.g.van der Wel et al. 2014).Thus h4, and stellar kinematics more in general, enable us to circumvent progenitor bias.Even though we cannot yet say precisely by how much h4 increases for the average galaxy, we can firmly establish that individual quiescent galaxies evolve over cosmic time, even after becoming quiescent.

Physical interpretation and future outlook
Qualitatively, our results are consistent with the two-phase formation scenario: around cosmic noon, when gas accretion rates are large, massive galaxies form through dissipation, giving rise to intrinsically flat, rotation-supported systems (van der Wel et al. 2011;Newman et al. 2018;Bezanson et al. 2018), characterised by low h4, typical of discy, starforming galaxies.Over time, the accretion rate of cold gas slows down and is eventually surpassed by the accretion rate of gas-poor satellites; starved of fuel, star formation starts to decline until the galaxy becomes quiescent.The takeover of dry mergers means that newly added stars are distributed on radially-biased orbits, reflecting the infall orbit of their parent satellite.Collisionless evolution prevents the settling of the accreted mass on any pre-existing disc, while quiescence prevents the formation of a new disc.Moreover, the large number of low-mass satellites is critical to explain the round shape, large size and low rotation support of local massive quiescent galaxies (Bois et al. 2011;Naab et al. 2014, but note that these simulations did not include feedback from super-massive black holes).The two-phase formation scenario was invoked to explain the observed size growth of quiescent galaxies.For our mass-matched sample, we find indeed a median size increase from 4.4 to 7.6 kpc between LEGA-C and SAMI (Fig. 8a; the uncertainties on these median values are 0.02 dex).This, however, does not take into account progenitor bias.For the progenitor-matched sample (Fig. 8b), the median size of the quiescent progenitors is 3.8 kpc, the median size of the star-forming progenitors is 5.7 kpc.The probability that the samples match in their Re distributions are all PKS < 10 −5 .We cannot exclude secular processes as being partly responsible for the observed evolution in h4, but it is very unlikely that they are the only process.First, because these processes can only re-distribute angular momentum; the transition from high-angular-momentum, low-h4 galaxies to low-angular-momentum , high-h4 galaxies requires external processes.Moreover, internal processes alone cannot explain the size growth of quiescent galaxies.
We already know that -for stellar kinematics -the degree of rotation support decreases with cosmic time (Bezanson et al. 2018).Our results are consistent with their findings, but add that, in parallel with the decrease in rotation support, there is an increase in radial anisotropy, which is expected from minor dry mergers.Note that single major dry merger events may not necessarily increase the fraction of radial orbits (Bois et al. 2011).In contrast, numerous dry mergers can even out the asymmetry of individual satellite orbits, by averaging over a sufficient number of orbits.The meaning of the symbols is the same as Fig. 6.
Incidentally, our results also reconcile the observation that, for stars, the rotation-to-dispersion ratio decreases with cosmic time (Bezanson et al. 2018), but for star-forming gas it increases with cosmic time (e.g., Leroy et al. 2009;Law et al. 2009;Förster Schreiber et al. 2009;Wisnioski et al. 2015).These opposite trends raise the question of how is it possible that the dispersion of stars increases with time, but the dispersion of newly formed stars decreases with cosmic time.The answer is that, even though in-situ stars form on progressively thinner discs4 , the evolution of the quiescent population is driven by accretion of ex-situ stars on radially anisotropic orbits (Lagos et al. 2017).In the mass regime we are probing, these dry mergers overcome the demographics change due to the addition of newly quiescent galaxies that had thinner discs than the previous generations of longquiescent galaxies.
A clear prediction of the proposed scenario is that h4 must be linked to the fraction of ex-situ stars, which can be readily identified in numerical simulations, or, alternatively, could be estimated using a joint chemo-dynamical analysis (see e.g.Poci et al. 2019).Given that our measurements do not require spatially resolved spectroscopy, there is a lot we can learn from large single-fibre surveys such as the Sloan Digital Sky Survey (York et al. 2000).The current generation of large single-fibre surveys of the local Universe will give us access to even larger samples (e.g. the 4MOST Hemisphere Survey, Taylor et al., in prep.;the WEAVE-StePS, Costantin et al. 2019 and 4MOST-StePS surveys;and the DESI Bright Galaxy Survey, Ruiz-Macias et al. 2021), while future highredshift surveys will enable us to study h4 for galaxies at cosmic noon (MOONRISE survey, Maiolino et al. 2020).

Caveats and limitations
There are however some difficulties with our interpretation.First, the inferred evolution comes mostly from a single dataset: the LEGA-C survey; using only SAMI and MAGPI, we would not find evidence of evolution.Stacking data from higher redshift surveys (e.g., the VIRIAL survey, Mendel et al. 2015), or larger datasets from future surveys (e.g., MOONRISE, Maiolino et al. 2020), could address this shortcoming.
Second, the bulk of the evolution due to low-mass dry mergers is expected to occur before z = 1 (e.g.Remus et al. 2013Remus et al. , 2017;;Springel et al. 2018;Karademir et al. 2019).Evolution between z = 0.8 and z = 0.06 seems unlikely in this context (but see Oser et al. 2012;Peirani et al. 2019, for a different view).The fact that we find a strong correlation between h4 and stellar surface mass density, even when no correlation with M is found (Appendix A), suggests a link with the shape of the mass profile.Mass redistribution after feedback due to super-massive black holes, or after major dry mergers, could be responsible for reorganising the orbital structure of massive galaxies.
Finally, we note that, after a preliminary analysis from the Magneticum simulations (Hirschmann et al. 2014;Teklu et al. 2015), we find evidence for a slight increase of h4 over the redshift range considered here (Remus et al., in prep.).A possible explanation could be that h4 inferred from simulations and observations are not immediately comparable, partly because of our unique observing setup, partly because of the difficulty of measuring h4 in an absolute sense (DE23).

CONCLUSIONS
In this work, we studied the cosmic evolution of stellar kinematics in high-mass quiescent galaxies.We used the parameter h4, the coefficient of the 4 th -order Hermite polynomial in the Gauss-Hermite expansion of the line-of-sight velocity dis-tribution.These empirical measurements are dominated by systematic errors, which we minimise by homogenising the data.The combination of the SAMI, MAGPI and LEGA-C surveys enables us to leverage a long baseline in cosmic time, covering more than half the history of the Universe.
(ii) The reported redshift evolution is robust against progenitor bias: using a 'progenitor-matched' sample, which includes lower-mass quiescent and star-forming galaxies in the high-redshift bin, the inferred evolution of h4 becomes even stronger.
(iii) The reported evolution suggests an increase in the light-weighted radial anisotropy, which is consistent with the outcome of accretion of gas-poor satellites; however, other interpretations are also possible ( § 5.3.1).Future observations will be needed to independently check the reported evolution, while forward modelling from simulated galaxies may help compare the reported evolution with theoretical predictions.
λ [Å] Figure2.Effect of different measurement configurations on the reference value of h 4 for SAMI quiescent galaxies.The sand contours mark the distribution of SAMI quiescent galaxies with M ≥ 10 10.5 M (the contours enclose the 30 th , 50 th , and 90 th percentiles of the data); the black dots are the mass-matched sample (M ≥ 10 11 M ).The reference value is measured inside the elliptical aperture of semi-major axis equal to one Re, using the MILES SSP library and the full wavelength range of SAMI.Panel a shows the effect of using a slit instead of the elliptical aperture (including seeing convolution, see § 2.2); panel b shows the effect of restricting the wavelength range to the wavelength range of LEGA-C; and panel c combines both the slit setup and restricted wavelength range, thus matching our default measurements.This figure underscores the importance of our data homogenisation ( § 2.2).

Figure 3 .
Figure 3. Stellar mass distribution of the mass-matched sample.The three rows show the SAMI, MAGPI and LEGA-C sample.The dashed regions show the cut at M > 10 11 M , the filled grey histograms are the parent samples, consisting of all quiescent (Q) galaxies above the mass cut.The solid red empty histograms are the valid samples, consisting of all quiescent galaxies above the mass and quality selection cuts.The filled red histogram is the mass-matched sample; for SAMI and MAGPI, this coincides with the valid sample; for LEGA-C, it consists of a subset of the valid sample, chosen to reproduce the mass distribution of the SAMI valid sample.

Figure 4 .
Figure4.The 'progenitor-matched' sample consists of local massive quiescent galaxies (from SAMI, solid red empty histogram) and their possible z = 0.7 progenitors drawn from LEGA-C, including both star-forming galaxies (dashed blue histogram) and quiescent galaxies (red filled histogram).The mass distribution of the progenitors was derived from IllustrisTNG, but we note that, in the simulation, the fraction of quiescent progenitors is only 30 per cent.
we show the relation of h4 with M for the massmatched sample (panels a-c) and the progenitor-matched sample (panels d-f).Panel a shows the mass distribution of the three samples; by construction, the SAMI and LEGA-C samples have the same distribution.In panel b we show the relation between M and h4: there is little to no evidence for a correlation, at variance with what reported by DE23.This disagreement is due to the different M selection: we consider only M > 10 11 M , so we we have a shorter baseline in M compared to DE23.Despite matching in M , the SAMI and LEGA-C samples have different h4 distributions (panel c).

Figure 5 .
Figure 5. Relation between h 4 and M for the mass-matched sample (panels a-c) and the progenitor-matched sample (panels d-f).In panels b and e, the contour lines represent the 30 th , 50 th and 90 th percentiles of the data (MAGPI galaxies are represented individually as diamonds).Despite having the same M distribution (panel a), the SAMI and LEGA-C samples have different h 4 distributions (panel c).The difference in h 4 is even larger if we compare the SAMI sample to the progenitor-matched samples, where the different M distribution (panel d) amplify the difference in h 4 .

Figure 7 .
Figure7.The finding that h 4 changes with cosmic time is independent from our M selection: the trend is observed even if we select quiescent galaxies based on σap (panel a) or M vir (panel b).The meaning of the symbols is the same as Fig.6a.Note here we did not match the sample in their σap distribution (for panel a) or M vir distribution (for panel b).

Figure 8 .
Figure8.We find evidence for size evolution for both the mass-matched sample (panel a) and the progenitor-matched sample (panel b).The meaning of the symbols is the same as Fig.6.