Journal of Statistical Planning and Inference, 1997
It has been common practice to derive higher-order expansions with remainder terms $R_n$ expressed in O(·) or o(·) form, and then to present simulations at small sample sizes for comparison with the asymptotic results. Such comparisons are intended to support the idea that the unknown constants hidden behind the O(·) or o(·) remainders are reasonably small. That practice, however, is marred by...
Although the assumption of elliptical symmetry is quite common in multivariate analysis and widespread in a number of applications, the problem of testing the null hypothesis of ellipticity has so far not been addressed in a fully satisfactory way. Most papers in the literature deal with the null hypothesis of elliptical symmetry with specified location and actually address location rather than non-elliptical alternatives. In this paper, we propose new classes of testing procedures, both for specified and unspecified location. The backbone of our construction is Le Cam's asymptotic theory of statistical experiments, and optimality is to be understood locally and asymptotically within the family of generalized skew-elliptical distributions. The tests we propose enjoy all the desirable properties of a "good" test of elliptical symmetry: they have simple asymptotic distributions under the entire null hypothesis of elliptical symmetry with unspecified radial density and shape parameter; they are affine-invariant, computationally fast, intuitively understandable, and not too demanding in terms of moments. While achieving optimality against generalized skew-elliptical alternatives, they remain quite powerful under a much broader class of non-elliptical distributions and significantly outperform the available competitors.
Based on the novel concept of multivariate center-outward quantiles introduced recently in Chernozhukov et al. (2017) and Hallin et al. (2021), we consider the problem of nonparametric multiple-output quantile regression. Our approach defines nested conditional center-outward quantile regression contours and regions with given conditional probability content irrespective of the underlying distribution; their graphs constitute nested center-outward quantile regression tubes. Empirical counterparts of these concepts are constructed, yielding interpretable empirical regions and contours which are shown to consistently reconstruct their population versions in the Pompeiu-Hausdorff topology. Our method is entirely nonparametric and performs well in simulations including heteroskedasticity and nonlinear trends; its power as a data-analytic tool is illustrated on some real datasets.
We develop a class of tests for semiparametric vector autoregressive (VAR) models with unspecified innovation densities based on the recent measure-transportation-based concepts of multivariate center-outward ranks and signs. We show that these concepts, combined with Le Cam's asymptotic theory of statistical experiments, yield novel testing procedures, which (a) are valid under a broad class of innovation densities (possibly non-elliptical, skewed, and/or with infinite moments), (b) are optimal (locally asymptotically maximin or most stringent) at selected ones, and (c) are robust against additive outliers. In order to do so, we establish, for a general class of center-outward rank-based serial statistics, a Hájek asymptotic representation result, of independent interest, which allows for a rank-based reconstruction of central sequences. As an illustration, we consider the problems of testing the absence of serial correlation in multiple-output and possibly non-linear regression (an extension of the classical Durbin-Watson problem) and the sequential identification of the order p of a VAR(p) model. A Monte Carlo comparative study of our tests and their routinely-applied Gaussian competitors demonstrates the benefits (in terms of size, power, and robustness) of our methodology; these benefits are particularly significant in the presence of asymmetric and leptokurtic innovation densities. A real-data application concludes the paper.
Rank correlations have found many innovative applications in the last decade. In particular, suitable rank correlations have been used for consistent tests of independence between pairs of random variables. Using ranks is especially appealing for continuous data as tests become distribution-free. However, the traditional concept of ranks relies on ordering data and is, thus, tied to univariate observations. As a result, it has long remained unclear how one may construct distribution-free yet consistent tests of independence between random vectors. This is the problem addressed in this paper, in which we lay out a general framework for designing dependence measures that give tests of multivariate independence that are not only consistent and distribution-free but which we also prove to be statistically efficient. Our framework leverages the recently introduced concept of center-outward ranks and signs, a multivariate generalization of traditional ranks, and adopts a common standard form for dependence measures that encompasses many popular examples. In a unified study, we derive a general asymptotic representation of center-outward rank-based test statistics under independence, extending to the multivariate setting the classical Hájek asymptotic representation results. This representation permits direct calculation of limiting null distributions and facilitates a local power analysis that provides strong support for the center-outward approach by establishing, for the first time, the nontrivial power of center-outward rank-based tests over root-n neighborhoods within the class of quadratic mean differentiable alternatives.
Defining multivariate generalizations of the classical univariate ranks has been a long-standing open problem in statistics. Optimal transport has been shown to offer a solution in which multivariate ranks are obtained by transporting data points to a grid that approximates a uniform reference measure (Chernozhukov et al., 2017; Hallin, 2017; Hallin et al., 2021). We take up this new perspective to develop and study multivariate analogues of the sign covariance/quadrant statistic, Kendall's tau, and Spearman's rho. The resulting tests of multivariate independence are genuinely distribution-free, hence uniformly valid irrespective of the actual (absolutely continuous) distributions of the observations. Our results provide asymptotic distribution theory for these new test statistics, with asymptotic approximations to critical values to be used for testing independence as well as a power analysis of the resulting tests. This includes a multivariate elliptical Chernoff-Savage property, which guarantees that, under ellipticity, our nonparametric tests of independence enjoy an asymptotic relative efficiency of one or larger with respect to the classical Gaussian procedures.
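The idea of obtaining multivariate ranks by transporting data points to a grid can be illustrated with a minimal sketch in dimension 2. The grid layout below (`n_rings` concentric circles with `n_dirs` equispaced directions each), the function names, and the brute-force search over all assignments are assumptions made for illustration only; they are not taken from the papers listed here, and the brute-force search is only practical for a handful of points.

```python
import itertools
import math


def make_grid(n_rings, n_dirs):
    """Regular grid of n_rings * n_dirs points inside the unit disk.

    Each entry is (ring index, direction index, x, y); ring r lies at
    radius r / (n_rings + 1). This layout is an illustrative assumption.
    """
    grid = []
    for r in range(1, n_rings + 1):
        radius = r / (n_rings + 1)
        for s in range(n_dirs):
            angle = 2 * math.pi * s / n_dirs
            grid.append((r, s, radius * math.cos(angle), radius * math.sin(angle)))
    return grid


def center_outward_ranks(points, n_rings, n_dirs):
    """Assign each 2-d point a (rank, direction) pair.

    The assignment minimises the total squared Euclidean distance between
    points and grid points. It is found by exhaustive search over all
    permutations, so the sample must be very small.
    """
    grid = make_grid(n_rings, n_dirs)
    n = len(points)
    if n != len(grid):
        raise ValueError("need exactly n_rings * n_dirs points")
    cost = [
        [(px - gx) ** 2 + (py - gy) ** 2 for (_, _, gx, gy) in grid]
        for (px, py) in points
    ]
    best = min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return [(grid[j][0], grid[j][1]) for j in best]


# Six sample points (arbitrary values chosen for illustration).
sample = [(0.1, 0.2), (-1.3, 0.4), (2.0, -0.5), (0.3, -1.1), (-0.2, 0.9), (1.1, 1.4)]
ranks_and_dirs = center_outward_ranks(sample, n_rings=2, n_dirs=3)
```

Whatever the sample, each ring index is used exactly `n_dirs` times, so the collection of assigned (rank, direction) pairs is the same for every sample of that size; only which observation receives which pair depends on the data. For realistic sample sizes, a proper assignment solver would replace the exhaustive search.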
This paper introduces rank-based tests for the cointegrating rank in an Error Correction Model with i.i.d. elliptical innovations. The tests are asymptotically distribution-free, and their validity does not depend on the actual distribution of the innovations. This result holds despite the fact that, depending on the alternatives considered, the model exhibits a non-standard Locally Asymptotically Brownian Functional (LABF) and Locally Asymptotically Mixed Normal (LAMN) local structure, a structure which we completely characterize. Our tests, which have the general form of Lagrange multiplier tests, depend on a reference density that can freely be chosen, and thus is not restricted to be Gaussian as in traditional quasi-likelihood procedures. Moreover, appropriate choices of the reference density achieve the semiparametric efficiency bounds. Simulations show that our asymptotic analysis provides an accurate approximation to finite-sample behavior. Our results are based on an ex...
Univariate concepts such as quantile and distribution functions, involving ranks and signs, do not canonically extend to $\mathbb{R}^d$, $d\geq 2$. Palliating that lack of a canonical ordering has generated an abundant literature. Chapter 1 shows that, unlike the many definitions that have been proposed so far, the measure transportation-based ones introduced in Chernozhukov et al. (2017) enjoy all the properties that make univariate quantiles and ranks successful tools for semiparametric statistical inference. We therefore propose a new center-outward definition of multivariate distribution and quantile functions, along with their empirical counterparts, for which we obtain a Glivenko-Cantelli result. Our approach is geometric and, contrary to the Monge-Kantorovich one in Chernozhukov et al. (2017), does not require any moment assumptions. The resulting ranks and signs are strictly distribution-free, and maximal invariant under the action of a data-driven class of (order-preserving) transformations generating the fami...
Journal of the American Statistical Association, 2022
Extending rank-based inference to a multivariate setting such as multiple-output regression or MANOVA with unspecified d-dimensional error density has remained an open problem for more than half a century. None of the many solutions proposed so far enjoys the combination of distribution-freeness and efficiency that makes rank-based inference a successful tool in the univariate setting. A concept of center-outward multivariate ranks and signs based on measure transportation ideas has been introduced recently. Center-outward ranks and signs are not only distribution-free but achieve in dimension d > 1 the (essential) maximal ancillarity property of traditional univariate ranks, hence carry all the "distribution-free information" available in the sample. We derive here the Hájek representation and asymptotic normality results required in the construction of center-outward rank tests for multiple-output regression and MANOVA. When based on appropriate spherical scores, thes...
Locally asymptotically optimal (in the Hájek-Le Cam sense) pseudo-Gaussian and rank-based procedures for detecting randomness of coefficients in linear regression models are proposed. The parametric and semiparametric efficiency properties of those procedures are investigated. Their asymptotic relative efficiencies (with respect to the pseudo-Gaussian procedure) turn out to be huge under heavy-tailed and skewed densities, stressing the importance of an adequate choice of scores. Simulations demonstrate the excellent finite-sample performances of a class of rank-based procedures based on data-driven scores.
Rank correlations have found many innovative applications in the last decade. In particular, suitable versions of rank correlations have been used for consistent tests of independence between pairs of random variables. The use of ranks is especially appealing for continuous data as tests become distribution-free. However, the traditional concept of ranks relies on ordering data and is, thus, tied to univariate observations. As a result, it has long remained unclear how one may construct distribution-free yet consistent tests of independence between multivariate random vectors. This is the problem we address in this paper, in which we lay out a general framework for designing dependence measures that give tests of multivariate independence that are not only consistent and distribution-free but which we also prove to be statistically efficient. Our framework leverages the recently introduced concept of center-outward ranks and signs, a multivariate generalization of traditional ranks, a...
Journal of the American Statistical Association, 2020
We propose a new class of R-estimators for semiparametric VARMA models in which the innovation density plays the role of the nuisance parameter. Our estimators are based on the novel concepts of multivariate center-outward ranks and signs. We show that these concepts, combined with Le Cam's asymptotic theory of statistical experiments, yield a class of semiparametric estimation procedures, which are efficient (at a given reference density), root-n consistent, and asymptotically normal under a broad class of (possibly non-elliptical) actual innovation densities. No kernel density estimation is required to implement our procedures. A Monte Carlo comparative study of our R-estimators and other routinely-applied competitors demonstrates the benefits of the novel methodology, in large and small samples. Proofs, computational aspects, and further numerical results are available in the supplementary material.
Annual Review of Statistics and Its Application, 2021
Unlike the real line, the real space, in dimension d ≥ 2, is not canonically ordered. As a consequence, extending to a multivariate context fundamental univariate statistical tools such as quantiles, signs, and ranks is anything but obvious. Tentative definitions have been proposed in the literature but do not enjoy the basic properties (e.g., distribution-freeness of ranks, their independence with respect to the order statistic, their independence with respect to signs) they are expected to satisfy. Based on measure transportation ideas, new concepts of distribution and quantile functions, ranks, and signs have been proposed recently that, unlike previous attempts, do satisfy these properties. These ranks, signs, and quantiles have been used, quite successfully, in several inference problems and have triggered, in a short span of time, a number of applications: fully distribution-free testing for multiple-output regression, MANOVA, and VAR models; R-estimation for VARMA parameters;...
Unlike the real line, the real space $\mathbb{R}^d$, for d ≥ 2, is not canonically ordered. As a consequence, such fundamental univariate concepts as quantile and distribution functions, and their empirical counterparts, involving ranks and signs, do not canonically extend to the multivariate context. Palliating that lack of a canonical ordering has been an open problem for more than half a century, generating an abundant literature and motivating, among others, the development of statistical depth and copula-based methods. We show that, unlike the many definitions proposed in the literature, the measure transportation-based ranks introduced in Chernozhukov et al. (2017) enjoy all the properties that make univariate ranks a successful tool for semiparametric inference. In connection with those ranks, we propose a new center-outward definition of multivariate distribution and quantile functions, along with their empirical counterparts, for which we establish a Glivenko-Cantelli result. Our approach is based on McCann (1995) and our results, unlike those of Chernozhukov et al. (2017), do not require any moment assumptions. The resulting ranks and signs are shown to be strictly distribution-free and essentially maximal ancillary in the sense of Basu (1959) which, in semiparametric models involving noise with unspecified density, can be interpreted as a finite-sample form of semiparametric efficiency. Although constituting a sufficient summary of the sample, empirical center-outward distribution functions are defined at observed values only. A continuous extension to the entire d-dimensional space, yielding smooth empirical quantile contours and sign curves while preserving the essential monotonicity and Glivenko-Cantelli features, is provided. A numerical study of the resulting empirical quantile contours is conducted.
Random coefficient regression (RCR) models are the regression versions of random effects models in analysis of variance and panel data analysis. Optimal detection of the presence of random coefficients (equivalently, optimal testing of the hypothesis of constant regression coefficients) has been an open problem for many years. The simple regression case has been solved recently and the multiple regression case is considered here. The latter poses several theoretical challenges: (a) a nonstandard ULAN structure, with log-likelihood gradients vanishing at the null; (b) cone-shaped alternatives under which traditional optimality concepts are no longer adequate; (c) nuisance parameters that are not identified under the null but have a significant impact on local powers. We propose a new (local and asymptotic) concept of optimality for this problem and, for specified error densities, derive parametrically optimal procedures. A suitable modification of the Gaussian version of the latter is shown to qualify as a pseudo-Gaussian test. The asymptotic performances of those pseudo-Gaussian tests, however, are quite poor under skewed and heavy-tailed densities. We therefore also construct rank-based tests, possibly based on data-driven scores, the asymptotic relative efficiencies of which are remarkably high with respect to their pseudo-Gaussian counterparts.
An asymptotic distribution theory is developed for a general class of signed-rank serial statistics, and is then used to derive asymptotically locally optimal tests (in the maximin sense) for testing an ARMA model against other ARMA models. Special cases yield Fisher-Yates, van der Waerden, and Wilcoxon type tests. The asymptotic relative efficiencies of the proposed procedures with respect to each other, and with respect to their normal theory counterparts, are provided. © 1991 Academic Press, Inc.
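To give a concrete feel for what a signed-rank serial statistic looks like, here is a minimal sketch of a lag-k signed-rank autocorrelation with Wilcoxon-type and van der Waerden-type scores. The exact form, normalisation, and function names are assumptions made for illustration; the statistics studied in the paper may be defined and standardised differently. The sketch also assumes a series with no zeros and no ties among absolute values.

```python
import math
from statistics import NormalDist


def signed_ranks(z):
    """Rank of |z[t]| among |z[0]|, ..., |z[n-1]| (1 = smallest); no ties assumed."""
    order = sorted(range(len(z)), key=lambda i: abs(z[i]))
    ranks = [0] * len(z)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def wilcoxon_score(u):
    """Wilcoxon-type score: the identity on (0, 1)."""
    return u


def van_der_waerden_score(u):
    """van der Waerden-type score for signed ranks: Phi^{-1}((1 + u) / 2)."""
    return NormalDist().inv_cdf((1 + u) / 2)


def signed_rank_autocorrelation(z, lag, score):
    """Average of products of signed scores at times t and t - lag (unstandardised)."""
    n = len(z)
    ranks = signed_ranks(z)
    signed = [math.copysign(1.0, z[t]) * score(ranks[t] / (n + 1)) for t in range(n)]
    return sum(signed[t] * signed[t - lag] for t in range(lag, n)) / (n - lag)


# Toy series with alternating signs and increasing magnitudes.
series = [0.5, -1.5, 2.5, -3.5]
r_wilcoxon = signed_rank_autocorrelation(series, 1, wilcoxon_score)
r_vdw = signed_rank_autocorrelation(series, 1, van_der_waerden_score)
```

Because the statistic depends on the series only through the signs and the ranks of the absolute values, any monotone rescaling of the magnitudes leaves it unchanged.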
Random coefficient regression models are the regression counterparts of the classical random effects models in Analysis of Variance and panel data analysis. While several heuristic methods have been proposed for the detection of such random regression coefficients, little is known about their optimality properties. Based on a nonstandard ULAN property, we propose locally asymptotically optimal (in the Hájek-Le Cam sense) parametric, pseudo-Gaussian, and rank-based procedures for this problem. The asymptotic relative efficiencies (with respect to the pseudo-Gaussian procedure) of rank-based tests turn out to be quite high under heavy-tailed and skewed densities, demonstrating the importance of a careful choice of scores. Simulations reveal the excellent finite-sample performances of a class of rank-based procedures based on data-driven scores.
Journal of Statistical Planning and Inference, 1997
It has been common practice to derive higher-order expansions with remainder terms Rn expressed u... more It has been common practice to derive higher-order expansions with remainder terms Rn expressed under O(·) or o(·) forms, then presenting simulations of small sample sizes for comparisons with the asymptotic results. Such comparisons are intended to support the idea that the unknown constants hidden behind the O(·) or o(·) remainders are reasonably small. That practice however is marred by
Although the assumption of elliptical symmetry is quite common in multivariate analysis and wides... more Although the assumption of elliptical symmetry is quite common in multivariate analysis and widespread in a number of applications, the problem of testing the null hypothesis of ellipticity so far has not been addressed in a fully satisfactory way. Most papers in the literature indeed are dealing with the null hypothesis of elliptical symmetry with specified location and actually address location rather than non-elliptical alternatives. In this paper, we are proposing new classes of testing procedures, both for specified and unspecified location. The backbone of our construction is Le Cam's asymptotic theory of statistical experiments, and optimality is to be understood locally and asymptotically within the family of generalized skew-elliptical distributions. The tests we are proposing are enjoying all the desirable properties of a "good" test of elliptical symmetry: they have simple asymptotic distributions under the entire null hypothesis of elliptical symmetry with unspecified radial density and shape parameter; they are affine-invariant, computationally fast, intuitively understandable, and not too demanding in terms of moments. While achieving optimality against generalized skew-elliptical alternatives, they remain quite powerful under a much broader class of non-elliptical distributions and significantly outperform the available competitors.
Based on the novel concept of multivariate center-outward quantiles introduced recently in Cherno... more Based on the novel concept of multivariate center-outward quantiles introduced recently in Chernozhukov et al. (2017) and Hallin et al. (2021), we are considering the problem of nonparametric multiple-output quantile regression. Our approach defines nested conditional center-outward quantile regression contours and regions with given conditional probability content irrespective of the underlying distribution; their graphs constitute nested center-outward quantile regression tubes. Empirical counterparts of these concepts are constructed, yielding interpretable empirical regions and contours which are shown to consistently reconstruct their population versions in the Pompeiu-Hausdorff topology. Our method is entirely non-parametric and performs well in simulations including heteroskedasticity and nonlinear trends; its power as a data-analytic tool is illustrated on some real datasets.
We develop a class of tests for semiparametric vector autoregressive (VAR) models with unspecifie... more We develop a class of tests for semiparametric vector autoregressive (VAR) models with unspecified innovation densities based on the recent measure-transportation-based concepts of multivariate center-outward ranks and signs. We show that these concepts, combined with Le Cam's asymptotic theory of statistical experiments, yield novel testing procedures, which (a) are valid under a broad class of innovation densities (possibly non-elliptical, skewed, and/or with infinite moments), (b) are optimal (locally asymptotically maximin or most stringent) at selected ones, and (c) are robust against additive outliers. In order to do so, we establish, for a general class of center-outward rankbased serial statistics, a Hájek asymptotic representation result, of independent interest, which allows for a rank-based reconstruction of central sequences. As an illustration, we consider the problems of testing the absence of serial correlation in multiple-output and possibly non-linear regression (an extension of the classical Durbin-Watson problem) and the sequential identification of the order p of a VAR(p) model. A Monte Carlo comparative study of our tests and their routinely-applied Gaussian competitors demonstrates the benefits (in terms of size, power, and robustness) of our methodology; these benefits are particularly significant in the presence of asymmetric and leptokurtic innovation densities. A real-data application concludes the paper.
Rank correlations have found many innovative applications in the last decade. In particular, suit... more Rank correlations have found many innovative applications in the last decade. In particular, suitable rank correlations have been used for consistent tests of independence between pairs of random variables. Using ranks is especially appealing for continuous data as tests become distribution-free. However, the traditional concept of ranks relies on ordering data and is, thus, tied to univariate observations. As a result, it has long remained unclear how one may construct distribution-free yet consistent tests of independence between random vectors. This is the problem addressed in this paper, in which we lay out a general framework for designing dependence measures that give tests of multivariate independence that are not only consistent and distribution-free but which we also prove to be statistically efficient. Our framework leverages the recently introduced concept of center-outward ranks and signs, a multivariate generalization of traditional ranks, and adopts a common standard form for dependence measures that encompasses many popular examples. In a unified study, we derive a general asymptotic representation of center-outward rank-based test statistics under independence, extending to the multivariate setting the classical Hájek asymptotic representation results. This representation permits direct calculation of limiting null distributions and facilitates a local power analysis that provides strong support for the center-outward approach by establishing, for the first time, the nontrivial power of center-outward rank-based tests over root-n neighborhoods within the class of quadratic mean differentiable alternatives.
Defining multivariate generalizations of the classical univariate ranks has been a long-standing ... more Defining multivariate generalizations of the classical univariate ranks has been a long-standing open problem in statistics. Optimal transport has been shown to offer a solution in which multivariate ranks are obtained by transporting data points to a grid that approximates a uniform reference measure (Chernozhukov et al., 2017; Hallin, 2017; Hallin et al., 2021). We take up this new perspective to develop and study multivariate analogues of the sign covariance/quadrant statistic, Kendall's tau, and Spearman's rho. The resulting tests of multivariate independence are genuinely distribution-free, hence uniformly valid irrespective of the actual (absolutely continuous) distributions of the observations. Our results provide asymptotic distribution theory for these new test statistics, with asymptotic approximations to critical values to be used for testing independence as well as a power analysis of the resulting tests. This includes a multivariate elliptical Chernoff-Savage property, which guarantees that, under ellipticity, our nonparametric tests of independence enjoy an asymptotic relative efficiency of one or larger with respect to the classical Gaussian procedures.
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific ... more HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et a ̀ la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
This paper introduces rank-based tests for the cointegrating rank in an Error Correction Model wi... more This paper introduces rank-based tests for the cointegrating rank in an Error Correction Model with i.i.d. elliptical innovations. The tests are asymptotically distribution-free, and their validity does not depend on the actual distribution of the innovations. This result holds despite the fact that, depending on the alternatives considered, the model exhibits a non-standard Locally Asymptotically Brownian Functional (LABF) and Locally Asymptotically Mixed Normal (LAMN) local structure—a structure which we completely characterize. Our tests, which have the general form of Lagrange multiplier tests, depend on a reference density that can freely be chosen, and thus is not restricted to be Gaussian as in traditional quasi-likelihood procedures. Moreover, appropriate choices of the reference density are achieving the semiparametric efficiency bounds. Simulations show that our asymptotic analysis provides an accurate approximation to finite-sample behavior. Our results are based on an ex...
Univariate concepts as quantile and distribution functions involving ranks and signs, do not cano... more Univariate concepts as quantile and distribution functions involving ranks and signs, do not canonically extend to $\mathbb{R}^d, d\geq 2$. Palliating that has generated an abundant literature. Chapter 1 shows that, unlike the many definitions that have been proposed so far, the measure transportation-based ones introduced in Chernozhukov et al. (2017) enjoy all the properties that make univariate quantiles and ranks successful tools for semiparametric statistical inference. We therefore propose a new center-outward definition of multivariate distribution and quantile functions, along with their empirical counterparts, for which we obtain a Glivenko-Cantelli result. Our approach is geometric and, contrary to the Monge-Kantorovich one in Chernozhukov et al. (2017), does not require any moment assumptions. The resulting ranks and signs are strictly distribution-free, and maximal invariant under the action of a data-driven class of (order-preserving) transformations generating the fami...
Journal of the American Statistical Association, 2022
Extending rank-based inference to a multivariate setting such as multiple-output regression or MA... more Extending rank-based inference to a multivariate setting such as multiple-output regression or MANOVA with unspecified d-dimensional error density has remained an open problem for more than half a century. None of the many solutions proposed so far is enjoying the combination of distribution-freeness and efficiency that makes rank-based inference a successful tool in the univariate setting. A concept of center-outward multivariate ranks and signs based on measure transportation ideas has been introduced recently. Center-outward ranks and signs are not only distribution-free but achieve in dimension d > 1 the (essential) maximal ancillarity property of traditional univariate ranks, hence carry all the “distribution-free information" available in the sample. We derive here the Hajek representation and asymptotic normality results required in the construction of center-outward rank tests for multiple-output regression and MANOVA. When based on appropriate spherical scores, thes...
Locally asymptotically optimal (in the Hájek-Le Cam sense) pseudo-Gaussian and rank-based procedures for detecting randomness of coefficients in linear regression models are proposed. The parametric and semiparametric efficiency properties of those procedures are investigated. Their asymptotic relative efficiencies (with respect to the pseudo-Gaussian procedure) turn out to be huge under heavy-tailed and skewed densities, stressing the importance of an adequate choice of scores. Simulations demonstrate the excellent finite-sample performances of a class of rank-based procedures based on data-driven scores.
Rank correlations have found many innovative applications in the last decade. In particular, suitable versions of rank correlations have been used for consistent tests of independence between pairs of random variables. The use of ranks is especially appealing for continuous data as tests become distribution-free. However, the traditional concept of ranks relies on ordering data and is, thus, tied to univariate observations. As a result it has long remained unclear how one may construct distribution-free yet consistent tests of independence between multivariate random vectors. This is the problem we address in this paper, in which we lay out a general framework for designing dependence measures that give tests of multivariate independence that are not only consistent and distribution-free but which we also prove to be statistically efficient. Our framework leverages the recently introduced concept of center-outward ranks and signs, a multivariate generalization of traditional ranks, a...
Journal of the American Statistical Association, 2020
We propose a new class of R-estimators for semiparametric VARMA models in which the innovation density plays the role of the nuisance parameter. Our estimators are based on the novel concepts of multivariate center-outward ranks and signs. We show that these concepts, combined with Le Cam's asymptotic theory of statistical experiments, yield a class of semiparametric estimation procedures, which are efficient (at a given reference density), root-n consistent, and asymptotically normal under a broad class of (possibly non-elliptical) actual innovation densities. No kernel density estimation is required to implement our procedures. A Monte Carlo comparative study of our R-estimators and other routinely applied competitors demonstrates the benefits of the novel methodology, in large and small samples. Proofs, computational aspects, and further numerical results are available in the supplementary material.
Annual Review of Statistics and Its Application, 2021
Unlike the real line, the real space, in dimension d ≥ 2, is not canonically ordered. As a consequence, extending to a multivariate context fundamental univariate statistical tools such as quantiles, signs, and ranks is anything but obvious. Tentative definitions have been proposed in the literature but do not enjoy the basic properties (e.g., distribution-freeness of ranks, their independence with respect to the order statistic, their independence with respect to signs) they are expected to satisfy. Based on measure transportation ideas, new concepts of distribution and quantile functions, ranks, and signs have been proposed recently that, unlike previous attempts, do satisfy these properties. These ranks, signs, and quantiles have been used, quite successfully, in several inference problems and have triggered, in a short span of time, a number of applications: fully distribution-free testing for multiple-output regression, MANOVA, and VAR models; R-estimation for VARMA parameters;...
Unlike the real line, the real space $\mathbb{R}^d$, for d ≥ 2, is not canonically ordered. As a consequence, such fundamental univariate concepts as quantile and distribution functions, and their empirical counterparts, involving ranks and signs, do not canonically extend to the multivariate context. Palliating that lack of a canonical ordering has been an open problem for more than half a century, generating an abundant literature and motivating, among others, the development of statistical depth and copula-based methods. We show that, unlike the many definitions proposed in the literature, the measure transportation-based ranks introduced in Chernozhukov et al. (2017) enjoy all the properties that make univariate ranks a successful tool for semiparametric inference. In connection with those ranks, we propose a new center-outward definition of multivariate distribution and quantile functions, along with their empirical counterparts, for which we establish a Glivenko-Cantelli result. Our approach is based on McCann (1995) and our results, unlike those of Chernozhukov et al. (2017), do not require any moment assumptions. The resulting ranks and signs are shown to be strictly distribution-free and essentially maximal ancillary in the sense of Basu (1959) which, in semiparametric models involving noise with unspecified density, can be interpreted as a finite-sample form of semiparametric efficiency. Although constituting a sufficient summary of the sample, empirical center-outward distribution functions are defined at observed values only. A continuous extension to the entire d-dimensional space, yielding smooth empirical quantile contours and sign curves while preserving the essential monotonicity and Glivenko-Cantelli features, is provided. A numerical study of the resulting empirical quantile contours is conducted.
Random coefficient regression (RCR) models are the regression versions of random effects models in analysis of variance and panel data analysis. Optimal detection of the presence of random coefficients (equivalently, optimal testing of the hypothesis of constant regression coefficients) has been an open problem for many years. The simple regression case has been solved recently and the multiple regression case is considered here. The latter poses several theoretical challenges: (a) a nonstandard ULAN structure, with log-likelihood gradients vanishing at the null; (b) cone-shaped alternatives under which traditional optimality concepts are no longer adequate; (c) nuisance parameters that are not identified under the null but have a significant impact on local powers. We propose a new (local and asymptotic) concept of optimality for this problem and, for specified error densities, derive parametrically optimal procedures. A suitable modification of the Gaussian version of the latter is shown to qualify as a pseudo-Gaussian test. The asymptotic performances of those pseudo-Gaussian tests, however, are quite poor under skewed and heavy-tailed densities. We therefore also construct rank-based tests, possibly based on data-driven scores, the asymptotic relative efficiencies of which are remarkably high with respect to their pseudo-Gaussian counterparts.
An asymptotic distribution theory is developed for a general class of signed-rank serial statistics, and is then used to derive asymptotically locally optimal tests (in the maximin sense) for testing an ARMA model against other ARMA models. Special cases yield Fisher-Yates, van der Waerden, and Wilcoxon type tests. The asymptotic relative efficiencies of the proposed procedures with respect to each other, and with respect to their normal theory counterparts, are provided. © 1991 Academic Press, Inc.
Random coefficient regression models are the regression counterparts of the classical random effects models in Analysis of Variance and panel data analysis. While several heuristic methods have been proposed for the detection of such random regression coefficients, little is known about their optimality properties. Based on a nonstandard ULAN property, we propose locally asymptotically optimal (in the Hájek-Le Cam sense) parametric, pseudo-Gaussian, and rank-based procedures for this problem. The asymptotic relative efficiencies (with respect to the pseudo-Gaussian procedure) of rank-based tests turn out to be quite high under heavy-tailed and skewed densities, demonstrating the importance of a careful choice of scores. Simulations reveal the excellent finite-sample performances of a class of rank-based procedures based on data-driven scores.
Papers by Marc Hallin