Journal of the American Statistical Association, 1995
Building a minimal spanning tree. Library of Congress Cataloging-in-Publication Data: Good, Phillip I. Permutation tests: a practical guide to resampling methods for testing hypotheses / Phillip Good. p. cm. (Springer series in statistics). Includes bibliographical references and index. 1. Statistical hypothesis testing. 2. Resampling (Statistics). I. Title. II. Series.
Brazilian Journal of Probability and Statistics, 2020
We consider multiple imputation (MI) for unbalanced ranked set samples (URSS) by considering them as data sets with missing values. We replace each missing value with a set of plausible values drawn from a predictive distribution that represents the uncertainty about the appropriate value to impute. Using the structure of the MI dataset, we develop algorithms that imitate the structure of URSS to carry out the desired statistical inference. We provide results for the convergence of the empirical distribution functions of imputed samples to the population distribution function, under both URSS and simple random sampling (SRS). We obtain the variances of the imputed URSS, and the expected values of the variance estimators. We also study the problem of quantile estimation using an imputed URSS and propose a hybrid method based on the bootstrap and imputation of URSS data. We apply our results to estimate the mean and quantiles of the mercury in contaminated fish under perfect and imper...
Journal of Statistical Computation and Simulation, 2017
Nonparametric estimation and inference for conditional distribution functions with longitudinal data have important applications in biomedical studies, such as epidemiological studies and longitudinal clinical trials. Estimation approaches without any structural assumptions may lead to inadequate and numerically unstable estimators in practice. We propose in this paper a nonparametric approach based on time-varying parametric models for estimating the conditional distribution functions with a longitudinal sample. Our model assumes that the conditional distribution of the outcome variable at each given time point can be approximated by a parametric model after a local Box-Cox transformation. Our estimation is based on a two-step smoothing method, in which we first obtain the raw estimators of the conditional distribution functions at a set of disjoint time points, and then compute the final estimators at any time by smoothing the raw estimators. Applications of our two-step estimation method are demonstrated through a large epidemiological study of childhood growth and blood pressure. Finite sample properties of our procedures are investigated through a simulation study. Application and simulation results show that smoothing estimation from time-variant parametric models outperforms the existing kernel smoothing estimator by producing a narrower pointwise bootstrap confidence band and a smaller root mean squared error.
Journal of The Korean Statistical Society, Jun 1, 2016
We study nonparametric estimation of the distribution function (DF) of a continuous random variable based on a ranked set sampling design using the exponentially tilted (ET) empirical likelihood method. We propose ET estimators of the DF and use them to construct new resampling algorithms for unbalanced ranked set samples. We explore the properties of the proposed algorithms. For a hypothesis testing problem about the underlying population mean, we show that the bootstrap tests based on the ET estimators of the DF are asymptotically normal and exhibit a small bias of order O(n^{-1}). We illustrate the methods and evaluate the finite sample performance of the algorithms under both perfect and imperfect ranking schemes using a real data set and several Monte Carlo simulation studies. We compare the performance of the test statistics based on the ET estimators with those based on the empirical likelihood estimators.
We present novel tests for the hypothesis of independence when the number of variables is larger than the number of vector observations. We show that two multivariate normal vectors are independent if and only if their interpoint distances are independent. The proposed test statistics exploit different properties of the sample interpoint distances. A simulation study compares the new tests with three existing tests under various scenarios, including monotone and nonmonotone dependence structures. Numerical results show that the new methods are effective for independence testing.
Statistical Methods and Applications, Jan 30, 2020
We (a) establish the probability mass function of the interpoint distance (IPD) between random vectors that are drawn from the multivariate power series family of distributions (MPSD); (b) obtain the distribution of the IPD within one sample and across two samples from this family; (c) determine the distribution of the MPSD Euclidean norm and distance from fixed points in Z^d; and (d) provide the distribution of the IPDs of vectors drawn from a mixture of the MPSD distributions. We present a method for testing the homogeneity of MPSD mixtures using the sample IPDs.
Formed in 1935, the Department of Statistics at The George Washington University is the oldest statistics department within a school of liberal arts and sciences, and one of the oldest departments of statistics in the United States. The evolution of the Department of Statistics at The George Washington University (the department, henceforth) along with accomplishments of its faculty and students is described here. To commemorate its 75th anniversary, the department held a 1-day symposium featuring many distinguished speakers. The mission of the department is to provide quality teaching at the undergraduate and graduate levels with a faculty and PhD student body contributing to the frontiers of research in statistical theory, methodology, and practice. In the last quarter century or so the department has been offering service courses such as introduction to statistics to business and social sciences students. Currently, the department is servicing about 2,100 students annually. It offers BS, MS, and PhD degrees in Statistics. Jointly with the School of Public Health, it offers an MS and a PhD in Biostatistics and participates in the MS and PhD programs in Epidemiology.
Journal of Statistical Computation and Simulation, Oct 29, 2013
A ranked sampling procedure with random subsamples is proposed to estimate the population mean. Four methods of obtaining random subsamples are described. Several estimators of the mean of the population based on random subsamples in ranked set sampling are proposed. These estimators are compared with the mean of a simple random sample for estimating the mean of symmetric and skew distributions. Extensive simulation under several subsampling distributions, sample sizes, and symmetric and skew distributions shows that the estimators of the mean based on random subsamples are more accurate than existing methods.
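For readers unfamiliar with ranked set sampling, the following is a minimal sketch of the standard balanced design under perfect ranking, compared by simulation with a simple random sample of the same size. It does not implement the random-subsample variants proposed in the paper; the function names, the exponential population, and the set size are illustrative assumptions.

```python
import random
import statistics


def ranked_set_sample(draw, k, cycles, rng):
    """Balanced ranked set sample under perfect ranking (illustrative).

    For each rank r = 1..k, draw a fresh set of k units, sort the set,
    and keep only the r-th smallest unit. Repeat for `cycles` cycles,
    giving k * cycles measured units in total.
    """
    sample = []
    for _ in range(cycles):
        for r in range(k):
            candidates = sorted(draw(rng) for _ in range(k))
            sample.append(candidates[r])
    return sample


def compare_rss_srs(k=3, cycles=10, reps=2000, seed=0):
    """Monte Carlo variances of the RSS mean and the SRS mean.

    The population is exponential with mean 1, a hypothetical skewed
    example chosen only for illustration.
    """
    rng = random.Random(seed)

    def draw(g):
        return g.expovariate(1.0)

    n = k * cycles  # both designs measure the same number of units
    rss_means = []
    srs_means = []
    for _ in range(reps):
        rss_means.append(statistics.fmean(ranked_set_sample(draw, k, cycles, rng)))
        srs_means.append(statistics.fmean([draw(rng) for _ in range(n)]))
    return statistics.pvariance(rss_means), statistics.pvariance(srs_means)
```

Calling `compare_rss_srs()` returns the two simulated variances; under perfect ranking the ranked set sample mean is expected to show the smaller one.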
We present a bootstrap Monte Carlo algorithm for computing the power function of the generalized correlation coefficient. The proposed method makes no assumptions about the form of the underlying probability distribution and may be used with observed data to approximate the power function and with pilot data for sample size determination. In particular, the bootstrap power functions of the Pearson product moment correlation and the Spearman rank correlation are examined. Monte Carlo experiments indicate that the proposed algorithm is reliable and compares well with the asymptotic values. An example that demonstrates how this method can be used for sample size determination and power calculations is provided.
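The general idea of estimating power by resampling pilot data can be sketched as follows. This is not the algorithm of the paper: it is a generic pairs bootstrap in which the test of zero correlation is a swapped-in Fisher z approximation (which does rely on approximate normality), and all function names and defaults are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product moment correlation; returns 0.0 for constant input."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rejects_zero_correlation(xs, ys, z_crit=1.96):
    """Two-sided Fisher z test of rho = 0 (an assumed stand-in test)."""
    # Clamp r so that atanh stays finite on degenerate resamples.
    r = max(-0.999999, min(0.999999, pearson_r(xs, ys)))
    z = math.atanh(r) * math.sqrt(len(xs) - 3)
    return abs(z) > z_crit


def bootstrap_power(pilot_x, pilot_y, n, n_boot=1000, seed=0):
    """Share of size-n pairs-bootstrap resamples that reject rho = 0."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher z statistic")
    rng = random.Random(seed)
    m = len(pilot_x)
    rejections = 0
    for _ in range(n_boot):
        # Resample (x, y) pairs jointly so the dependence is preserved.
        idx = [rng.randrange(m) for _ in range(n)]
        bx = [pilot_x[i] for i in idx]
        by = [pilot_y[i] for i in idx]
        rejections += rejects_zero_correlation(bx, by)
    return rejections / n_boot
```

Evaluating `bootstrap_power` over a grid of candidate values of `n` gives a rough power curve from which a sample size could be read off.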
We present a modification of a recently proposed test of symmetry about a known center. The new test uses Wilcoxon scores to weight the runs and has a limiting normal distribution under the null hypothesis. A Monte Carlo study shows that it is more powerful than both the runs test and the Wilcoxon signed-rank test for the symmetry problem against alternatives in the lambda family.
We consider testing the equality of the means of two groups of observations under unbalanced random sampling (URSS) when the groups have different rank sizes. We also explore tests and confidence intervals for the mean of one group under URSS. Both the parametric and bootstrap tests of means are studied. We show that the parametric test is overly conservative and nonrobust. We do not recommend it to test the mean with small sample sizes. Theoretical results are augmented with some numerical studies to show how an accurate p-value can be obtained using the bootstrap method.
Journal of the Iranian Statistical Society, Jun 1, 2021
We study the blocks of interpoint distances, their distributions, correlations, independence and the homogeneity of their total variances. We discuss the exact and asymptotic distribution of the interpoint distances and their average under three models and provide connections between the correlation of interpoint distances with their vector correlation and test of sphericity. We discuss testing independence of the blocks based on the correlation of block interpoint distances. A homogeneity test of the total variances in each block and a simultaneous plot to visualize their relative ordering are presented.
The objective of this article is to present a depth-based multivariate control quantile test using statistically equivalent blocks (DSEBS). Given a random sample {x_1, ..., x_m} of R^d-valued random vectors (d ≥ 1) with a distribution function (DF) F, statistically equivalent blocks (SEBS), a multivariate generalization of the univariate sample spacings, can be constructed using a sequence of cutting functions h_i(x) to order x_i, i = 1, ..., m. DSEBS are data-driven, center-outward layers of shells whose shapes reflect the underlying geometric features of the unknown distribution and provide a framework for selection and comparison of cutting functions. We propose a control quantile test, using DSEBS, to test the equality of two DFs in R^d. The proposed test is distribution free under the null hypothesis and well defined when d ≥ max(m, n). A simulation study compares the proposed statistic to the depth-based Wilcoxon rank sum test. We show that the new test is powerful in detecting the differences in location, scale and shape (skewness or kurtosis) changes in two multivariate distributions.
We consider groups of observations in R^d and present a simultaneous plot of the empirical cumulative distribution functions of the within and between interpoint distances to visualise and examine the equality of the underlying distribution functions of the observations. We provide several examples to illustrate how such plots can be utilised to envision and canvass the relationship between the two distributions under location, scale, dependence and shape changes. We suggest new statistics for testing the equality of distributions and extend the simultaneous plots to visualise them. We compare the new statistics to existing tests based on the interpoint distances.
Applied Stochastic Models in Business and Industry, Jan 27, 2020
This article surveys recent developments on Euclidean interpoint distances (IPDs). IPDs find applications in many scientific fields and are the building blocks of several multivariate techniques such as comparison of distributions, clustering, classification, and multidimensional scaling. In this article, we explore IPDs, discuss their properties and applications, and present their distributions for several families, including the multivariate normal, multivariate Bernoulli, multivariate power series, and the unified hypergeometric distributions. We consider two groups of observations in R^d and present a simultaneous plot of the empirical cumulative distribution functions of the within and between IPDs to visualize and examine the equality of the underlying distribution functions of the observations.
Classification is a multivariate technique that is concerned with allocating new observations to two or more groups. We use interpoint distances to measure the closeness of the samples and construct new rules for high dimensional classification of discrete observations. Applicable to high dimensional data, the new method is non-parametric and uses test-based classification with permutation testing. We propose a modification of a test-based rule to use relative values with respect to the training samples baseline. We compare the proposed rule with parametric methods, such as the likelihood ratio rule and the modified linear discriminant function, and non-parametric techniques such as support vector machine, nearest neighbour and depth-based classification, under multivariate Bernoulli, multinomial and multivariate Poisson distributions.
We present tests of multivariate uniformity using data depth, the normal quantiles and the interpoint distances between the observations. We investigate the properties of the interpoint distances among uniform random vectors. We compare the performance of the proposed tests with two existing statistics under the hypothesis of uniformity and obtain their empirical power under various alternatives in a Monte Carlo study.
We consider the squared Euclidean interpoint distances (IDs) among multivariate Bernoulli observations and determine the mean, covariance and the distribution of the IDs within a single group or across two groups. We discuss testing the equality of two distribution functions when the number of variables is large and exceeds the number of observations.
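As a small illustration of the quantities involved, the sketch below computes within-group and between-group squared Euclidean distances for binary vectors, where the squared distance is simply the number of coordinates in which two vectors differ. The expected values use only the elementary fact that, for independent vectors with marginal success probabilities p_k and q_k, the k-th coordinate differs with probability p_k(1 - q_k) + q_k(1 - p_k). The function names and the simulation with independent coordinates are assumptions for illustration and do not reproduce the paper's derivations.

```python
import random
import statistics
from itertools import combinations, product


def sq_dist(u, v):
    """Squared Euclidean distance; for 0/1 vectors this is the Hamming distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def within_ipds(group):
    """Squared distances over all unordered pairs inside one group."""
    return [sq_dist(u, v) for u, v in combinations(group, 2)]


def between_ipds(group1, group2):
    """Squared distances over all pairs with one vector from each group."""
    return [sq_dist(u, v) for u, v in product(group1, group2)]


def draw_bernoulli_vectors(probs, n, rng):
    """n binary vectors with independent coordinates (a simplifying assumption)."""
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(n)]


def expected_within(probs):
    """Mean squared distance between two independent vectors from one group."""
    return sum(2 * p * (1 - p) for p in probs)


def expected_between(probs1, probs2):
    """Mean squared distance between independent vectors from two groups."""
    return sum(p * (1 - q) + q * (1 - p) for p, q in zip(probs1, probs2))
```

Comparing `statistics.fmean(within_ipds(g))` with `expected_within(probs)` on simulated data is a quick sanity check, and it works unchanged when the number of variables exceeds the number of observations.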
In this paper, we study estimation of θ 3 using a ranked set sample (RSS). The maximum likelihood estimate (MLE) of θ 3 using a simple random sample (SRS) of size n, X_SRS = {(X_1j, X_2j), j = 1, ..., n}, has been investigated (e.g., Anderson, 1984). However, in many applications ...
Papers by Reza Modarres