Current Manuscripts by Anand N Vidyashankar
Risks, 2022
Portfolio credit risk is often concerned with the tail distribution of the total loss, defined to be the sum of default losses incurred from a collection of individual loans made to the obligors. The default for an individual loan occurs when the assets of a company (or individual) fall below a certain threshold. These assets are typically modeled according to a factor model, thereby introducing strong dependence both among the individual loans and, potentially, also within the multivariate vector of common factors. In this paper, we derive sharp tail asymptotics under two regimes: (i) a \emph{large loss regime}, where the total number of defaults increases asymptotically to infinity; and (ii) a \emph{small default regime}, where the loss threshold for an individual loan is allowed to tend asymptotically to negative infinity. Extending beyond the well-studied Gaussian distributional assumptions, we establish that, while the thresholds in the large loss regime are characterized by idiosyncratic factors specific to the individual loans, the rate of decay is governed by the common factors. Conversely, in the small default regime, we establish that the tail of the loss distribution is governed by systemic factors. We also discuss estimates for Value-at-Risk and observe that our results may be extended to cases where the number of factors diverges to infinity.
Hellinger distance has been widely used to derive objective functions that are alternatives to maximum likelihood methods. While the asymptotic distributions of these estimators have been well investigated, the probabilities of rare events induced by them are largely unknown. In this article, we analyze these rare event probabilities using large deviation theory under a potential model misspecification, in both one and higher dimensions. We show that these probabilities decay exponentially, characterizing their decay via a ``rate function'' which is expressed as a convex conjugate of a limiting cumulant generating function. In the analysis of the lower bound, in particular, certain geometric considerations arise which facilitate an explicit representation, even in the case when the limiting generating function is nondifferentiable. Our analysis involves the modulus of continuity properties of the affinity, which may be of independent interest.
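The rate function above is the convex conjugate (Legendre-Fenchel transform) of a limiting cumulant generating function. As a minimal numerical sketch, not the paper's construction, the transform can be approximated by a grid search; the CGF of a standard normal is used here purely for illustration:

```python
import math

def cgf_normal(t):
    # Cumulant generating function of a standard normal: Lambda(t) = t^2 / 2
    return 0.5 * t * t

def rate_function(x, cgf, t_grid):
    # Legendre-Fenchel (convex conjugate) transform,
    # I(x) = sup_t { t*x - Lambda(t) }, approximated over a finite grid.
    return max(t * x - cgf(t) for t in t_grid)

t_grid = [i / 1000.0 for i in range(-5000, 5001)]   # grid on [-5, 5]
print(rate_function(1.0, cgf_normal, t_grid))        # exact answer is 1^2/2 = 0.5
```

For the normal CGF the supremum is attained at t = x, giving the familiar Gaussian rate I(x) = x^2/2, which the grid recovers exactly when x lies on the grid.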
arXiv, 2020
Local depth functions (LDFs) are used for describing the local geometric features of multivariate distributions, especially in multimodal models. In this paper, we undertake a rigorous systematic study of LDFs and use it to develop a theoretically validated algorithm for clustering. To this end, we establish several analytical and statistical properties of LDFs. First, we show that, when the underlying probability distribution is absolutely continuous, under an appropriate scaling that converges to zero (referred to as extreme localization), LDFs converge uniformly to a power of the density, and we obtain a related rate of convergence result. Second, we establish that the centered and scaled sample LDFs converge in distribution to a centered Gaussian process, uniformly in the space of bounded functions on R^p × [0, ∞], as the sample size diverges to infinity. Third, under an extreme localization that depends on the sample size, we determine the correct centering and scaling for the sample LDFs to possess a limiting normal distribution. Fourth, invoking the above results, we develop a new clustering algorithm that uses the LDFs and their differentiability properties. Fifth, for this purpose, we establish several results concerning the gradient systems related to LDFs. Finally, we illustrate the finite sample performance of our results using simulations and apply them to two datasets.
Branching processes in random environments arise in a variety of applications such as biology, finance, and other contemporary scientific areas. Motivated by these applications, this paper investigates the problem of ancestral inference. Specifically, the paper develops point and interval estimates for the mean number of ancestors initiating a branching process in i.i.d. random environments and establishes their asymptotic properties when the number of replications diverges to infinity. These results are then used to quantitate the number of DNA molecules in a genetic material using data from polymerase chain reaction experiments. Numerical experiments and data analyses are included to support the proposed methods. An R software package implementing the methods of this manuscript is also included.
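As a toy illustration of the process class, not the paper's estimators, the following sketch simulates a Galton-Watson process in i.i.d. random environments: each generation first draws an offspring mean from a hypothetical two-state environment, then every individual reproduces with that Poisson offspring law:

```python
import math
import random

def poisson_draw(lam, rng):
    # Knuth's multiplication algorithm for a Poisson(lam) draw (lam > 0).
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

def simulate_bpre(z0, generations, rng):
    # Branching process in i.i.d. random environments: each generation a
    # fresh offspring mean is drawn (a hypothetical two-state environment),
    # then every individual reproduces independently with that Poisson law.
    z = z0
    for _ in range(generations):
        mean = rng.choice([0.8, 1.6])
        z = sum(poisson_draw(mean, rng) for _ in range(z))
    return z

rng = random.Random(42)
population = simulate_bpre(z0=10, generations=8, rng=rng)
```

Replicating such trajectories across many independent runs mirrors the replication structure under which estimators of the initiating ancestor count are typically studied.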
This paper is concerned with Bayesian inferential methods for data from controlled branching processes that account for model robustness through the use of disparities. Under regularity conditions, we establish that estimators built on the disparity-based posterior, such as expectation and maximum a posteriori estimates, are consistent and efficient under the posited model. Additionally, we show that the estimates are robust to model misspecification and to the presence of aberrant outliers. To this end, we develop several fundamental ideas relating minimum disparity estimators to Bayesian estimators built on the disparity-based posterior for dependent tree-structured data. We illustrate the methodology through a simulated example and apply our methods to a real data set from cell kinetics.
Entropy
Big data and streaming data are encountered in a variety of contemporary applications in business and industry. In such cases, it is common to use random projections to reduce the dimension of the data, yielding compressed data. These data, however, possess various anomalies, such as heterogeneity, outliers, and round-off errors, which are hard to detect due to volume and processing challenges. This paper describes a new robust and efficient methodology, using Hellinger distance, to analyze the compressed data. Using large sample methods and numerical experiments, it is demonstrated that routine use of a robust estimation procedure is feasible. The role of double limits in understanding efficiency and robustness is brought out, which is of independent interest.
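For context, the compression step referred to above is typically a random projection in the Johnson-Lindenstrauss spirit. Here is a minimal stdlib-only sketch, illustrative rather than the paper's procedure, that approximately preserves squared Euclidean norms in expectation:

```python
import math
import random

def random_projection(x, k, rng):
    # Project a d-dimensional vector to k dimensions with i.i.d. Gaussian
    # coefficients, scaled by 1/sqrt(k) so that the squared Euclidean norm
    # is preserved in expectation.
    return [sum(rng.gauss(0.0, 1.0) * xi for xi in x) / math.sqrt(k)
            for _ in range(k)]

rng = random.Random(0)
x = [1.0] * 100                          # ||x||^2 = 100
y = random_projection(x, k=400, rng=rng)
approx_sq_norm = sum(v * v for v in y)   # concentrates around 100
```

The squared norm of the projection is an unbiased estimator of the original squared norm, with relative error shrinking at rate 1/sqrt(k); anomalies in the original data survive compression, which is what motivates the robust analysis of the compressed data.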
Post-Selection Inference by Anand N Vidyashankar
Sparse finite mixture of regression models arise in several scientific applications, and testing hypotheses concerning regression coefficients in such models is fundamental to data analysis. In this paper, we describe an approach for hypothesis testing of regression coefficients that takes into account model selection uncertainty. The proposed methods involve (i) estimating the active predictor set of the sparse model using a consistent model selector and (ii) testing hypotheses concerning the regression coefficients associated with the estimated active predictor set. The methods asymptotically control the family-wise error rate at a pre-specified nominal level while accounting for variable selection uncertainty. Additionally, we provide examples of consistent model selectors and describe methods for finite sample improvements. Performance of the methods is illustrated using simulations. A real data analysis is included to illustrate the applicability of the methods.
Privacy Analytics by Anand N Vidyashankar
Robust Analytics: Divergence based Inference by Anand N Vidyashankar
Journal of Statistical Planning and Inference, 2006
Response-adaptive designs in clinical trials incorporate information from prior patient responses in order to assign better performing treatments to future patients of a clinical study. An example of a response-adaptive design that has received much attention in recent years is the randomized play-the-winner design (RPWD). Beran [1977. Minimum Hellinger distance estimates for parametric models. Ann. Statist. 5, 445–463] investigated the minimum Hellinger distance procedure (MHDP) for continuous data and showed that the minimum Hellinger distance estimator (MHDE) of a finite-dimensional parameter is as efficient as the MLE (maximum likelihood estimator) under a true model assumption. This paper develops minimum Hellinger distance methodology for data generated using the RPWD. A new algorithm using a Monte Carlo approximation to the estimating equation is proposed. Consistency and asymptotic normality of the estimators are established, and the robustness and small sample performance of the estimators are illustrated using simulations. The methodology, when applied to clinical trial data from Eli Lilly and Company, brings out the treatment effect in one of the strata using frequentist techniques, compared to the Bayesian argument of Tamura et al. [1994. A case study of an adaptive clinical trial in the treatment of out-patients with depressive disorder. J. Amer. Statist. Assoc. 89, 768–776].
Journal of Statistical Planning and Inference, 1997
The works of Lindsay (1994) and Basu and Sarkar (1994a) provide heuristic arguments and some empirical evidence that the minimum negative exponential disparity estimator (MNEDE), like the minimum Hellinger distance estimator (MHDE) (Beran, 1977), is a robust alternative to the usual maximum likelihood estimator when data contain outliers. In this paper, we establish the robustness properties of the MNEDE and prove that it is asymptotically fully efficient under a specified regular parametric family of densities. Our simulation results also show that, unlike the MHDE, the MNEDE is robust not only against outliers but also against inliers, defined as values with less data than expected.
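To make the minimum distance idea concrete, here is a hedged sketch, a simplified discrete analogue rather than the paper's setting, of a minimum Hellinger distance estimate for a Poisson model. Since the squared Hellinger distance equals 2 minus twice the affinity, minimizing the distance is equivalent to maximizing the affinity over a parameter grid, and a gross outlier barely moves the estimate:

```python
import math
from collections import Counter

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def mhde_poisson(data, lam_grid):
    # Minimum Hellinger distance estimate for a Poisson model: maximize the
    # affinity sum_k sqrt(p_hat(k) * f_lam(k)) between the empirical pmf
    # p_hat and the model pmf f_lam over a grid of candidate lambdas.
    n = len(data)
    counts = Counter(data)
    def affinity(lam):
        return sum(math.sqrt((c / n) * poisson_pmf(k, lam))
                   for k, c in counts.items())
    return max(lam_grid, key=affinity)

grid = [i / 10.0 for i in range(1, 101)]          # lambda in 0.1, ..., 10.0
clean = [2, 3, 3, 4, 2, 5, 3, 1, 4, 3]
contaminated = clean + [50]                        # one gross outlier
est_clean = mhde_poisson(clean, grid)
est_robust = mhde_poisson(contaminated, grid)      # stays near est_clean
```

The square root downweights cells where the model assigns negligible probability, which is the mechanism behind the robustness against outliers discussed in the abstract; the negative exponential disparity replaces the square-root weighting with an exponential one.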
Statistics & Probability Letters, 2000
This paper studies the asymptotic behavior of the minimum Hellinger distance estimator of the underlying parameter in a supercritical branching process whose offspring distribution is known to belong to a parametric family. This estimator is shown to be asymptotically normal, efficient at the true model, and robust against gross errors. These results extend those of Beran (Ann. Statist. 5, 445–463 (1977)) from an i.i.d., continuous setup to a dependent, discrete setup.
This paper develops a methodology for robust Bayesian inference through the use of disparities. Metrics such as Hellinger distance and negative exponential disparity have a long history in robust estimation in frequentist inference. We demonstrate that an equivalent robustification may be made in Bayesian inference by substituting an appropriately scaled disparity for the log likelihood, to which standard Markov chain Monte Carlo (MCMC) methods may be applied. A particularly appealing property of minimum-disparity methods is that while they yield robustness with a breakdown point of 1/2, the resulting parameter estimates are also efficient when the posited probabilistic model is correct. We demonstrate that a similar property holds for disparity-based Bayesian inference. We further show that in the Bayesian setting, it is also possible to extend these methods to robustify regression models, random effects distributions, and other hierarchical models. These models require integrating out a random effect; this is achieved via MCMC but would otherwise be numerically challenging. The methods are demonstrated on real-world data.
Inference for Branching Processes by Anand N Vidyashankar
This paper is concerned with Bayesian inferential methods for data from controlled branching processes that account for model robustness through the use of disparities. Under regularity conditions, we establish that estimators built on the disparity-based posterior, such as expectation and maximum a posteriori estimates, are consistent and efficient under the posited model. Additionally, we show that the estimates are robust to model misspecification and to the presence of aberrant outliers. To this end, we develop several fundamental ideas relating minimum disparity estimators to Bayesian estimators built on the disparity-based posterior for dependent tree-structured data. We illustrate the methodology through a simulated example and apply our methods to a real data set from cell kinetics.
The quantitative polymerase chain reaction (qPCR) is a widely used tool for gene quantitation and has been applied extensively in several scientific areas. The current methods used for analyzing qPCR data fail to account for multiple sources of variability present in the PCR dynamics, leading to biased estimates and incorrect inference. In this paper, we introduce a branching process model with random effects to account for within-reaction and between-reaction variability in PCR experiments. We describe, in terms of the observed fluorescence data, new statistical methodology for gene quantitation. Using simulations, PCR experiments, and asymptotic theory, we demonstrate the improvements achieved by our methodology compared to existing methods. This article has supplemental materials online.
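The branching-process view of PCR can be sketched in a few lines: in each cycle, every molecule is copied with some efficiency p, so the molecule count evolves as Z_{n+1} = Z_n + Binomial(Z_n, p). This toy simulator, a simplified version without the random effects that the paper introduces, illustrates the dynamics:

```python
import random

def simulate_pcr(z0, cycles, efficiency, rng):
    # One PCR trajectory: in each cycle every molecule is duplicated
    # with probability `efficiency`, i.e. Z_{n+1} = Z_n + Bin(Z_n, p).
    z = z0
    for _ in range(cycles):
        copies = sum(1 for _ in range(z) if rng.random() < efficiency)
        z += copies
    return z

rng = random.Random(7)
final = simulate_pcr(z0=5, cycles=10, efficiency=0.8, rng=rng)
# E[Z_n] = z0 * (1 + efficiency)**n, here 5 * 1.8**10, about 1.8e3
```

A random-effects variant would draw the efficiency afresh per reaction and per cycle, which is the within- and between-reaction variability the paper's model accounts for.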
Sequential Analysis, 2001
Sequential Analysis, 2000
For a subcritical branching process with observable immigration, we consider the problem of finding a sample size for which the mean squared error of a natural estimator of the offspring mean is less than a prescribed value u. If the parameters of the process are known, then a minimum sample size achieving the above objective (approximately) exists. When the parameters