Papers by Georg Schollmeyer
Statistical Science, 2021
We congratulate Ruobin Gong and Xiao-Li Meng on their thought-provoking paper demonstrating the power of imprecise probabilities in statistics. In particular, Gong and Meng clarify important statistical paradoxes by discussing them in the framework of generalized uncertainty quantification and different conditioning rules used for updating. In this note, we characterize all three conditioning rules as envelopes of certain sets of conditional probabilities. This view also suggests some generalizations that can be seen as compromise rules. Similar to Gong and Meng, our derivations mainly focus on Choquet capacities of order 2, and so we also briefly discuss in general their role as statistical models. We conclude with some general remarks on the potential of imprecise probabilities to cope with the multidimensional nature of uncertainty.
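For orientation, the three conditioning rules referred to above can be recalled in their common textbook forms for a coherent lower probability with credal set $\mathcal{M}$ and conjugate upper probability; this is standard background (stated here for conditioning events with positive lower probability), not notation taken from the note itself.

```latex
% Generalized Bayes rule: lower envelope of the conditional probabilities
\underline{P}_{\mathrm{GBR}}(A \mid B) \;=\; \inf_{P \in \mathcal{M}} P(A \mid B),
% Dempster's rule of conditioning, stated via the upper probability
\overline{P}_{\mathrm{D}}(A \mid B) \;=\; \frac{\overline{P}(A \cap B)}{\overline{P}(B)},
% Geometric rule, stated via the lower probability
\underline{P}_{\mathrm{G}}(A \mid B) \;=\; \frac{\underline{P}(A \cap B)}{\underline{P}(B)}.
```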
International Statistical Review, 2019
In most surveys, one is confronted with missing or, more generally, coarse data. Traditional methods dealing with these data require strong, untestable and often doubtful assumptions, for example, coarsening at random. But due to the resulting, potentially severe bias, there is a growing interest in approaches that only include tenable knowledge about the coarsening process, leading to imprecise but reliable results. In this spirit, we study regression analysis with a coarse categorical dependent variable and precisely observed categorical covariates. Our (profile) likelihood-based approach can incorporate weak knowledge about the coarsening process and thus offers a synthesis of traditional methods and cautious strategies refraining from any coarsening assumptions. This also allows a discussion of the uncertainty about the coarsening process, besides sampling uncertainty and model uncertainty. Our procedure is illustrated with data of the panel study 'Labour market and social security' conducted by the Institute for Employment Research, whose questionnaire design produces coarse data.
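The cautious starting point behind such approaches can be conveyed with a minimal sketch (illustrative only, not the paper's profile-likelihood procedure): with a coarsely observed categorical variable, each category's empirical probability is only bounded, the lower bound counting observations that identify the category uniquely and the upper bound counting all observations compatible with it. All names below are made up for this note.

```python
def empirical_bounds(coarse_obs, categories):
    """Lower/upper empirical probability bounds for a coarsely observed
    categorical variable; each observation is a set of compatible categories."""
    n = len(coarse_obs)
    bounds = {}
    for c in categories:
        lower = sum(1 for obs in coarse_obs if obs == {c}) / n   # only c is possible
        upper = sum(1 for obs in coarse_obs if c in obs) / n     # c is compatible
        bounds[c] = (lower, upper)
    return bounds

# Example: a question answered exactly, coarsely, or with partial refusal
obs = [{"low"}, {"low", "mid"}, {"mid"}, {"low", "mid", "high"}, {"high"}]
print(empirical_bounds(obs, ["low", "mid", "high"]))
```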
International Journal of Approximate Reasoning, 2015
The reliable analysis of interval data (coarsened data) is one of the most promising applications of imprecise probabilities in statistics. If one refrains from making untestable, and often materially unjustified, strong assumptions on the coarsening process, then the empirical distribution of the data is imprecise, and statistical models are, in Manski's terms, partially identified. We first elaborate some subtle differences between two natural ways of handling interval data in the dependent variable of regression models, distinguishing between two different types of identification regions, called Sharp Marrow Region (SMR) and Sharp Collection Region (SCR) here. Focusing on the case of linear regression analysis, we then derive some fundamental geometrical properties of SMR and SCR, allowing a comparison of the regions and providing some guidelines for their canonical construction. Relying on the algebraic framework of adjunctions of two mappings between partially ordered sets, we characterize SMR as a right adjoint and as the monotone kernel of a criterion function based mapping, while SCR is indeed interpretable as the corresponding monotone hull. Finally we sketch some ideas on a compromise between SMR and SCR based on a set-domained loss function. This paper is an extended version of a shorter paper with the same title that is conditionally accepted for publication in the Proceedings of the Eighth International Symposium on Imprecise Probability: Theories and Applications. In the present paper we have added the proofs and a seventh chapter with a small Monte Carlo illustration, which would have made the original paper too long.
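As background on the algebraic framework mentioned above, the standard notion of an adjunction (isotone Galois connection) between posets is recalled below; the notation is generic and not taken from the paper, but it is what underlies the "right adjoint", "monotone kernel" and "monotone hull" terminology.

```latex
% Monotone maps f: P -> Q and g: Q -> P between posets (P, \leq_P) and (Q, \leq_Q)
% form an adjunction (f left adjoint, g right adjoint) iff, for all p \in P, q \in Q,
f(p) \,\leq_Q\, q \quad\Longleftrightarrow\quad p \,\leq_P\, g(q).
% Then g \circ f is a (monotone) hull operator on P and f \circ g a kernel operator on Q.
```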
SSRN Electronic Journal, 2019
We investigate risk attitudes when the underlying domain of payoffs is finite and the payoffs are, in general, not numerical. In such cases, the traditional notions of absolute risk attitudes, which are designed for convex domains of numerical payoffs, are not applicable. We introduce comparative notions of weak and strong risk attitudes that remain applicable. We examine how they are characterized within the rank-dependent utility model, thus including expected utility as a special case. In particular, we characterize strong comparative risk aversion under rank-dependent utility. This is our main result. From this and other findings, we draw two novel conclusions. First, under expected utility, weak and strong comparative risk aversion are characterized by the same condition over finite domains. By contrast, this is not the case under non-expected utility. Second, under expected utility, weak (respectively, strong) comparative risk aversion is characterized by the same condition when the utility functions have finite range and when they have convex range (alternatively, when the payoffs are numerical and their domain is finite or convex, respectively). By contrast, this is not the case under non-expected utility. Thus, considering comparative risk aversion over finite domains leads to a better understanding of the divide between expected and non-expected utility and, more generally, of the structural properties of the main models of decision-making under risk.
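For readers less familiar with the model, the standard form of rank-dependent utility over a finite, ordered set of payoffs is displayed below (expected utility is recovered when the weighting function $w$ is the identity); this is textbook background rather than the paper's own notation.

```latex
% Lottery with payoffs x_1 \preceq x_2 \preceq \dots \preceq x_n, probabilities p_1,\dots,p_n,
% utility u and probability weighting function w with w(0)=0 and w(1)=1:
\mathrm{RDU} \;=\; \sum_{i=1}^{n} u(x_i)\!\left[\, w\!\Big(\textstyle\sum_{j \geq i} p_j\Big) \;-\; w\!\Big(\textstyle\sum_{j > i} p_j\Big) \right].
```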
The paper deals with parameter estimation for categorical data under epistemic data imprecision, where for a part of the data only coarse(ned) versions of the true values are observable. For different observation models formalizing the information available on the coarsening process, we derive the (typically set-valued) maximum likelihood estimators of the underlying distributions. We discuss the homogeneous case of independent and identically distributed variables as well as logistic regression under a categorical covariate. We start with the imprecise point estimator under an observation model describing the coarsening process without any further assumptions. Then we determine several sensitivity parameters that allow the refinement of the estimators in the presence of auxiliary information.
In the context of the analysis of interval-valued or set-valued data it is often emphasized that one has to carefully distinguish between an epistemic and an ontic understanding of set-valued data. However, there are cases for which an ontic and an epistemic view still lead to exactly the same results of the corresponding data analysis. The present paper is a short note on this fact in the context of the analysis of stochastic dominance for interval-valued data.
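A minimal sketch of the kind of comparison meant here (illustrative only; it checks first-order stochastic dominance simultaneously for every precise dataset compatible with the intervals, and all names are made up for this note):

```python
import numpy as np

def fsd_for_all_selections(intervals_x, intervals_y):
    """Check whether sample X first-order stochastically dominates sample Y
    for every selection of precise values from the given intervals.
    intervals_* : arrays of shape (n, 2) with columns (lower, upper)."""
    lx, ux = np.asarray(intervals_x, dtype=float).T
    ly, uy = np.asarray(intervals_y, dtype=float).T
    grid = np.unique(np.concatenate([lx, ux, ly, uy]))
    # Largest possible empirical cdf of X uses its lower endpoints,
    # smallest possible empirical cdf of Y uses its upper endpoints.
    cdf_x_upper = np.array([(lx <= t).mean() for t in grid])
    cdf_y_lower = np.array([(uy <= t).mean() for t in grid])
    return bool(np.all(cdf_x_upper <= cdf_y_lower))

print(fsd_for_all_selections([[3, 4], [5, 7]], [[0, 1], [1, 2]]))  # True
```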
In most surveys, one is confronted with missing or, more generally, coarse data. Many methods dealing with these data make strong, untestable assumptions, e.g. coarsening at random. But due to the potentially resulting severe bias, interest increases in approaches that only include tenable knowledge about the coarsening process, leading to imprecise but credible results. We elaborate such cautious methods for regression analysis with a coarse categorical dependent variable and precisely observed categorical covariates. Our cautious results from the German panel study "Labour market and social security" illustrate that traditional methods may even suggest specific signs of the regression estimates that the data themselves do not warrant.
The paper is concerned with decision making under complex uncertainty. We consider the Hodges-Lehmann criterion relying on uncertain classical probabilities and Walley's maximality relying on imprecise probabilities. We present linear programming based approaches for computing optimal acts as well as for determining least favorable prior distributions in finite decision settings. Further, we apply results from the duality theory of linear programming in order to provide theoretical insights into certain characteristics of these optimal solutions. In particular, we characterize conditions under which randomization pays off when defining optimality in terms of the Gamma-Maximin criterion and investigate how these conditions relate to least favorable priors.
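The flavor of such LP formulations can be conveyed by a small sketch, under the simplifying assumption that the credal set is given by finitely many extreme priors; this is illustrative and not the paper's exact formulation. A randomized act maximizing the lower expected utility is obtained by maximizing a scalar t that is dominated by the expected utility under every extreme prior.

```python
import numpy as np
from scipy.optimize import linprog

def gamma_maximin_randomized(U, priors):
    """U: (n_acts, n_states) utility matrix; priors: (K, n_states) extreme
    points of the credal set.  Returns (mixture over acts, optimal lower value)."""
    n_acts = U.shape[0]
    K = priors.shape[0]
    # Variables (x_1, ..., x_n_acts, t); maximize t  <=>  minimize -t.
    c = np.zeros(n_acts + 1); c[-1] = -1.0
    # For every extreme prior p_k:  t - sum_a x_a * E_{p_k}[u_a] <= 0.
    A_ub = np.hstack([-(priors @ U.T), np.ones((K, 1))])
    b_ub = np.zeros(K)
    # Mixture constraint: sum_a x_a = 1 (t has coefficient 0).
    A_eq = np.hstack([np.ones((1, n_acts)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_acts + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_acts], res.x[-1]

# Here randomization pays off: each pure act has lower expected utility 0.3,
# while the 0.5/0.5 mixture achieves 0.5.
U = np.array([[1.0, 0.0], [0.0, 1.0]])
priors = np.array([[0.3, 0.7], [0.7, 0.3]])   # credal set = their convex hull
print(gamma_maximin_randomized(U, priors))
```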
The statistical analysis of "real-world" data is often confronted with the fact that most standard statistical methods were developed under some kind of idealization of the data that is often not adequate in practical situations. This concerns, among others, i) the potentially deficient quality of the data, which can arise for example due to measurement error, non-response in surveys or data processing errors, and ii) the scale quality of the data, which is idealized as "the data have some clear scale of measurement that can be uniquely located within the scale hierarchy of Stevens (or that of Narens and Luce or Orth)". Modern statistical methods like, e.g., correction techniques for measurement error or robust methods cope with issue i). In the context of missing or coarsened data, imputation techniques and methods that explicitly model the missing/coarsening process are nowadays well-established tools of refined data analysis. Concerning ii), the typical statistical vi...
Since coarse(ned) data naturally induce set-valued estimators, analysts often assume coarsening at random (CAR) to force them to be single-valued. Using the PASS data as an example, we re-illustrate the impossibility of testing CAR and contrast it with another type of uninformative coarsening called subgroup independence (SI). It turns out that SI is testable here.
In this paper we develop a descriptive concept of a (partially) ordinal joint scaling of items and persons in the context of (dichotomous) item response analysis. The developed method has to be understood as a purely descriptive method describing relations among the data observed in a given item response data set; it is not intended to directly measure some presumed underlying latent traits. We establish a hierarchy of pairs of item difficulty and person ability orderings that empirically support each other. The ordering principles we use for the construction are essentially related to the concept of first-order stochastic dominance. Our method is able to avoid a paradoxical result of multidimensional item response theory models described in Hooker et al. (2009). We introduce our concepts in the language of formal concept analysis. This is due to the fact that our method has some similarities with formal concept analysis and knowledge space theory: Both our methods as well a...
Simulation studies are becoming increasingly important for the evaluation of complex statistical methods. They tend to represent idealized situations. With our framework, which incorporates dependency structures using copulas, we propose multidimensional simulation data with marginals based on different degrees of heterogeneity, built on different ranges of distribution parameters of a zero-inflated negative binomial distribution. The resulting higher and lower variation of the simulation data allows us to create lower and upper distribution functions, leading to simulation data that contain extreme points for each observation. Our approach aims at being closer to reality by considering data distortion. It provides a way of examining classification quality in the case of measurement distortions in gene expression data and may suggest specific guidance for calibrating measuring instruments.
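A compact sketch of the general construction (a Gaussian copula coupling zero-inflated negative binomial marginals; all parameter values and function names below are placeholders, not those used in the study):

```python
import numpy as np
from scipy.stats import norm, nbinom

def zinb_ppf(u, pi_zero, n_size, p_success):
    """Quantile function of a zero-inflated negative binomial distribution."""
    u = np.asarray(u)
    adj = np.clip((u - pi_zero) / (1.0 - pi_zero), 0.0, 1.0 - 1e-12)
    return np.where(u <= pi_zero, 0, nbinom.ppf(adj, n_size, p_success))

def simulate_zinb_copula(n_obs, corr, marginal_params, seed=0):
    """Draw correlated ZINB variables via a Gaussian copula.
    corr: (d, d) correlation matrix; marginal_params: list of (pi_zero, n, p)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(marginal_params)), corr, size=n_obs)
    u = norm.cdf(z)                       # transform to the copula (uniform) scale
    cols = [zinb_ppf(u[:, j], *params) for j, params in enumerate(marginal_params)]
    return np.column_stack(cols)

corr = np.array([[1.0, 0.6], [0.6, 1.0]])
data = simulate_zinb_copula(1000, corr, [(0.3, 5, 0.4), (0.1, 2, 0.2)])
print(data[:5])
```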
This paper is concerned with decision making using imprecise probabilities. In the first part, we introduce a new decision criterion that allows for explicitly modeling how far decisions that are optimal in terms of Walley's maximality are accepted to deviate from being optimal in the sense of Levi's E-admissibility. For this criterion, we also provide an efficient and simple algorithm based on linear programming theory. In the second part of the paper, we propose two new measures for quantifying the extent of E-admissibility of an E-admissible act, i.e. the size of the set of measures for which the corresponding act maximizes expected utility. The first measure is the maximal diameter of this set, while the second one relates to the maximal barycentric cube that can be inscribed into it. Also here, for both measures, we give linear programming algorithms capable of dealing with them. Finally, we discuss some ideas in the context of ordinal decision theory. The paper concludes with a s...
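To fix ideas, checking E-admissibility of a single act is itself a linear feasibility problem when the credal set is described by finitely many linear constraints. The sketch below (illustrative, with made-up names; it is not the paper's algorithm for the two size measures) asks whether some probability vector in the credal set makes the act a maximizer of expected utility.

```python
import numpy as np
from scipy.optimize import linprog

def is_e_admissible(a, U, A_credal, b_credal):
    """Is act a E-admissible?  U: (n_acts, n_states) utilities;
    the credal set is {p : A_credal @ p <= b_credal, sum(p) = 1, p >= 0}."""
    n_acts, n_states = U.shape
    # Optimality of act a under p:  (U[b] - U[a]) @ p <= 0 for every act b.
    A_opt = U - U[a]
    A_ub = np.vstack([A_credal, A_opt])
    b_ub = np.concatenate([b_credal, np.zeros(n_acts)])
    A_eq = np.ones((1, n_states)); b_eq = np.array([1.0])
    res = linprog(np.zeros(n_states), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n_states)
    return res.status == 0   # feasible <=> some p in the credal set makes a optimal

# Credal set: 0.2 <= p_1 <= 0.6, encoded as A p <= b for p = (p_1, p_2)
A_c = np.array([[1.0, 0.0], [-1.0, 0.0]]); b_c = np.array([0.6, -0.2])
U = np.array([[1.0, 0.0], [0.0, 1.0], [0.45, 0.45]])
print([is_e_admissible(a, U, A_c, b_c) for a in range(3)])  # [True, True, False]
```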
This paper develops a combinatorial description of the extreme points of the core of a necessity measure on a finite space. We use the ingredients of Dempster-Shafer theory to characterize a necessity measure and the extreme points of its core in terms of the Möbius inverse, as well as an interpretation of the elements of the core as obtained through a transfer of probability mass from non-elementary events to singletons. With this understanding we derive an exact formula for the number of extreme points of the core of a necessity measure and obtain a constructive combinatorial insight into how the extreme points are obtained in terms of mass transfers. Our result sharpens the bounds for the number of extreme points given in [15] or [14, 13]. Furthermore, we determine the number of edges of the core of a necessity measure and additionally show how our results could be used to enumerate the extreme points of the core of arbitrary belief functions in a not too inefficient way.
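For context, the classical permutation-based construction of the extreme points of the core of a 2-monotone capacity (of which necessity measures and belief functions are special cases) can be sketched as follows; the paper's sharper, mass-transfer-based count for necessity measures is not reproduced here, and enumerating all permutations is exactly the kind of inefficiency the paper improves upon. The toy necessity measure below is a made-up example.

```python
from itertools import permutations

def core_extreme_points(omega, capacity):
    """Extreme points of the core of a 2-monotone capacity on a finite space.
    capacity: maps (frozen)sets of elements of omega to [0, 1]."""
    points = set()
    for order in permutations(omega):
        p, cum = {}, frozenset()
        for w in order:
            # Mass added when w enters the chain; equivalently, each focal set's
            # Moebius mass is transferred to its last element under this ordering.
            p[w] = capacity(cum | {w}) - capacity(cum)
            cum = cum | {w}
        points.add(tuple(round(p[w], 10) for w in omega))
    return sorted(points)

# Toy necessity measure on {a, b, c} induced by possibility degrees a:1, b:0.5, c:0.2
poss = {"a": 1.0, "b": 0.5, "c": 0.2}
nec = lambda A: 1.0 - max((poss[w] for w in set(poss) - set(A)), default=0.0)
print(core_extreme_points(("a", "b", "c"), nec))
```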
The aim of the present paper is to apply a recently developed quantile approach for lattice-valued data to the special case of ranking data. We show how to analyze profiles of total orders by means of lattice-valued quantiles and thereby develop new methods of descriptive data analysis for ranking data beyond known methods like permutation polytopes or multidimensional scaling. We furthermore develop an aggregation rule for social profiles (called commonality sharing here) that selects from a given profile the ordering(s) that is (are) most centered in the profile, where the notions of centrality and outlyingness are based on purely order-theoretic concepts. Finally, we sketch how one can use the quantile approach to establish tests of model fit for statistical models of ranking data.
In this paper we propose efficient methods for the elicitation of complexly structured preferences and utilize these in problems of decision making under (severe) uncertainty. Based on the general framework introduced in Jansen, Schollmeyer & Augustin (2018, Int. J. Approx. Reason.), we now design elicitation procedures and algorithms that enable decision makers to reveal their underlying preference system (i.e. two relations, one encoding the ordinal, the other the cardinal part of the preferences) while having to answer as few simple ranking questions as possible. Here, two different approaches are followed. The first approach directly utilizes the collected ranking data for obtaining the ordinal part of the preferences, while their cardinal part is constructed implicitly by measuring metadata on the decision maker's consideration times. In contrast, the second approach also explicitly elicits the cardinal part of the decision maker's preference system, however, only an approximate ve...