The purpose of this study is to investigate the health impact of air conditioning use during the 2018 heatwave in South Korea, the warmest in the history of Korean weather observation. Methods: Participants were 1,000 adults aged over 18, recruited from across Korea. Participants were asked about symptoms of various diseases, air-conditioningitis, and food poisoning. They were also asked whether they had been absent, been late, or canceled appointments during the heatwave. Regarding air conditioning use, we asked three questions: whether air conditioning was used in the main living space outside the home, whether there was air conditioning at home, and whether participants had been unable to use air conditioning when it was needed because of electricity charges. The association between air conditioning use and health impacts during the heatwave was analyzed with multiple logistic regression models. Potential confounding factors included age, sex, residential area, occupation, type of house, number of family members, prevalence of chronic diseases, and monthly household income. Results: When air conditioning was not operated in the main living space outside the home, the odds ratios (ORs) for experiencing symptoms related to cardiovascular and neurological diseases were 8.53 and 2.02, respectively. In the absence of air conditioning at home, the ORs for experiencing symptoms related to neurological diseases and for canceling appointments were 1.64 and 3.43, respectively. When electricity charges limited air conditioning use where it was necessary, the ORs were 1... for each of the following: symptoms of all diseases, skin diseases, digestive diseases, kidney and urinary diseases, nervous system diseases, mental diseases, sleep disorders, and food poisoning, as well as lateness and canceling appointments.
For sensitivity analysis with stochastic counterfactuals, we introduce a methodology to characterize uncertainty in causal inference from natural experiments. Our sensitivity parameters are standardized measures of variation in propensity and prognosis probabilities, and one minus their geometric mean is an intuitive measure of randomness in the data-generating process. Within our latent propensity-prognosis model, we show how to compute, from contingency table data, a threshold, T, of sufficient randomness for causal inference. If the actual randomness of the data-generating process is greater than this threshold, then causal inference is warranted. We demonstrate our methodology with two example applications.
This expository article states and proves four concrete, projective central limit theorems. Experts familiar with the more general central limit theorem for convex bodies and related theory know or suspect these results to be true. Here we consider only four types of high-dimensional geometric objects: spheres, balls, cubes, and boundaries of cubes. Each can transform uniform random variables into normal random variables through projection. This paper was written to introduce new proof techniques, to demonstrate how statistical simulation can be applied to geometry, and to lay a foundation for recreational research projects. The goal is to give the reader a better understanding of some of the mathematics at the juncture of probability theory, analysis, and geometry in high dimensions.
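The projection phenomenon can be seen quickly by simulation, in the spirit of the statistical simulation mentioned above. The sketch below is not taken from the paper. It samples uniformly from a sphere of radius √n and from a cube scaled so that each coordinate has variance 1. It then projects each sample onto one direction and checks that about 95% of the projections fall within ±1.96, as they would for a standard normal. The scalings, the choice n = 50, and the function names are illustrative assumptions. Balls and cube boundaries are omitted.

```python
import math
import random


def sphere_projection(n, rng):
    """First coordinate of a uniform point on the sphere of radius sqrt(n) in R^n."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    # Normalizing a standard Gaussian vector gives a uniformly distributed direction.
    return math.sqrt(n) * g[0] / norm


def cube_projection(n, rng):
    """Projection of a uniform point in [-sqrt(3), sqrt(3)]^n onto the unit diagonal."""
    # Each coordinate has variance 1, so the projection onto (1, ..., 1)/sqrt(n)
    # also has variance 1.
    s = math.sqrt(3.0)
    return sum(rng.uniform(-s, s) for _ in range(n)) / math.sqrt(n)


def coverage(sampler, n, trials=5000, seed=0):
    """Fraction of projections in [-1.96, 1.96]; close to 0.95 if nearly normal."""
    rng = random.Random(seed)
    hits = sum(abs(sampler(n, rng)) <= 1.96 for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for name, sampler in [("sphere", sphere_projection), ("cube", cube_projection)]:
        print(name, coverage(sampler, n=50))
```

Increasing `n` or `trials` should bring both fractions closer to 0.95.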
When studying the causal effect of x on y, researchers may conduct a regression and report a confidence interval for the slope coefficient βx. This common confidence interval assesses uncertainty from sampling error, but it does not assess uncertainty from confounding. An intervention on x may produce an unexpected response in y, and the slope is misinterpreted when confounding factors w are present. When w is measured, we may conduct multiple regression. When w is unmeasured, it is common practice to include a precautionary statement with the confidence interval, warning against unwarranted causal interpretation. If the goal is robust causal interpretation, we can do something more informative: uncertainty in the specification of three confounding parameters can be propagated through an equation to produce a confounding interval. Here we develop supporting mathematical theory and describe an example application. Our proposed methodology applies well to studies of a continuous response or a rare outcome. It is a general method for quantifying error from model uncertainty. Whereas confidence intervals are used to assess uncertainty from unmeasured individuals, confounding intervals can be used to assess uncertainty from unmeasured attributes.
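The sketch below illustrates the general idea of propagating uncertainty about confounding parameters through an equation to obtain an interval. It is not the paper's three-parameter equation. As a simplifying assumption, it uses a single standardized confounder w and the textbook formula for the standardized partial slope, (r_xy − r_xw·r_yw)/(1 − r_xw²). It also uses hypothetical analyst-supplied bounds on the two correlations r_xw and r_yw. It reports the range of adjusted slopes over a grid of values that form a valid correlation matrix. All function names are hypothetical.

```python
import itertools


def adjusted_slope(r_xy, r_xw, r_yw):
    """Standardized slope of x after adjusting for one standardized confounder w."""
    return (r_xy - r_xw * r_yw) / (1.0 - r_xw ** 2)


def is_valid_correlation(r_xy, r_xw, r_yw):
    """True if the 3x3 correlation matrix of (x, y, w) is positive semidefinite.

    |r_xw| < 1 is required strictly so that the adjusted slope is defined.
    """
    det = 1.0 - r_xy ** 2 - r_xw ** 2 - r_yw ** 2 + 2.0 * r_xy * r_xw * r_yw
    return abs(r_xy) <= 1.0 and abs(r_yw) <= 1.0 and abs(r_xw) < 1.0 and det >= 0.0


def grid(lo, hi, k):
    """k evenly spaced values from lo to hi, inclusive."""
    return [lo + (hi - lo) * i / (k - 1) for i in range(k)]


def confounding_interval(r_xy, xw_bounds, yw_bounds, k=61):
    """Range of adjusted slopes over a grid of plausible confounder correlations.

    xw_bounds and yw_bounds are hypothetical (low, high) bounds, for example
    from subject-matter knowledge. Raises ValueError if no grid point is valid.
    """
    slopes = [
        adjusted_slope(r_xy, a, b)
        for a, b in itertools.product(grid(*xw_bounds, k), grid(*yw_bounds, k))
        if is_valid_correlation(r_xy, a, b)
    ]
    return min(slopes), max(slopes)


if __name__ == "__main__":
    # Observed r_xy = 0.5; w is assumed to correlate with x and y by at most 0.3.
    print(confounding_interval(0.5, (0.0, 0.3), (0.0, 0.3)))
```

For these hypothetical bounds, the interval is about (0.451, 0.549), that is, 0.41/0.91 to 0.5/0.91. A grid search is used for transparency. A closed-form optimum would be preferable for more parameters.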
When a linear model is adjusted to control for additional explanatory variables, the sign of a fitted coefficient may reverse. Here these reversals are studied using coefficients of determination. The resulting theory can be used to determine the directions of unique effects in the presence of substantial model uncertainty. This process is called model-independent estimation when the estimates are invariant across changes to the model structure. When a single covariate is added, the reversal region can be understood geometrically as an elliptical cone of two nappes. Its axis of symmetry relates to a best-possible condition for a reversal using a single coefficient of determination. When a set of covariates is added to a model with a single explanatory variable, model-independent estimation can be implemented using subject-matter knowledge. More general theory with partial coefficients is applicable to the analysis of large data sets. Applications are demonstrated with dietary health data...
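A sign reversal is easy to produce with simulated data. The sketch below uses a hypothetical data-generating process, not data or code from the paper. It fits y on x alone and then y on x and a covariate w, using closed-form least squares on centered data. The coefficient-of-determination theory described above is not reproduced here.

```python
import random


def _centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def simple_slope(x, y):
    """OLS slope of y on x alone (with intercept)."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def adjusted_slope(x, w, y):
    """OLS coefficient of x when y is regressed on both x and w (with intercept)."""
    xc, wc, yc = _centered(x), _centered(w), _centered(y)
    sxx, sww, sxw = _dot(xc, xc), _dot(wc, wc), _dot(xc, wc)
    sxy, swy = _dot(xc, yc), _dot(wc, yc)
    # Solve the 2x2 normal equations for the x coefficient (Cramer's rule).
    return (sxy * sww - swy * sxw) / (sxx * sww - sxw ** 2)


def demo(n=2000, seed=1):
    """Hypothetical data in which w confounds the relationship between x and y."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [wi + 0.5 * rng.gauss(0.0, 1.0) for wi in w]
    # The unique effect of x on y is -1, but w raises both x and y.
    y = [-xi + 2.0 * wi + 0.5 * rng.gauss(0.0, 1.0) for xi, wi in zip(x, w)]
    return simple_slope(x, y), adjusted_slope(x, w, y)


if __name__ == "__main__":
    unadjusted, adjusted = demo()
    print(f"slope of x before adjustment: {unadjusted:+.3f}")
    print(f"slope of x after adjusting for w: {adjusted:+.3f}")
```

In expectation, the unadjusted slope is 0.75/1.25 = 0.6 and the adjusted slope is −1, so the sign flips once w enters the model.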
The purpose of this dissertation was to create a textbook that supplements the traditional statistics curriculum. Emphasis was placed on statistical computing. Multiple computing environments were used to demonstrate essential skills for the applied statistician. Mathematical theory was developed to make the text self-contained. A technique was developed to replace a faulty command for computing moment generating functions within Mathematica.
This work is copyrighted by Università del Salento, and is licensed under a Creative Commons Attribuzione-Non commerciale-Non opere derivate 3.0 Italia License. For more information see: http://creativecommons.org/licenses/by-nc-nd/3.0/it/
An ordinary least squares regression estimate for the slope, regardless of its strength, can have its sign reversed through adjustment for a random confounding vector of data. Assuming a rotationally invariant distribution on the space of centered random confounding vectors of data makes it possible to calculate probabilities for these reversals. Here, as the sample size increases, these probabilities are shown to decrease exponentially. This analytic result leads to an asymptotic comparison between regular sampling error and the error due to a misspecified model.
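The reversal probability can be estimated by Monte Carlo. The sketch below makes several assumptions that do not come from the paper. It draws the random confounding vector w as a centered standard Gaussian vector, which has a rotationally invariant direction on the space of centered vectors. It fixes x and y to have sample correlation exactly r = 0.3. It detects a reversal through the sign of r_xy − r_xw·r_yw, which is the sign of the adjusted standardized slope for a single covariate. The function names are hypothetical. The simulation illustrates the decay but does not prove it is exponential.

```python
import math
import random


def unit_centered(v):
    """Center a vector and scale it to unit length."""
    m = sum(v) / len(v)
    c = [a - m for a in v]
    s = math.sqrt(sum(a * a for a in c))
    return [a / s for a in c]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fixed_xy(n, r):
    """Centered unit vectors x, y in R^n (n >= 3) with sample correlation exactly r."""
    x = unit_centered([float(i) for i in range(n)])
    z = unit_centered([float(i * i) for i in range(n)])
    # Gram-Schmidt: make z orthogonal to x; z stays centered.
    p = dot(z, x)
    z = unit_centered([zi - p * xi for zi, xi in zip(z, x)])
    y = [r * xi + math.sqrt(1.0 - r * r) * zi for xi, zi in zip(x, z)]
    return x, y


def reversal_probability(n, r=0.3, trials=10000, seed=0):
    """Monte Carlo estimate of P(sign reversal) for a random confounding vector w."""
    rng = random.Random(seed)
    x, y = fixed_xy(n, r)
    reversals = 0
    for _ in range(trials):
        # Centering a standard Gaussian vector gives a uniformly distributed
        # (rotationally invariant) direction among centered vectors.
        w = unit_centered([rng.gauss(0.0, 1.0) for _ in range(n)])
        r_xw, r_yw = dot(x, w), dot(y, w)
        # The adjusted standardized slope has the sign of r - r_xw * r_yw.
        if r * (r - r_xw * r_yw) < 0:
            reversals += 1
    return reversals / trials


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, reversal_probability(n))
```

With r = 0.3, the estimate is a few percent at n = 10 and essentially zero by n = 40. Much larger `trials` would be needed to see the tail.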
Spurious association arises from covariance between propensity for the treatment and individual risk for the outcome. For sensitivity analysis with stochastic counterfactuals, we introduce a methodology to characterize uncertainty in causal inference from natural experiments and quasi-experiments. Our sensitivity parameters are standardized measures of variation in propensity and individual risk, and one minus their geometric mean is an intuitive measure of randomness in the data-generating process. Within our latent propensity-risk model, we show how to compute, from contingency table data, a threshold, T, of sufficient randomness for causal inference. If the actual randomness of the data-generating process exceeds this threshold, then causal inference is warranted.
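The abstract defines randomness as one minus the geometric mean of the two standardized sensitivity parameters. It warrants causal inference when that randomness exceeds the threshold T. The minimal sketch below encodes only this decision rule. It assumes both parameters lie in [0, 1] and takes T as an input, because the paper's procedure for computing T from contingency table data is not described here. The function names are hypothetical.

```python
import math


def randomness(s_propensity, s_risk):
    """One minus the geometric mean of the two sensitivity parameters."""
    # Assumption: standardized parameters lie in [0, 1].
    if not (0.0 <= s_propensity <= 1.0 and 0.0 <= s_risk <= 1.0):
        raise ValueError("sensitivity parameters are assumed to lie in [0, 1]")
    return 1.0 - math.sqrt(s_propensity * s_risk)


def causal_inference_warranted(s_propensity, s_risk, threshold):
    """Decision rule: randomness must exceed the threshold T (supplied by the caller)."""
    return randomness(s_propensity, s_risk) > threshold


if __name__ == "__main__":
    # Hypothetical values: geometric mean sqrt(0.25 * 0.16) = 0.2, randomness 0.8.
    print(randomness(0.25, 0.16), causal_inference_warranted(0.25, 0.16, threshold=0.5))
```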
Results of multiple regression analysis are often reported as if model uncertainty were not an issue. If, however, omitted-variable bias is a valid concern, then the results of this paper may apply. The main result is a simple theorem, roughly asserting that weak correlates cannot reverse existing estimates. Its contrapositive, in a special case, produces necessary conditions for the Yule–Simpson effect. Other applications are discussed, and a few counterexamples are presented to demonstrate how confounding can occur when least expected.
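The paper's theorem is not reproduced here. The sketch below shows a result of the same flavor under the simplifying assumption of a single standardized covariate w. The adjusted slope of x has the sign of r_xy − r_xw·r_yw, so a reversal requires |r_xw|·|r_yw| > |r_xy|. Therefore, if both |r_xw| and |r_yw| are at most √|r_xy|, no reversal can occur. The sketch checks this bound on random valid correlation matrices. The definition of "weak" and all function names are our own assumptions.

```python
import math
import random


def is_valid(r_xy, r_xw, r_yw):
    """True if the 3x3 correlation matrix is positive semidefinite, with |r| < 1."""
    det = 1 - r_xy ** 2 - r_xw ** 2 - r_yw ** 2 + 2 * r_xy * r_xw * r_yw
    return max(abs(r_xy), abs(r_xw), abs(r_yw)) < 1 and det >= 0


def reverses(r_xy, r_xw, r_yw):
    """True if adjusting for w flips the sign of the standardized slope of x."""
    adjusted = (r_xy - r_xw * r_yw) / (1 - r_xw ** 2)
    return r_xy * adjusted < 0


def is_weak(r_xy, r_xw, r_yw):
    """Hypothetical notion of a weak correlate: both correlations <= sqrt(|r_xy|)."""
    bound = math.sqrt(abs(r_xy))
    return abs(r_xw) <= bound and abs(r_yw) <= bound


def count_weak_reversals(trials, seed=0):
    """Count random valid correlation triples that are weak yet produce a reversal."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        r = [rng.uniform(-1, 1) for _ in range(3)]
        if is_valid(*r) and is_weak(*r) and reverses(*r):
            count += 1
    return count


if __name__ == "__main__":
    print(count_weak_reversals(20000))  # 0, as the bound guarantees
    print(reverses(0.1, 0.6, 0.6))      # a strong correlate can reverse: True
```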
An interesting problem in recreational mathematics is the three cups problem, which has no solution. There are infinitely many related n-choose-r cup problems. Each of these is shown to be decidable through the construction of an explicit algorithm, which computes solutions for the solvable problems. The algorithm is run for $0 \le r \le n \le 10$, and the results are discussed.
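The abstract does not spell out the rules, so the sketch below assumes one common reading. There are n cups, k of them initially upside down. Each move turns over exactly r different cups, and the goal is to have every cup upright, with no limit on the number of moves. Under this assumption, only the number of upside-down cups matters. A breadth-first search over the n + 1 possible counts then decides solvability and returns the minimum number of moves. This is a hypothetical model, not necessarily the paper's problem statement or algorithm.

```python
from collections import deque


def min_moves(n, r, k):
    """Fewest moves to turn all n cups upright, flipping exactly r cups per move.

    k is the number of cups that start upside down. Returns None if the goal
    is unreachable. The search runs over the counts 0, 1, ..., n.
    """
    if not (0 <= r <= n and 0 <= k <= n):
        raise ValueError("need 0 <= r <= n and 0 <= k <= n")
    dist = {k: 0}
    queue = deque([k])
    while queue:
        d = queue.popleft()
        if d == 0:
            return dist[d]
        # Flip j upside-down cups and r - j upright cups; this requires
        # j <= d and r - j <= n - d.
        for j in range(max(0, r - (n - d)), min(d, r) + 1):
            nxt = d + r - 2 * j
            if nxt not in dist:
                dist[nxt] = dist[d] + 1
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # Assumed classic setting: three cups, one upside down, flip two per move.
    # Each move changes the upside-down count by an even number, so it stays odd.
    print(min_moves(3, 2, 1))  # None
```

Looping `min_moves` over 0 ≤ r ≤ n ≤ 10 and all k gives a table of solvable cases under this reading.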
An ordinary least squares regression estimate for the slope, regardless of its strength, can have its sign reversed through adjustment for a random confounding vector of data. Assuming a rotationally invariant distribution on the space of centered random confounding vectors of data makes it possible to calculate probabilities for these reversals. Here these probabilities are shown to decrease exponentially as the sample size increases. This analytic result leads to an asymptotic comparison between regular sampling error and the error due to a misspecified model.
An adjusted estimate may have the opposite sign to the original estimate. This paper presents necessary and sufficient conditions for such a reversal in the context of linear modeling, where adjustment is obtained by considering additional explanatory data associated with lurking variables.
There are infinitely many n for which the section $\{x \in \mathbb{R}^n : \sum x_i = \sqrt{n/12}\}$ intersects a subset of the vertices of a centered unit n-cube. Whenever this happens, the volume of the resulting $(n-1)$-dimensional polytope can be computed using combinatorics. In high dimensions these volumes converge to $\sqrt{6/(e\pi)}$, and a related sequence of rational numbers converges to $e\pi$. Here we explicitly establish this concrete combinatorial sequence of rational approximations for $e\pi$ with a proof based on the local central limit theorem. Similar sequences could be constructed with related techniques based on the central limit theorem for convex sets.
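The combinatorics can be checked concretely under one assumption: take n = 12m², so that the section level √(n/12) = m is an integer. The (n − 1)-volume of the slice equals √n times the density of Σxᵢ at that level, because the normal vector (1, ..., 1) has length √n. That density is an Irwin-Hall density, which has rational values at integer points. The sketch below computes it exactly with `fractions.Fraction`. The rational sequence 1/(2m²f²) is our own choice, derived so that it tends to eπ whenever the volumes tend to √(6/(eπ)). The paper's sequence may be defined differently, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def irwin_hall_numerator(n, x):
    """Integer N such that the Irwin-Hall density at integer x equals N / (n-1)!.

    The Irwin-Hall density is that of a sum of n independent Uniform(0, 1)
    variables; this form is valid for integers 0 <= x <= n when n >= 2.
    """
    return sum((-1) ** k * math.comb(n, k) * (x - k) ** (n - 1) for k in range(x + 1))


def section_volume_and_ratio(m):
    """Return (volume, ratio) for n = 12 m^2.

    volume: (n-1)-volume of {x : sum x_i = m} inside [-1/2, 1/2]^n, computed as
            sqrt(n) times the density f of sum x_i at m (note m = sqrt(n/12)).
    ratio:  the rational number 1 / (2 m^2 f^2).
    """
    n = 12 * m * m
    # Shifting [-1/2, 1/2]^n to [0, 1]^n moves the sum by n/2.
    f = Fraction(irwin_hall_numerator(n, n // 2 + m), math.factorial(n - 1))
    volume = math.sqrt(n) * float(f)
    # volume^2 = 12 m^2 f^2, so if volume -> sqrt(6/(e*pi)) then ratio -> e*pi.
    ratio = 1 / (2 * m * m * f * f)
    return volume, ratio


if __name__ == "__main__":
    print("limits:", math.sqrt(6 / (math.e * math.pi)), math.e * math.pi)
    for m in range(1, 6):
        volume, ratio = section_volume_and_ratio(m)
        print(12 * m * m, round(volume, 6), float(ratio))
```

Exact integer arithmetic avoids the catastrophic cancellation that the alternating Irwin-Hall sum would cause in floating point.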
Papers by Brian Knaeble