A Gaussian Process is a non-parametric prior that can be understood intuitively as a distribution over function space. It is known that, from a Bayesian perspective, a Gaussian Process can be obtained by placing an appropriate prior on the weights of a neural network and taking the infinite-width limit of the resulting Bayesian neural network. In this paper, we explore infinitely wide Tensor Networks and show the equivalence of infinitely wide Tensor Networks and Gaussian Processes. We study the pure Tensor Network and two extended Tensor Network structures, the Neural Kernel Tensor Network and the Tensor Network hidden layer Neural Network, and prove that each one converges to a Gaussian Process as the width of the model goes to infinity. (We note here that a Gaussian Process can also be obtained by taking the infinite limit of at least one of the bond dimensions αi in the product of the chain of tensor nodes, and the proofs can be done with the same ideas in the proofs of th...
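As a rough illustration of the kind of infinite-width limit the abstract refers to, the sketch below samples many random one-hidden-layer networks with i.i.d. Gaussian priors on the weights and checks that the prior over the output stabilizes as the width grows. This is the standard neural-network analogue only; the architecture, priors, and activation are illustrative assumptions, not the paper's tensor-network construction.

```python
# Empirical sketch of the infinite-width Gaussian Process limit for a
# one-hidden-layer network with i.i.d. Gaussian priors on the weights.
# Illustrates the general idea referenced in the abstract; NOT the paper's
# tensor-network construction.
import numpy as np

def wide_net_outputs(x, width, n_samples, rng):
    """Sample f(x) for many random networks f(x) = sum_j v_j * tanh(w_j*x + b_j) / sqrt(width)."""
    w = rng.normal(size=(n_samples, width))
    b = rng.normal(size=(n_samples, width))
    v = rng.normal(size=(n_samples, width))
    hidden = np.tanh(w * x + b)                    # shape (n_samples, width)
    return (hidden * v).sum(axis=1) / np.sqrt(width)

rng = np.random.default_rng(0)
for width in (1, 10, 1000):
    f = wide_net_outputs(x=0.7, width=width, n_samples=20000, rng=rng)
    # As the width grows, the prior over f(x) approaches a zero-mean Gaussian;
    # the empirical mean and standard deviation stabilize accordingly.
    print(width, round(f.mean(), 3), round(f.std(), 3))
```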
Journal of Business & Economic Statistics, 2016
Randomized controlled trials play an important role in how Internet companies predict the impact of policy decisions and product changes. In these 'digital experiments', different units (people, devices, products) respond differently to the treatment. This article presents a fast and scalable Bayesian nonparametric analysis of such heterogeneous treatment effects and their measurement in relation to observable covariates. New results and algorithms are provided for quantifying the uncertainty associated with treatment effect measurement via both linear projections and nonlinear regression trees (CART and Random Forests). For linear projections, our inference strategy leads to results that are mostly in agreement with those from the frequentist literature. We find that linear regression adjustment of treatment effect averages (i.e., post-stratification) can provide some variance reduction, but that this reduction will be vanishingly small in the low-signal and large-sample setting of digital experiments. For regression trees, we provide uncertainty quantification for the machine learning algorithms that are commonly applied in tree-fitting. We argue that practitioners should look to ensembles of trees (forests) rather than individual trees in their analysis. The ideas are applied to and illustrated through an example experiment involving 21 million unique users of EBay.com. Taddy is also a research fellow at EBay. The authors thank others at EBay who have contributed, especially Jay Weiler, who assisted in data collection.
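To make the post-stratification point concrete, here is a minimal sketch, on simulated A/B-test data, comparing the plain difference-in-means treatment-effect estimate with a linear-regression-adjusted estimate. The data-generating process, signal sizes, and variable names are assumptions chosen for illustration, not the article's data or estimator.

```python
# Hedged sketch: difference-in-means vs. linear regression adjustment
# (post-stratification) for an average treatment effect on simulated A/B data.
# The simulated low-signal, large-sample setting is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)                    # pre-treatment covariate
t = rng.integers(0, 2, size=n)            # randomized treatment assignment
y = 0.05 * t + 0.2 * x + rng.normal(scale=5.0, size=n)   # weak signal, noisy outcome

def estimates(y, t, x):
    ate_dim = y[t == 1].mean() - y[t == 0].mean()          # difference in means
    # OLS of y on [1, t, x]: the coefficient on t is the adjusted ATE estimate.
    X = np.column_stack([np.ones_like(y), t, x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return ate_dim, beta[1]

print(estimates(y, t, x))
# Repeating this over many simulated experiments would show that the adjusted
# estimator has somewhat smaller variance, but only modestly so when the
# covariate explains little of the outcome variance, as argued in the abstract.
```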
Using R for predictive model checking; Multinomial Model conjugate analysis and its application in R; Comparing classical and Bayesian multiparameter models; Multivariate Normal Models; Complex Contingency Tables; Example: Regression modeling with a change point. On the night of January 27, 1986, the night before the space shuttle Challenger accident, there was a three-hour teleconference among people at Morton Thiokol
Prior research on pair programming has found that compared to students who work alone, students who pair have shown increased confidence in their work, greater success in CS1, and greater retention in computer-related majors. In these earlier studies, pairing and solo students were not given the same programming assignments. This paper reports on a study in which this factor was controlled by giving the same programming assignments to pairing and solo students. We found that pairing students were more likely to turn in working programs, and these programs correctly implemented more required features. Our findings were mixed when we looked at some standard complexity measures of programs. An unexpected but significant finding was that pairing students were more likely to submit solutions to their programming assignments.
Erich Lehmann's contributions to mathematical statistics have been both broad and deep, and his influence on applied statistics has been more widespread than perhaps even he knew. As an example, I used Google Scholar to look at the citation patterns of Chernoff and Lehmann (1954), one of the articles I discuss below. As of this writing (November 2011), this paper has been cited quite steadily from 1954 to the present, in fact a total of 327 times, by workers in archaeology,
Objectives: To develop a risk-adjustment methodology that maximizes the use of automated physiology and diagnosis data from the time period preceding hospitalization. Design: Retrospective cohort study using split-validation and logistic regression. Setting: Seventeen hospitals in a large integrated health care delivery system. Subjects: Patients (n = 259,699) hospitalized between January 2002 and June 2005. Main Outcome Measures: Inpatient and 30-day mortality. Results: Inpatient mortality was 3.50%; 30-day mortality was 4.06%. We tested logistic regression models in a randomly chosen derivation dataset consisting of 50% of the records and applied their coefficients to the validation dataset. The final model included sex, age, admission type, admission diagnosis, a Laboratory-based Acute Physiology Score (LAPS), and a COmorbidity Point Score (COPS). The LAPS integrates information from 14 laboratory tests obtained in the 24 hours preceding hospitalization into a single continuous variable. Using Diagnostic Cost Groups software, we categorized patients as having up to 40 different comorbidities based on outpatient and inpatient data from the 12 months preceding hospitalization. The COPS integrates information regarding these 41 comorbidities into a single continuous variable. Our best model for inpatient mortality had a c statistic of 0.88 in the validation dataset, whereas the c statistic for 30-day mortality was 0.86; both models had excellent calibration. Physiologic data accounted for a substantial proportion of the model's predictive ability. Conclusion: Efforts to support improvement of hospital outcomes can take advantage of risk-adjustment methods based on automated physiology and diagnosis data that are not confounded by information obtained after hospital admission.
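A minimal sketch of the split-validation workflow the abstract describes: fit a logistic regression on a random 50% derivation sample, apply it to the held-out validation sample, and compute the c statistic. The simulated predictors (stand-ins for LAPS- and COPS-like scores) and coefficients are illustrative assumptions, not the study's data or final model.

```python
# Minimal sketch of split-validation for a mortality risk-adjustment model,
# using simulated stand-ins for the predictors. Names and data are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 50_000
age = rng.normal(65, 15, n)
laps = rng.gamma(shape=2.0, scale=20.0, size=n)     # placeholder physiology score
cops = rng.gamma(shape=2.0, scale=30.0, size=n)     # placeholder comorbidity score
logit = -7.0 + 0.03 * age + 0.02 * laps + 0.01 * cops
death = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([age, laps, cops])
X_dev, X_val, y_dev, y_val = train_test_split(X, death, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)           # fit on derivation half
c_statistic = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])  # apply to validation half
print(round(c_statistic, 3))
```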
Insertional mutations leading to expansion of the octarepeat domain of the prion protein (PrP) are directly linked to prion disease. While normal PrP has four PHGGGWGQ octapeptide segments in its flexible N-terminal domain, expanded forms may have up to nine additional octapeptide inserts. The type of prion disease segregates with the degree of expansion. With up to four extra octarepeats, the average onset age is above 60 years, whereas five to nine extra octarepeats results in an average onset age between 30 and 40 years, a difference of almost three decades. In wild-type PrP, the octarepeat domain takes up copper (Cu(2+)) and is considered essential for in vivo function. Work from our lab demonstrates that the copper coordination mode depends on the precise ratio of Cu(2+) to protein. At low Cu(2+) levels, coordination involves histidine side chains from adjacent octarepeats, whereas at high levels each repeat takes up a single copper ion through interactions with the histidine s...
OBJECTIVE: To define a quantitative stratification algorithm for the risk of early-onset sepsis (EOS) in newborns ≥34 weeks’ gestation. METHODS: We conducted a retrospective nested case-control study that used split validation. Data collected on each infant included sepsis risk at birth based on objective maternal factors, demographics, specific clinical milestones, and vital signs during the first 24 hours after birth. Using a combination of recursive partitioning and logistic regression, we developed a risk classification scheme for EOS on the derivation dataset. This scheme was then applied to the validation dataset. RESULTS: Using a base population of 608 014 live births ≥34 weeks’ gestation at 14 hospitals between 1993 and 2007, we identified all 350 EOS cases <72 hours of age and frequency matched them by hospital and year of birth to 1063 controls. Using maternal and neonatal data, we defined a risk stratification scheme that divided the neonatal population into 3 groups: ...
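A hedged sketch of the general two-stage idea mentioned in the abstract, combining recursive partitioning with logistic regression; the simulated variables, tree depth, and the way the tree's strata are fed into the regression are assumptions for illustration, not the study's actual risk classification scheme.

```python
# Hedged sketch: use recursive partitioning (a shallow classification tree) to
# carve the population into strata, then fit a logistic regression that also
# uses those strata. Simulated data and variables are placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 20_000
risk_at_birth = rng.normal(size=n)          # stand-in for maternal-factor risk score
max_temp = rng.normal(37.0, 0.5, size=n)    # stand-in vital sign
logit = -5.0 + 1.2 * risk_at_birth + 2.0 * (max_temp - 37.0)
eos = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([risk_at_birth, max_temp])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, eos)
stratum = tree.apply(X)                      # leaf index = clinical stratum

# Logistic regression using the tree's strata as additional indicator columns.
strata_dummies = (stratum[:, None] == np.unique(stratum)[None, :]).astype(float)
model = LogisticRegression(max_iter=1000).fit(np.column_stack([X, strata_dummies]), eos)
print(np.unique(stratum), model.coef_.round(2))
```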
Objective: To measure the influence of varying mortality time frames on performance rankings among regional NICUs in a large state. Study design: We carried out cross-sectional data analysis of VLBW infants cared for at 24 level 3 NICUs. We tested the effect of four definitions of mortality: (1) deaths between admission and end of birth hospitalization or up to 366 days; (2) deaths between 12 hours of age and end of birth hospitalization or up to 366 days; (3) deaths between admission and 28 days; and (4) deaths between 12 hours of age and 28 days. NICUs were ranked by quantifying their deviation from risk-adjusted expected mortality and splitting them into three tiers: top six, bottom six, and in between. Results: There was wide inter-institutional variation in risk-adjusted mortality for each definition (observed minus expected Z-score range, -6.08 to 3.75). However, mortality-rate-based
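A minimal sketch of the ranking step described above: compute an observed-minus-expected mortality Z-score for each unit and split units into top-six, bottom-six, and middle tiers. The simulated expected counts and the simple Poisson-style standardization are illustrative assumptions, not the paper's risk-adjustment model.

```python
# Minimal sketch: rank units (e.g., NICUs) by an observed-minus-expected
# mortality Z-score and split into three tiers (top six, bottom six, middle).
import numpy as np

rng = np.random.default_rng(4)
n_units = 24
expected = rng.uniform(10, 60, size=n_units)                   # risk-adjusted expected deaths
observed = rng.poisson(expected * rng.uniform(0.7, 1.3, size=n_units))

z = (observed - expected) / np.sqrt(expected)                  # standardized deviation
order = np.argsort(z)                                          # lower z = better than expected
tiers = np.full(n_units, "middle", dtype=object)
tiers[order[:6]] = "top six"
tiers[order[-6:]] = "bottom six"
for unit in order:
    print(f"unit {int(unit):2d}  z = {z[unit]:+.2f}  tier = {tiers[unit]}")
```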
We review three leading stochastic optimization methods: simulated annealing, genetic algorithms, and tabu search. In each case we analyze the method, give the exact algorithm, detail advantages and disadvantages, and summarize the literature on optimal values of the inputs. As a motivating example we describe the solution, using Bayesian decision theory via maximization of expected utility, of a variable selection problem in generalized linear models, which arises in the cost-effective construction of a patient sickness-at-admission scale as part of an effort to measure quality of hospital care.
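A minimal sketch of one of the three reviewed methods, simulated annealing, applied to a toy variable-selection problem; a BIC-style penalized fit stands in for the expected-utility criterion, and the cooling schedule and move proposal are illustrative assumptions rather than the recommended settings from the review.

```python
# Minimal simulated-annealing sketch for variable selection in a linear model.
# A negative-BIC score stands in for the expected-utility criterion described
# in the abstract; cooling schedule and move proposal are illustrative.
import numpy as np

rng = np.random.default_rng(5)
n, p = 500, 12
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=n)          # only variables 0 and 3 matter

def score(mask):
    """Negative BIC of the OLS fit using the selected columns (higher is better)."""
    if not mask.any():
        return -np.inf
    Xs = X[:, mask]
    resid = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
    return -(n * np.log(resid @ resid / n) + mask.sum() * np.log(n))

mask = rng.random(p) < 0.5                              # random starting subset
best, best_score = mask.copy(), score(mask)
temperature = 5.0
for step in range(2000):
    proposal = mask.copy()
    proposal[rng.integers(p)] ^= True                   # flip one variable in or out
    delta = score(proposal) - score(mask)
    if delta > 0 or rng.random() < np.exp(delta / temperature):
        mask = proposal
        if score(mask) > best_score:
            best, best_score = mask.copy(), score(mask)
    temperature *= 0.995                                 # geometric cooling
print(np.flatnonzero(best))                              # ideally recovers [0 3]
```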
Statistics is the study of uncertainty: how to measure it and what to do about it. Uncertainty itself, as people regard it today, is a fairly new concept in the history of ideas. For example, probability, the branch of mathematics devoted to quantifying uncertainty, dates only from the middle 1600s, brought to life through disputes about how to wager fairly in games of chance. Since then, two main ways to give meaning to the concept of probability have arisen and, to some extent, fought with each other: the frequentist and Bayesian approaches:
CASE
4.3 SCENARIO APPROACH
4.3.1 Macro- and Micro-Scenarios
4.3.2 Micro-Scenarios: structural assumptions
4.4 SCENARIO PROPERTIES
4.4.1 Scenario Probabilities
4.4.2 Physico-chemical reactions versus scenarios
4.4.3 Level E/G Data Set
4.5 RUNNING THE TEST CASE
4.5.1 Simulations performed
4.6 UNCERTAINTY FRAMEWORK IN GESAMAC
4.6.1 Uncertainty calculations
4.6.2 Challenges to the Bayesian approach
4.6.3 A Model Uncertainty Audit
4.7 SENSITIVITY ANALYSIS
4.7.1 Initial SA results for maximum dose
4.7.2 SA results for total annual dose in the REF scenario
4.8 MC DRIVER
5 CONCLUSIONS
6 REFERENCES
6.1 LITERATURE CONTRIBUTED BY THE GESAMAC PROJECT
6.2 OTHER REFERENCES
Geosphere Modelling; Sensitivity Analysis; Model Uncertainty; Parallel MC Driver
3 METHODOLOGY
The methodology proposed by GESAMAC consists essentially of Monte Carlo (MC) simulation of a system model (in this case a radioactive waste disposal system), with global sensitivity analysis (SA) and uncertainty analysis (UA) applied to the model outputs. The following sections summarise (1) each separate aspect of the work performed and (2) how they have been integrated through a hypothetical test case.
3.1 GEOSPHERE MODELLING
This area of work concerns models of radionuclide migration that take into account solid-liquid interaction phenomena during the migration process. It has investigated the groundwater transport of nuclides through a multilayer system in one and two dimensions, and has studied the effects of parameters and model assumptions on the predictions of the model. The model concentrates on those model assumptions that relate to the physico-chemical reactions between the liquid and solid phases. In order to weigh the impact of different conceptual assumptions, a model has been implemented to describe the liquid-phase/solid-phase interaction. The uncertainties in model parameters and model assumptions have been investigated by a combination of Monte Carlo methodologies and Bayesian logic, which allows for uncertainty and sensitivity analysis.
Table 4.7.4. Example of parametric inputs to GTMCHEM in the simulation study for the radionuclide chain, Reference scenario (parameter, description, distribution, range):
(start of row truncated) ... 1: Uniform, 300 to 3000
RET21, Retardation coefficient 2 in layer 1: Uniform, 30 to 300
RET31, Retardation coefficient 3 in layer 1: Uniform, 300 to 3000
VREAL2, Geosphere water travel velocity in layer 2: Log-Uniform, 1E-2 to 1E-1
XPATH2, Geosphere layer 2 length: Uniform, 50 to 200
RET12, Retardation coefficient 1 in layer 2: Uniform, 300 to 3000
RET22, Retardation coefficient 2 in layer 2: Uniform, 30 to 300
RET32, Retardation coefficient 3 in layer 2: Uniform, 300 to 3000
STREAM, Stream flow rate: Log-Uniform, 1E+5 to 1E+7
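A small sketch of the Monte Carlo uncertainty-propagation pattern described above: sample a few of the Table 4.7.4-style inputs from their Uniform and Log-Uniform ranges and push them through a model of the system. The one-line "dose" surrogate and the rank-correlation sensitivity indicator are illustrative assumptions, not the GTMCHEM geosphere model or the GESAMAC sensitivity analysis.

```python
# Sketch of the Monte Carlo uncertainty/sensitivity pattern: sample uncertain
# inputs from distributions like those in Table 4.7.4 and propagate them
# through a model. The "dose" formula below is a toy surrogate, not GTMCHEM.
import numpy as np

rng = np.random.default_rng(6)
n_runs = 10_000

def log_uniform(low, high, size):
    return np.exp(rng.uniform(np.log(low), np.log(high), size))

ret12  = rng.uniform(300, 3000, n_runs)        # retardation coefficient 1, layer 2
vreal2 = log_uniform(1e-2, 1e-1, n_runs)       # water travel velocity, layer 2
xpath2 = rng.uniform(50, 200, n_runs)          # layer 2 length
stream = log_uniform(1e5, 1e7, n_runs)         # stream flow rate

travel_time = xpath2 * ret12 / vreal2           # toy surrogate: longer = more decay
dose = np.exp(-1e-5 * travel_time) / stream     # toy surrogate dose, arbitrary units

# Uncertainty analysis: summarize the output distribution.
print(np.percentile(dose, [5, 50, 95]))
# Crude sensitivity indicator: rank correlation of each input with the dose.
for name, x in [("RET12", ret12), ("VREAL2", vreal2), ("XPATH2", xpath2), ("STREAM", stream)]:
    rank_corr = np.corrcoef(np.argsort(np.argsort(x)), np.argsort(np.argsort(dose)))[0, 1]
    print(name, round(rank_corr, 2))
```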
No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, for any purpose other than the owner's personal use, without the prior written permission of one of the copyright holders.
This paper estimates the parameters of a stylized dynamic stochastic general equilibrium model using maximum likelihood and Bayesian methods, paying special attention to the issue of weak parameter identification. Given the model and the available data, the posterior estimates of the weakly identified parameters are very sensitive to the choice of priors. We provide a set of tools to diagnose weak identification, which include surface plots of the log-likelihood as a function of two parameters, heat plots of the log-likelihood as a function of three parameters, Monte Carlo simulations using artificial data, and Bayesian estimation using three sets of priors. We find that the policy coefficients and the parameter governing the elasticity of labor supply are weakly identified by the data, and posterior predictive distributions remind us that DSGE models may make poor forecasts even when they fit the data well. Although parameter identification is model- and data-specific, the lack of identification of some key structural parameters in a small-scale DSGE model such as the one we examine should raise a red flag to researchers trying to estimate, and draw valid inferences from, large-scale models featuring many more parameters.
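A small sketch of the first diagnostic listed, a surface/contour plot of the log-likelihood over two parameters, on a toy model in which two parameters enter the likelihood only through their product and are therefore weakly identified separately. The toy model is an assumption for illustration, not the paper's DSGE model.

```python
# Sketch of the "log-likelihood surface over two parameters" diagnostic for
# weak identification, on a toy model y ~ Normal(a*b, 1): a and b are
# identified only through their product, so the surface shows a flat ridge.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
y = rng.normal(loc=1.5 * 0.8, scale=1.0, size=200)      # "true" a = 1.5, b = 0.8

a_grid = np.linspace(0.1, 3.0, 120)
b_grid = np.linspace(0.1, 3.0, 120)
A, B = np.meshgrid(a_grid, b_grid)
# Gaussian log-likelihood up to a constant, evaluated on the (a, b) grid.
loglik = -0.5 * ((y[:, None, None] - A * B) ** 2).sum(axis=0)

plt.contourf(A, B, loglik, levels=30)
plt.colorbar(label="log-likelihood")
plt.xlabel("a"); plt.ylabel("b")
plt.title("Ridge along a*b = const: a and b weakly identified separately")
plt.show()
```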
In groundwater contamination studies, uncertainties are a constant presence. We have in previous work classified the different sources of uncertainty one can encounter in such studies [www.ams.ucsc.edu/~draper/writings.html, items 43 and 55], and we have proposed a framework to tackle them involving four hierarchical layers of uncertainty:
* Scenario (there may be uncertainty about relevant inputs to the physical process under study),
* Structure (conditional on scenario, the precise mathematical form of the best model to capture all relevant physical processes (advection, diffusion, ...) may not be known),
* Parametric (conditional on scenario and structure, the model will typically have one or more unknown physical constants that need to be estimated from data), and
* Predictive (conditional on scenario, structure, and parameters, the model predictions may still not agree perfectly with the observed data).
We have been developing work on all of these types of uncertainty and the pre...
When the goal is inference about an unknown θ and prediction of future data D* on the basis of data D and background assumptions/judgments B, the process of Bayesian model specification involves two ingredients: the conditional probability distributions p(θ|B) and p(D|θ,B). Here we focus on specifying p(D|θ,B), and we argue that calibration considerations, paying attention to how often You get the right answer, should be an integral part of this specification process. After contrasting Bayes-factor-based and predictive model-choice criteria, we present some calibration results, in fixed- and random-effects Poisson models, relevant to addressing two of the basic questions that arise in Bayesian model specification: (Q1) Is model Mj better than Mj'? and (Q2) Is model Mj good enough? In particular, we show that LS_FS, a full-sample log score predictive model-choice criterion, has better small-sample model discrimination performance than either DIC or a cross-validation-style log-scoring crite...
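A minimal sketch of a full-sample log-score criterion in the spirit of the LS_FS mentioned above, for a conjugate Gamma-Poisson model in which the posterior predictive is Negative Binomial in closed form; the Gamma(1, 1) prior and the simulated counts are illustrative assumptions, not the paper's calibration setup.

```python
# Minimal sketch of a full-sample log-score model-choice criterion for a
# conjugate Gamma-Poisson model: average the log posterior-predictive density
# over the observed data points. Prior and data are illustrative assumptions.
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(8)
y = rng.poisson(lam=3.0, size=50)             # observed counts

a, b = 1.0, 1.0                               # Gamma(shape=a, rate=b) prior on the Poisson mean
a_post, b_post = a + y.sum(), b + len(y)      # conjugate posterior is Gamma(a_post, b_post)

# Posterior predictive is Negative Binomial(r = a_post, p = b_post / (b_post + 1)).
log_score = nbinom.logpmf(y, a_post, b_post / (b_post + 1.0)).mean()
print(round(log_score, 3))                    # higher (less negative) = better predictive fit
```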