Introduction: In the previous two chapters, we discussed using the results of randomized trials and observational studies to estimate treatment effects. We were primarily interested in measures of effect size and in problems with design (in randomized trials) and confounding (in observational studies) that could bias effect estimates. We did not spend much time considering the precision of our effect estimates or whether the apparent treatment effects could be a result of chance. The statistics used to help us with these questions, P-values and confidence intervals, are the subject of this chapter. No area in epidemiology and statistics is so widely misunderstood and mistaught. We cover a more sophisticated understanding of P-values and confidence intervals in this text because 1) it is right, 2) it is important, and 3) we think you can handle it. After all, you have survived three chapters (3, 4, and 8) on using the results of diagnostic tests and Bayes's Theorem to update a patient's probability of disease. So now you are poised to gain a Bayesian understanding of P-values and confidence intervals as well. We will give you a taste in this chapter; those wishing to explore these ideas in greater depth are encouraged to read an excellent series of articles on this topic by Steven Goodman. (Goodman 1999a; Goodman 1999b; Goodman 2001)
Most diagnostic tests are not dichotomous (negative or positive) but, rather, have a range of possible results (very negative to very positive). If the pretest probability of disease is high, the test result that prompts treatment should be any value that is even mildly positive. If the pretest probability of disease is low, the test result needed to justify treatment should be very positive. Simple decision rules that fix the cutpoint separating positive from negative test results do not take into account the individual patient's pretest probability of disease. Allowing the cutpoint to change with the pretest probability of disease increases the value of the test. This is primarily an issue when the pretest probability of disease varies widely between patients and depends on characteristics that are not measured by the test. It remains an issue for decision rules based on multiple test results if these rules fail to account for important determinants of patient-specific risk. This tutorial demonstrates how the value of a diagnostic test depends on the ability to vary the cutpoint, using as an example the white blood cell count in febrile children at risk for bacteremia.
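The Bayesian logic behind a pretest-probability-dependent cutpoint can be sketched in a few lines of Python. The interval likelihood ratios, treatment threshold, and pretest probabilities below are hypothetical, chosen only to illustrate the mechanics described in the abstract, not taken from the tutorial:

```python
def post_test_prob(pretest_prob, lr):
    """Bayes' theorem in odds form: post-test odds = pretest odds * LR."""
    pre_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Hypothetical likelihood ratios for increasingly positive result intervals.
interval_lrs = [("low", 0.3), ("mildly positive", 1.5),
                ("positive", 4.0), ("very positive", 12.0)]

def lowest_result_justifying_treatment(pretest_prob, threshold=0.10):
    """Return the least-positive result interval whose post-test
    probability of disease exceeds the treatment threshold."""
    for label, lr in interval_lrs:
        if post_test_prob(pretest_prob, lr) > threshold:
            return label
    return None

# A high-risk patient is treated even for a mildly positive result;
# a low-risk patient requires a very positive one.
print(lowest_result_justifying_treatment(0.20))  # high pretest probability
print(lowest_result_justifying_treatment(0.01))  # low pretest probability
```

The cutpoint is not fixed: the same test result can justify treatment in one patient and not in another, purely because their pretest probabilities differ.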
Journal of Physical Activity and Health, Jul 1, 2008
Interest in the quantification of physical activity is on the rise. Triaxial accelerometry has frequently been used; however, research on the reliability of these devices is limited. We examine the interunit and intraunit reliability of 22 RT3 triaxial accelerometers using a performance-documented laboratory agitator. The RT3 units were tested while moving in 2 directions (antero-posterior, medio-lateral) and at 2 speeds (150 and 275 RPM) on a shaker with simultaneously documented performance output for three 24-hour periods. Minimal shaker variance was recorded for all trials (coefficients of variation [CVs] < 0.52%). Our data demonstrate good reliability within RT3s (CVs < 1.81%) but poor reliability among the 22 units (CVs range = 9.5% to 34.7%). In longitudinal studies, each subject should use the same RT3 unit at each assessment. The use of multiple RT3 units in cross-sectional studies is not recommended because data interpretation would be compromised by the high between-unit variability.
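The coefficients of variation reported above are the sample standard deviation expressed as a percentage of the mean. A minimal sketch with made-up counts (not the study's data) shows how an intraunit CV (one unit across repeated trials) and an interunit CV (across different units on the same trial) would be computed:

```python
import statistics

def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical activity counts from one unit across three 24-hour trials.
unit_a_trials = [10120, 10190, 10260]
# Hypothetical mean counts from four different units on the same trial.
unit_means = [10190, 11800, 8600, 12900]

print(f"intraunit CV: {cv_percent(unit_a_trials):.2f}%")   # small: good
print(f"interunit CV: {cv_percent(unit_means):.2f}%")      # large: poor
```

With numbers like these, the within-unit CV falls well under 1%, while the between-unit CV lands in the double digits, mirroring the pattern the study reports.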
Many clinical diagnostic tests, such as the joint fluid white blood cell count, produce results on a continuous scale, rather than a mere positive or negative. The accuracy of such tests is often reported as a positive and negative likelihood ratio at each of several potential cutoff points (e.g., ≥ 25,000/μL vs. not; ≥ 50,000/μL vs. not; ≥ 100,000/μL vs. not). This Key Concepts article reviews the definition of a likelihood ratio and explains why the practice of dichotomizing the test is problematic. Instead, it proposes that such continuous scales be divided into multiple intervals (e.g., 0-25,000, >25,000-50,000, >50,000-100,000, >100,000) and each interval be given its own likelihood ratio. This practice not only aligns with clinical common sense and practice, but also enables a more accurate estimate of the updated risk of disease, given a pre-test risk.
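The interval-likelihood-ratio approach described above can be sketched as follows. The interval counts and pretest risk are hypothetical, invented for illustration; each interval's LR is the proportion of diseased patients whose result falls in that interval divided by the corresponding proportion of non-diseased patients:

```python
# Hypothetical (diseased, non-diseased) patient counts per result interval.
intervals = {
    "0-25,000":        (5, 70),
    ">25,000-50,000":  (10, 20),
    ">50,000-100,000": (35, 8),
    ">100,000":        (50, 2),
}

n_disease = sum(d for d, _ in intervals.values())
n_no_disease = sum(nd for _, nd in intervals.values())

def interval_lr(interval):
    """LR = P(result in interval | disease) / P(result in interval | no disease)."""
    d, nd = intervals[interval]
    return (d / n_disease) / (nd / n_no_disease)

def updated_risk(pretest, interval):
    """Bayes in odds form: post-test odds = pretest odds * interval LR."""
    odds = pretest / (1 - pretest) * interval_lr(interval)
    return odds / (1 + odds)

for name in intervals:
    print(f"LR for {name}: {interval_lr(name):.2f}")
print(f"updated risk after a >50,000-100,000 result (pretest 10%): "
      f"{updated_risk(0.10, '>50,000-100,000'):.2f}")
```

Unlike a single dichotomizing cutoff, this assigns a distinct LR to each interval, so a result of 60,000/μL and a result of 150,000/μL update the pretest risk by different amounts.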
Papers by Michael Kohn