PEDIATRICS Vol. 115 No. 4 April 2005, pp. 1113-1114 (doi:10.1542/peds.2005-0163)
Can the Results Be Believed?: In Reply
David Olds, PhD
Department of Pediatrics
Dennis Luckey, PhD
Department of Preventive Medicine and Biometrics
University of Colorado Health Sciences Center
Denver, CO 80262
Charles Henderson
Human Development and Family Studies
Cornell University
Ithaca, NY 14850
Dr Glauber raises a common challenge to trials that examine a wide range of outcomes but do not make Bonferroni-like adjustments for multiple comparisons. We share Dr Glauber's concern about overinterpreting single statistically significant treatment differences. We have taken the position from the beginning of this program of research, however, that we generally would not make statistical adjustments for multiple comparisons. Instead, each of the 3 trials of this nurse home-visitor program has been guided by specific hypotheses grounded in a theoretical model; we treat with skepticism each treatment effect that meets the threshold of conventional statistical significance unless it coheres with other findings within the trial and with other relevant findings outside of the trial.[1]
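The arithmetic behind the multiple-comparisons concern can be sketched briefly. The example below is illustrative only: it assumes the tests are statistically independent (a simplification that outcomes within a single trial would rarely satisfy), and the function names and outcome counts are hypothetical rather than taken from any of the trials.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false-positive result among m independent
    tests, each run at significance level alpha, when no true effect exists."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level that a Bonferroni adjustment would impose."""
    return alpha / m


if __name__ == "__main__":
    # Hypothetical numbers of outcomes examined in one trial.
    for m in (1, 5, 20, 50):
        fwer = familywise_error_rate(0.05, m)
        per_test = bonferroni_threshold(0.05, m)
        print(f"{m:>3} outcomes: familywise error {fwer:.3f}, "
              f"Bonferroni per-test level {per_test:.4f}")
```

Under these assumptions the familywise error rate climbs quickly as outcomes are added, which is the pattern a Bonferroni-like adjustment is meant to counteract.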
Moreover, we have taken the position that findings from a single trial are an insufficient foundation for guiding policy or practice.[1] Instead of making adjustments to P values, we have focused on conducting replication trials of the intervention with different populations across different contexts and time points in history. Questions of replicability, in our view, are of substantial importance from the standpoint of science and policy. By testing the intervention in this way, we address Dr Glauber's concerns about whether the findings from any single trial are simply sampling artifacts while simultaneously addressing the issue of generalizability of the findings.
We acknowledge that this strategy has its limitations; for example, the failure of the intervention to produce effects in different populations may have to do with different population-based patterns of morbidity and risks and protective factors. However, its advantages, in our view, outweigh its disadvantages. Although lack of total comparability of measures across trials makes the examination of cross-trial consistency challenging, it nevertheless is worth examining. In the original trial, several outcomes that we hypothesized would improve with the intervention (eg, immunization rates, failure to thrive, obesity, maternal symptoms of depression) were not affected.[2-4] Nevertheless, we examined these same outcomes in the second trial because we were concerned that we might miss important program effects because of variations in population or contextual factors (type II errors).[5,6] On the other hand, there is intertrial consistency in program effects on a number of other important outcomes such as prenatal tobacco use, childhood injuries, children's preschool language and cognitive functioning, rates and timing of subsequent pregnancy, and maternal use of welfare.[1-10]
By examining the same set of outcomes in 2 trials, we are able to make more informed judgments about program effectiveness. Rather than penalizing ourselves for examining a multiplicity of outcomes, by using this replication strategy we gain greater assurance about the degree to which the program does and does not affect particular outcomes. For those outcomes that the program does affect, we can begin to gain estimates of effect sizes in which we have confidence. The approach we have taken is supported by epidemiologists who question the need for making statistical adjustments for multiple comparisons and an overreliance on P values as a basis for making inferences about intervention effects.[11-14]
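The assurance gained from replication can be given a rough numerical form. The sketch below is not an analysis from the trials: it assumes 2 fully independent trials, a 2-sided test at the same level in each, and no true effect, and the function name is hypothetical.

```python
def chance_replication_probability(alpha: float) -> float:
    """Probability that 2 independent trials both yield a statistically
    significant difference in the same direction for one outcome when the
    intervention has no true effect (2-sided level alpha in each trial)."""
    one_direction = alpha / 2.0          # chance of a false positive in a given direction
    return 2.0 * one_direction ** 2      # both trials favor treatment, or both favor control


if __name__ == "__main__":
    single = 0.05
    both = chance_replication_probability(single)
    print(f"single-trial false-positive rate: {single}")
    print(f"same-direction false positive in both trials: {both:.5f}")
```

Under these assumptions, a finding that replicates in direction and significance is far less likely to be a sampling artifact than a finding from one trial, without any adjustment to the per-test P value threshold.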
Using this strategy, we have cautioned readers about statistically significant effects that in our judgment seemed either implausible or that were not sufficiently consistent with other treatment effects. Our recent article cited by Dr Glauber highlights this point.[10] We reported a nurse-visited versus control-group difference in the rate of domestic violence but cautioned the readers about the replicability of this finding and will not place much stock in it until it is replicated in a second trial. From the beginning of our research, we have resisted pressure to use data from the first trial to promote public investment in the program until results from the second trial replicated many of its central findings.
We are not opposed to making adjustments for multiple comparisons in situations for which there are less well-established hypotheses and open-ended exploration of secondary data. There also are specific types of analyses for which we have consistently adjusted probability levels for simultaneity. For example, in a number of our articles we have shown regions of simultaneous significance indicating the range of values of a covariate over which treatment differences obtain.[1,2,15]
Readers who examine just 1 or 2 of the reports of these trials are not likely to appreciate the degree to which findings are consistent across trials. We have begun to rectify this by producing reports that synthesize findings across trials so that readers can judge for themselves the degree to which the evidence coheres.[1] As we follow the samples in the 3 trials prospectively, we plan to produce additional syntheses of findings that will address these kinds of issues.
By implication, Dr Glauber has pointed out that we have not specified our hypotheses in our reports and have not specified primary and secondary outcomes. It is important to emphasize that our analyses have been directed by our original hypotheses; our publications would be strengthened by making this more explicit.
One final thought: If trials of programs such as this one, which have a range of outcome domains (eg, improving pregnancy outcomes, improving child health and development, improving family economic self-sufficiency), were to be designed by using power calculations that made adjustments for multiple comparisons, trials would be impractical to conduct because of the very large samples and corresponding costs that would be required to achieve statistical power and simultaneously minimize type I and II errors using very low P values. We think this is additional justification for the strategy we have used.
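How a stricter significance level enlarges the required sample can be sketched with the standard normal-approximation formula for comparing 2 group means. Every input below is an assumption chosen for illustration (a standardized effect size of 0.25, 80% power, and 20 outcomes); none comes from the power calculations used in the trials, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float, power: float = 0.80) -> int:
    """Approximate sample size per group for a 2-sided comparison of 2 means,
    using n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    d = 0.25            # hypothetical standardized effect size
    outcomes = 20       # hypothetical number of outcomes tested
    unadjusted = n_per_group(d, alpha=0.05)
    adjusted = n_per_group(d, alpha=0.05 / outcomes)   # Bonferroni-adjusted level
    print(f"per-group n at alpha = 0.05: {unadjusted}")
    print(f"per-group n at alpha = 0.05/{outcomes}: {adjusted}")
```

The adjusted requirement grows with the number of outcomes and with smaller anticipated effects, which is the cost consideration raised in the paragraph above.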
REFERENCES
1. Olds DL. Prenatal and infancy home visiting by nurses: from randomized trials to community replication. Prev Sci. 2002;3:153-172
2. Olds DL, Henderson CR Jr, Chamberlin R, Tatelbaum R. Preventing child abuse and neglect: a randomized trial of nurse home visitation. Pediatrics. 1986;78:65-78
3. Olds DL, Henderson CR Jr, Tatelbaum R, Chamberlin R. Improving the life-course development of socially disadvantaged mothers: a randomized trial of nurse home visitation. Am J Public Health. 1988;78:1436-1445
4. Olds DL, Henderson CR Jr, Kitzman H. Does prenatal and infancy nurse home visitation have enduring effects on qualities of parental caregiving and child health at 25 to 50 months of life? Pediatrics. 1994;93:89-98
5. Kitzman H, Olds DL, Henderson CR Jr, et al. Effect of prenatal and infancy home visitation by nurses on pregnancy outcomes, childhood injuries, and repeated childbearing. A randomized controlled trial. JAMA. 1997;278:644-652
6. Kitzman H, Olds DL, Sidora K, et al. Enduring effects of nurse home visitation on maternal life course: a 3-year follow-up of a randomized trial. JAMA. 2000;283:1983-1989
7. Olds DL, Eckenrode J, Henderson CR Jr, et al. Long-term effects of home visitation on maternal life course and child abuse and neglect. Fifteen-year follow-up of a randomized trial. JAMA. 1997;278:637-643
8. Olds DL, Robinson J, O'Brien R, et al. Home visiting by paraprofessionals and by nurses: a randomized, controlled trial. Pediatrics. 2002;110:486-496
9. Olds D, Kitzman H, Cole R, et al. Effects of nurse home visiting on maternal life-course and child development: age-six follow-up of a randomized trial. Pediatrics. 2004;114:1550-1559
10. Olds DL, Robinson J, Pettitt L, et al. Effects of home visits by paraprofessionals and by nurses: age-four follow-up of a randomized trial. Pediatrics. 2004;114:1560-1568
11. Rothman KJ. No adjustments are needed for multiple comparisons. Epidemiology. 1990;1:43-46
12. Perneger TV. What's wrong with Bonferroni adjustments. BMJ. 1998;316:1236-1238
13. Goodman SN. Toward evidence-based medical statistics. 1: the P value fallacy. Ann Intern Med. 1999;130:995-1004
14. Goodman SN. Toward evidence-based medical statistics. 2: the Bayes factor. Ann Intern Med. 1999;130:1005-1013
15. Eckenrode J, Ganzel B, Henderson CR Jr, et al. Preventing child abuse and neglect with a program of nurse home visitation: the limiting effects of domestic violence. JAMA. 2000;284:1385-1391
PEDIATRICS (ISSN 1098-4275). ©2005 by the American Academy of Pediatrics