The neonatal community deserves congratulations for responding vigorously to Silverman's1 call for randomized controlled trials (RCTs) to evaluate neonatal therapies. Although more trials are still needed,2 existing RCTs present new challenges in interpretation. One of the most vexing is when to proclaim innovative therapies as “standard of care.”
The neonatal critical care community faces this challenge in evaluation of hypothermia as treatment for hypoxic-ischemic encephalopathy (HIE).3–5 National bodies have made declarations that the neonatal community should consider hypothermia experimental pending completion of current ongoing trials.6–9 Although the influence of these bodies is considerable, individual physicians and sites apparently feel pressure to “do something” in the very dire circumstances of HIE in the newborn. In an informal sample of convenience, we have found that some centers are performing cooling, either with or without informed consent. Although many clinicians concur with the leading bodies that state there is a need for additional trials, it is confusing for practicing neonatologists when some members of these bodies also publicly state that they are actively providing cooling therapy.
If leading centers are promoting active cooling, they have, in effect, adopted cooling as a standard of care. This not only may have legal implications but also raises ethical issues for those who believe the right thing to do currently is to continue performing RCTs. The countervailing argument is that not offering cooling as standard therapy for a disease as devastating as HIE is itself unethical. These opposing viewpoints are not easily resolved except by considering the overall benefit of eliminating residual doubt, one way or the other. Our concern is that advocacy of hypothermia as a standard of care represents an excessively low threshold for accepting promising therapies and will ultimately lead to resources being spent on useless interventions rather than on developing and implementing useful ones.
We argue for a conservative approach to declaring a therapy to be standard of care. Although individual clinicians may choose to implement a therapy for which the magnitude of benefit remains uncertain, designating a therapy as a standard for quality care mandates the presence of very strong evidence. In this commentary, we explain the reasoning behind our approach and propose guidelines for considering when a body of meta-analytical data is strong enough to argue that it is unethical to continue randomizing patients into ongoing or new trials.
Why do we advocate a conservative stance? The history of physicians’ repeated endorsement of therapies that later proved useless or harmful,10 including therapies that seemed promising in RCTs,11 provides one compelling rationale. Ioannidis11 reported, after reviewing highly cited publications of efficacious studies, that 32% were later found to have been contradicted or to have had initially stronger effects. The studies that were highly cited and not refuted had a median sample size of 1542, as opposed to those that were either contradicted or claimed initially stronger effects, which had a median sample size of 624.
In addition to the sobering lessons of subsequent reversal of initially promising results, our arguments for caution regarding hypothermia rest on limitations in the internal validity of the 2 pivotal trials, the Gluckman et al Cool Cap study,3 and the Shankaran et al National Institute of Child Health and Human Development (NICHD) whole-body–cooling study,4 and 1 smaller pilot study of 65 infants by Eicher et al.5 We are aware of 1 other pilot study12 and 1 completed but unpublished study.13 A final study, the Trial of Whole Body Hypothermia for Perinatal Asphyxia (TOBY),14 has completed recruitment but still has to achieve target end points of 18 months’ outcome. We consider the implications of results of these smaller and unpublished studies in “Have the Pooled Studies Achieved an Optimal Information Size?” below.
Researchers in the Cool Cap study found, in a subgroup analysis, a reduction in adverse outcome at 18 months. Using whole-body cooling, Shankaran et al found an overall reduction in adverse outcomes at 18 months (relative risk [RR]: 0.72 [95% confidence interval (CI): 0.54–0.95]; P = .01). Although similar in basic goals, the 2 studies had important differences6,7 in how they achieved cooling (whether head3 or whole-body4,5 cooling) and in the entry criteria. The Cool Cap trial used amplitude-integrated electroencephalography (aEEG) to discern whether infants were affected enough to randomly assign them.3 These differences do not necessarily preclude a pooled analysis of all enrolled infants irrespective of subgroup, making a total of 478 infants; infants cooled had a 0.76 RR (as compared with control infants) for the outcome of death or disability at 12 to 18 months.15
Although these results seem compelling on the surface, 4 key concerns remain: (1) the potential for biases that arises within an unblinded study; (2) concern about the management of control-group patients; (3) how to interpret subgroup analysis; and (4) the relatively small number of patients studied to date.
Potential for biases arising: Both trials used a composite outcome of death and/or significant (severe in the Cool Cap trial and moderate or severe in the NICHD trial) sequelae in survivors at 18 months. Blinding in the NICU was impossible to undertake for practical and ethical reasons.
Any unblinded trial risks bias in cointervention and the process of establishing outcome events. In this case, the concern is particularly serious and arises from the question, “How do infants with severe HIE die in the NICU?” A frequent mode of death in this setting is a parental or clinician decision to withdraw care. Thus, there is a possibility that whether infants survive is a decision that, to some extent, is in the hands of the unblinded attending physicians.
Defining neurologic criteria for withdrawal of support is difficult, because there is no agreed-on definition for brain death in neonates. Moreover, the process is emotionally traumatic for all concerned. Therefore, one might question the possibility of limiting bias and increasing the transparency of the decision. In fact, in a research context, investigators could build in an independent arms-length review after a withdrawal of care by using review of the medical charts. Investigators could take the crucial step of blinding this adjudication, which could characterize the decision in terms of the certainty of a poor prognosis and the involvement of the clinician in the decision. Blinded adjudicated outcome would go a long way toward resolving concerns about differential application of criteria in intervention and control groups.
The bias we are suggesting might lead to an increase in severe morbidity in survivors, which is a result that did not occur. Nevertheless, failure to observe increases in disability in intervention groups does not exclude the possibility of underlying bias.
Two major ongoing randomized trials, the TOBY14 and Infant Cooling Evaluation (ICE),16 also lack a priori criteria for withdrawal of support or blinded adjudication of withdrawal decisions. Thus, even after those trials are complete, the issue of possible bias in withdrawal of support will remain. Therefore, it will be crucial that these trials report the incidence of death as a result of withdrawal in the intervention and control groups.
Were control patients optimally managed? Hyperthermia after a cerebral insult is associated with worse outcomes.17–19 Although attention has focused on the effect of cooling in the experimental arm, there is a potential that scrupulous attention to ensuring that the infant does not get overheated, rather than cooling, might be the true mechanism of benefit. In the Shankaran et al trial,4 in 41 (39%) of the 106 control infants there was at least 1 esophageal temperature that exceeded 38°C. The network investigators recently presented further analysis of core- and skin-temperature data from their control group and indicated that the range of median core temperatures in their controls was 36.3 to 38.9°C.
Furthermore, an increase of only 1°C in peak core temperature was associated with a fourfold increase in death or disability.19 Again, we were not told the exact frequency of this potential confounder in the Cool Cap study controls, but the report suggested that it may have been as high as 23%.3
Avoiding hyperthermia may be challenging. Current skin temperature–based servoregulation may not be suited to the task of avoiding core hyperthermia. Some active cooling mechanism may be required to avoid excessive temperatures. In addition, the extent to which infants tolerate an upward deflection of temperature is unknown. These issues remain for the neonatal community to investigate actively. One could argue that avoiding hyperthermia is a novel intervention with even less evidence than cooling. On the other hand, one might view avoiding hyperthermia as a standard of care, as reflected in Neonatal Resuscitation Program guidelines.20 For those who agree with us that moving to cooling as a standard of care is premature but remain concerned about ethical issues in not cooling, scrupulous attention to avoiding hyperthermia presents a potentially attractive compromise.
Interpretation of subgroup analysis: Methodologists have been aware for more than 15 years of the dangers associated with subgroup analysis.21,22 In the Cool Cap study, the primary outcome of death and/or disability at 18 months did not reach the conventional threshold of statistical significance (55% vs 66%; P = .10). However, a subgroup analysis performed according to severity at presentation showed a reduction of adverse events in the moderately affected group (48% vs 66%; P = .02), but this was not the case in the subgroup of infants who were severely affected at entry (79% vs 68%; P = .51).
Under what circumstances can we be confident in the findings of a subgroup analysis? Of 7 criteria for judging the credibility of a subgroup analysis,23 the Cool Cap trial fails to meet 2 key criteria.
First, the subgroup difference was not consistent across studies. Shankaran et al4 did not use aEEG at entry; nonetheless, a comparison by severity according to clinical assessment enabled an analysis (moderate HIE RR: 0.69 [95% CI: 0.44–1.07], P = .09, and severe HIE RR: 0.85 [95% CI: 0.64–1.13], P = .24). Thus, although the apparent effect was slightly greater in the moderate than in the severe group in the Shankaran et al study, the difference was small and does not substantiate the clear difference in effect claimed by the Cool Cap trial researchers. Although enrolling exactly the same population and classifying patients according to the aEEG might have led to replication of findings, confidence in subgroup effects requires replication, which is not currently available.
Second, one can conduct a statistical analysis to determine if the difference in subgroups is compatible with the play of chance. From an independent analysis, the US Food and Drug Administration reported that “no conclusions could be drawn from the sponsor's pooled subpopulation because the overall treatment-by-interaction test was not statistically significant.”24 The Cool Cap investigators3 themselves pointed out that the interaction between severity of aEEG changes and treatment outcome shows a P value of .075, which is above the conventional threshold for significance. Thus, the apparent subgroup effect may represent a “siren song” that is best ignored.25,26 Certainly, we cannot consider it established.
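As a rough illustration of such a test, the sketch below applies a standard z-test for the difference between 2 log RRs to the NICHD subgroup estimates quoted above (moderate HIE RR: 0.69 [95% CI: 0.44–1.07]; severe HIE RR: 0.85 [95% CI: 0.64–1.13]). It assumes that the 2 subgroups are independent and that the reported CIs are symmetric on the log scale. The function names are ours, and this is not the analysis performed by the trial investigators or by the Food and Drug Administration.

```python
from math import log, sqrt
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Approximate SE of log(RR) recovered from a reported CI (assumes log-scale symmetry)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return (log(upper) - log(lower)) / (2 * z)


def interaction_test(rr_a, ci_a, rr_b, ci_b):
    """z-test for the difference between two independent log relative risks."""
    diff = log(rr_a) - log(rr_b)
    se_diff = sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided P value
    return z, p


# Subgroup estimates as quoted in the text for the NICHD trial.
z, p = interaction_test(0.69, (0.44, 1.07), 0.85, (0.64, 1.13))
print(f"z = {z:.2f}, two-sided P = {p:.2f}")
```

Under these assumptions the P value is far above .05, which is in keeping with the observation that the NICHD data do not substantiate a clear difference in effect according to severity.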
Have the pooled studies achieved an optimal information size? Up to now, methodologists and systematic reviewers have given limited thought to the issue of a threshold for when enough data have accumulated to conclude that a question has been answered adequately. A number of authors have highlighted the dangers of overestimating treatment effects in individual randomized trials that are stopped early, after an interim analysis.27 To guard against this, it is common to see trial organizers using formal stopping boundaries such as the so-called Lan-DeMets α spending function rule.28 Formal stopping rules represent one response to awareness that repeated looks at data from RCTs violate the fundamental assumptions that underlie conventional statistical analysis, which invalidates the conventional rule of significance (P < .05) and makes the likelihood of a false-positive finding and overestimation of treatment effects extremely high.29
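The inflation of false-positive findings with repeated looks can be illustrated with a small simulation. The sketch below is ours and rests on simplifying assumptions: there is no true treatment effect, data accrue in equal-sized blocks, and each interim look applies an unadjusted 2-sided threshold of P < .05 (|z| > 1.96). It does not implement the Lan-DeMets rule or any other formal boundary.

```python
import random


def false_positive_rate(n_looks, n_trials=20000, z_crit=1.96, seed=1):
    """Share of null trials declared 'significant' at any of n_looks unadjusted looks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        total = 0.0
        for look in range(1, n_looks + 1):
            # Each block of new data contributes one standardized increment.
            total += rng.gauss(0.0, 1.0)
            z = total / look ** 0.5  # cumulative z statistic after this look
            if abs(z) > z_crit:
                hits += 1
                break  # the trial "stops early" on a spurious result
    return hits / n_trials


for looks in (1, 2, 5, 10):
    print(looks, "look(s):", round(false_positive_rate(looks), 3))
```

With a single look the simulated error rate stays near the nominal 5%, and it rises as the number of unadjusted looks increases. The same logic applies when a meta-analysis is updated each time a new trial is reported.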
Systematic reviews and meta-analysis run the risk of a very similar phenomenon. Nowadays, thousands of randomized trials are conducted each year. Inevitably, some RCTs, particularly if their sample size is relatively small and they accrue relatively few events, will demonstrate spurious overestimates of effect. In a smaller but still-appreciable number, the first several small trials will produce spurious overestimates of benefit. Thus, meta-analyses represent a parallel situation of accumulating data in which early apparent benefits that come from relatively small numbers of patients may represent misleading chance phenomena.
How can one guard against these false-positive conclusions? Pogue and Yusuf30 first proposed a meta-analytic approach analogous to stopping rules for individual trials. Building on this early work, subsequent investigations have suggested a calculation of an optimal information size to estimate the extent of this risk of overestimates of treatment effect that arise from small data sets.31 These approaches remain underutilized, particularly in the neonatal community, which has failed to give adequate attention to this issue.
To perform a calculation of the “optimal information size,” one needs to know the control event rate. Pooling the control event rates of the Cool Cap, NICHD, and Eicher et al trials estimates a rate of 61.3%. Because treatment effects are rarely higher than 25% in medicine, one can assume a plausible RR reduction of 20% for death and disability with cooling. Assuming such a plausible 20% RR reduction, an α error of .05, a β error of 10%, and a control event rate of 61.3%, the optimal information size in this case would include studies of a total of 692 patients. Using a sensitivity-analysis approach, if the event rate was lower, say at 50%, the optimal information size would be 1102. Both of these estimates are greater than the 442 patients included in the 2 fully published, highest-standard relevant RCTs being considered. If the Eicher et al trial5 is included, the total recruited comes to 507. Currently unpublished trials include the TOBY, which enrolled 325 infants who are awaiting outcome at follow-up,14 and the Shao et al13 trial of 178, which had an unbalanced randomization with 111 cooled infants versus 67 control infants. Finally, 1 small pilot randomized trial12 enrolled 22 infants. Even ignoring the concerning issues of potential bias we have highlighted, additional reports and studies are required to provide a robust assessment of the effect of cooling. This is one reason to welcome the timely completion of the ICE trial with a sample size of 276 infants.16 In addition, the ICE will provide information on the feasibility and safety of a pragmatic approach to whole-body cooling in transport.
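The form of this calculation can be sketched with the conventional sample-size formula for comparing 2 proportions. The exact formula behind the figures quoted above is not stated, so the unpooled-variance version used here, with a 2-sided α and no continuity correction, is an assumption; it yields totals somewhat lower than 692 and 1102 and should be read as an illustration of the method rather than a reproduction of those numbers. The function name is ours.

```python
from math import ceil
from statistics import NormalDist


def optimal_information_size(control_rate, rrr, alpha=0.05, beta=0.10):
    """Total patients (both arms) needed to detect a relative risk reduction `rrr`.

    Assumption: standard two-proportion formula, two-sided alpha,
    equal allocation, unpooled variance, no continuity correction.
    """
    p_control = control_rate
    p_treated = control_rate * (1 - rrr)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    per_group = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2
    return 2 * ceil(per_group)


# Inputs taken from the text: 20% RR reduction, alpha .05, beta 10%.
for rate in (0.613, 0.50):
    print(f"control event rate {rate:.1%}:", optimal_information_size(rate, 0.20))
```

Whatever the exact variant, the required total exceeds the 507 infants in the published trials and grows as the assumed control event rate falls, which is the point of the sensitivity analysis.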
In summary, exciting potential exists in hypothermia as a treatment for HIE. Is the evidence sufficiently strong that clinicians impressed with the results may cautiously use this treatment for neonatal encephalopathy while they wait for the many questions around its optimal use to be answered? Certainly. On the other hand, the neonatal community continuing with a conservative approach to declaring a new standard of care will avoid unfortunate mistakes of premature dissemination of experimental management strategies. In both adults and children with traumatic brain injury, cooling has not fulfilled its earlier promise. We should demand strong evidence of robust, consistent effects in highly valid studies that have enrolled adequate numbers of patients before mandating a new therapy for management of all relevant patients. The evidence for cooling fails to meet this standard.
Dr Barks, as a site investigator for Mott Hospital, recruited infants for the Cool Cap study; Dr Kirpalani was a site investigator for McMaster University Hospital and recruited infants for the ICE study.
We acknowledge valuable methodologic advice from Dr Barbara Schmidt and Dr Edmund Hey.
- Accepted May 18, 2007.
- Address correspondence to Haresh Kirpalani, BM, MSc, Division of Neonatology, Children's Hospital of Philadelphia, 34th Street and Civic Center Boulevard, Philadelphia, PA 19104. E-mail:
The authors have indicated they have no financial relationships relevant to this article to disclose.
This work was presented in an earlier version at the Canadian Paediatric Society's 83rd Annual Meeting; June 13, 2006; St John's, Newfoundland.
Opinions expressed in these commentaries are those of the authors and not necessarily those of the American Academy of Pediatrics or its Committees.
- Silverman WA. Personal reflections on lessons learned from randomized trials involving newborn infants from 1951 to 1967. Clin Trials. 2004;1:179–184
- Blackmon LR, Stark AR; American Academy of Pediatrics, Committee on Fetus and Newborn. Hypothermia: a neuroprotective therapy for neonatal hypoxic-ischemic encephalopathy. Pediatrics. 2006;117:942–948
- American Heart Association. 2005 American Heart Association (AHA) guidelines for cardiopulmonary resuscitation (CPR) and emergency cardiovascular care (ECC) of pediatric and neonatal patients: pediatric basic life support. Pediatrics. 2006;117(5). Available at: www.pediatrics.org/cgi/content/full/117/5/e989
- International Liaison Committee on Resuscitation. The International Liaison Committee on Resuscitation (ILCOR) consensus on science with treatment recommendations for pediatric and neonatal patients: pediatric basic and advanced life support. Pediatrics. 2006;117(5). Available at: www.pediatrics.org/cgi/content/full/117/5/e955
- Lacchetti C, Guyatt G. Surprising results of randomized, controlled trials. In: Guyatt G, Rennie D, eds. The Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Chicago, IL: AMA Publications; 2002
- Gunn AJ, Gluckman PD, Gunn TR. Selective head cooling in newborn infants after perinatal asphyxia: a safety study. Pediatrics. 1998;102:885–892
- Shao X, Zhou W, Cheng G, et al. Head cooling in neonatal hypoxic ischemic encephalopathy-multicenter randomized trial from China. Presented at: Hot Topics in Neonatology; December 3–5, 2005; Washington, DC
- National Perinatal Epidemiology Unit. Whole body hypothermia for the treatment of perinatal asphyxial encephalopathy. Available at: www.npeu.ox.ac.uk/Toby. Accessed September 10, 2007
- Edwards AD, Azzopardi DV. Therapeutic hypothermia following perinatal asphyxia. Arch Dis Child Fetal Neonatal Ed. 2006;91:F127–F131
- Jacobs SE, Stewart M, Inder TE, Doyle L, Morley C. Progress of the pragmatic Australian “ICE” (Infant Cooling Evaluation) randomised controlled trial of whole body cooling for term newborns with hypoxic-ischemic encephalopathy. Presented at: Hot Topics in Neonatology; December 3–5, 2006; Washington, DC
- Laptook AR. Adverse outcome increases with elevated temperature for infants provided usual care following hypoxic-ischemic encephalopathy (HIE). Pediatr Res. 2006;59:5755–5762
- Heart and Stroke Foundation of Canada. Guidelines 2000 pediatrics. Available at: http://188.8.131.52/ClientImages/1/Guidelines_PALS_NRP_2000.pdf. Accessed September 10, 2007
- Wyer P, Ioannidis J, Guyatt G. When to believe a sub-group analysis. In: Guyatt G, Rennie D, eds. The Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Chicago, IL: AMA Publications; 2007
- Chu G. Summary minutes of the meeting of the neurological devices advisory panel, June 17, 2005. Available at: www.fda.gov/ohrms/dockets/ac/05/minutes/20054162m1_summary%20minutes.pdf)20. Accessed September 10, 2007
- Devereaux PJ, Beattie WS, Choi PT, et al. How strong is the evidence for the use of perioperative beta blockers in non-cardiac surgery? Systematic review and meta-analysis of randomised controlled trials. BMJ. 2005;331:313–321
- Copyright © 2007 by the American Academy of Pediatrics