Resuscitation in the “Gray Zone” of Viability: Determining Physician Preferences and Predicting Infant Outcomes
OBJECTIVE. We assessed physician preferences and physician prognostic abilities regarding delivery room management of exceedingly low birth weight/short gestation infants.
METHODS. We surveyed US neonatologists to assess their behavior in the delivery room when confronted with infants with gestational ages of 22 to 26 weeks. We identified 102 infants in our NICU with birth weights/gestational ages of 400 g/23 weeks to 750 g/26 weeks, whose follow-up care was ensured because of their participation in ongoing clinical trials. We determined 4 proxy measures for “how the infant looked” in the delivery room (Apgar scores at 1 and 5 minutes and heart rates at 1 and 5 minutes) and assessed the predictive value of each marker for subsequent death or neurologic morbidity.
RESULTS. For infants with birth weights of <500 g and gestational ages of 23 weeks, only 4% of 666 responding neonatologists would provide full resuscitation. In contrast, for infants with birth weights of >600 g and gestational ages of 25 weeks, >90% of neonatologists considered resuscitation obligatory. For infants with birth weights of 500 to 600 g and gestational ages of 23 to 24 weeks, only one third of neonatologists responded that parental preference would determine whether they resuscitated the infant in the delivery room. The majority wanted “to see what the infant looked like.” For 102 infants with birth weights of ≤750 g, Apgar scores at 1 and 5 minutes and heart rates at 1 and 5 minutes were neither sensitive nor predictive for death before discharge, survival with a neurologic abnormality, or intact neurologic survival.
CONCLUSIONS. The “gray zone” for delivery room resuscitation seems to be between 500 and 600 g and 23 and 24 weeks. For infants born in that zone, neonatologists' reliance on accurate prediction of death or morbidity in the delivery room may be misplaced.
Patient autonomy is considered central to modern medical bioethics. It is widely accepted that competent adult patients have the right to refuse offered medical interventions. In the absence of a competent patient, surrogates are sought to make decisions regarding invasive medical procedures. In the case of newborns, the natural surrogates are parents. At times, however, the natural rights of parents to make medical decisions for their children may be opposed by third parties, usually physicians, ostensibly acting in the best interests of the child. These physicians may seek to impose what they perceive as appropriate medical treatment, with or without seeking the consent (informed or otherwise) of the parents.
A generation ago, this paternalistic behavior was normative for physician/patient decision-making.1–3 In the intervening 25 years, autonomy has come to dominate, with the singular exception of medical interventions deemed futile.4–8 When physicians invoke futility, they may and often do successfully override claims of autonomous decision-makers. The boundary where futility and autonomy meet is often not a bright line but is better characterized as a “gray zone.” Within this zone, physician/patient or physician/surrogate discussion, negotiation, and compromise are thought to be the appropriate model for resolution of potential conflict.9
We began this study with the knowledge of a few epidemiologic truths about resuscitation of newborn infants with exceedingly low birth weights (BWs). Survival for appropriate-for-gestational age infants with BWs of <500 g and gestational ages (GAs) of <23 weeks is extremely unlikely (rates much less than 10%), and intact survival is even rarer.10–14 In contrast, survival rates for infants with BWs of >750 g and GAs of >26 weeks are relatively good (nationwide rates of ≥70% and higher rates in many tertiary NICUs).10–14 At intermediate BWs, survival is uncertain and morbidity for survivors is considerable.
Applying the axioms of modern bioethics to these epidemiologic truths, we formulated a 3-part hypothesis regarding delivery room care of infants with exceedingly low BWs and GAs. First, we predicted that, below some empirically determinable “threshold of viability,” the vast majority of physicians would not offer care they considered futile; that is, they would not resuscitate such infants regardless of the wishes of the parents. Conversely, we predicted that, above some BW/GA threshold of viability, physicians would not honor parental requests to withhold resuscitation; that is, they would resuscitate such infants, acting in the best interests of the infant and overriding, if necessary, the wishes of the parents. The behavior between these extremes interested us most. We hypothesized that, between these BW/GA boundaries, a gray zone of uncertainty regarding delivery room resuscitation would exist, characterized by physician deference to the wishes of the parents.
To address these hypotheses, we surveyed US neonatologists regarding their behavior in the delivery room when confronted with infants with exceedingly low BWs. We described 4 BW/GA scenarios ranging from >750 g/26 weeks to <500 g/23 weeks. For each, we asked our respondents whether they would provide full resuscitation, provide comfort care, or defer to the parents' wishes in their delivery room management. We also allowed space for additional comments.
In reviewing the responses to our survey, we noted that our respondents frequently attempted to supplement their responses with comments such as “I would withhold my resuscitation decision until I saw how the infant looked” or “I would resuscitate initially but I might reconsider after I saw how the infant responded.” We were intrigued by this notion, namely, that “how the infant looked” or “how the infant responded” in the first minutes after birth could or should influence the appropriateness of providing or forgoing continued NICU intervention. Many of our respondents claimed that they used such observations to help them decide what to do in the delivery room, and we wondered whether they should.
We reformulated these intuitions into testable hypotheses. We determined 4 proxy measures for how the infant looked in the delivery room and then assessed the predictive value of each marker for the subsequent demise or survival of the infant and for the possibility of permanent neurologic morbidity. We assessed Apgar scores at 1 minute, as a reflection of how the infant looked at birth, and Apgar scores at 5 minutes, as a reflection of how the infant looked after initial delivery room resuscitation. In addition, we were sensitive to the possibility that parts of the Apgar score either are subjective15 or may reflect exogenous interventions (eg, respiration rate and color for children receiving bag resuscitation with 100% oxygen and grimace and tone for children born of mothers who had received general anesthesia). Therefore, we analyzed the heart rate (HR) portion of the Apgar score alone, in an attempt to record the most endogenous aspect of each neonate's physiologic condition and response to resuscitation. We report our findings here and discuss their implications for ethical analyses of delivery room management of extremely immature newborns at the edge of viability.
Study 1: Survey of Delivery Room Management
In 1996 and again in 2003 (for replication), we surveyed ∼500 neonatologists in the United States regarding their delivery room resuscitation decisions. Respondents were selected from a mailing list provided by the American Academy of Pediatrics, Section on Perinatal Pediatrics, by using a random-number generator. In each survey, we asked physicians what they would do if confronted in the delivery room with a newborn infant in 4 potential scenarios, as follows: (1) BW of <500 g and GA of <23 weeks, (2) BW of 500 to 600 g and GA of ∼24 weeks, (3) BW of 601 to 750 g and GA of ∼25 weeks, and (4) BW of >750 g and GA of ≥26 weeks. For each scenario, we asked whether the physician would choose (1) full resuscitation, (2) comfort care, or (3) deferral to the parents' wishes. We also asked the physicians to rate which, if any, of the following factors affected their decision: (1) viability, (2) futility, (3) quality of life, (4) resource allocation, (5) fear of litigation, or (6) religious beliefs. Finally, we afforded respondents the opportunity to elaborate on any aspect of their responses.
Study 2: Determination of Predictive Power of Apgar Scores for Death and Morbidity
We took advantage of a cohort of 102 extremely premature infants (BW between 400 and 750 g) who were enrolled previously in 1 of 2 ongoing clinical trials in our NICU and for whom excellent clinical follow-up monitoring was thus ensured.13,14 We correlated Apgar scores at 1 and 5 minutes and the HR portion of the Apgar score with NICU death and, for the survivors, with Mental Developmental Index and Psychomotor Developmental Index scores on the Bayley Scales of Infant Development.16 For the purposes of our analysis, infants were classified as “burdened” if they either died or had a Mental Developmental Index or Psychomotor Developmental Index score of <70 at a corrected age of 2 years.
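The "burdened" classification can be written as a short rule. The following is a minimal sketch, not the authors' code: the function name, the argument names, and the choice to raise an error when a survivor lacks a score are illustrative assumptions. Only the cutoff of 70 and the died-or-impaired logic come from the text.

```python
from typing import Optional

BAYLEY_CUTOFF = 70  # cutoff stated in the text (MDI or PDI score of <70)


def is_burdened(died: bool,
                mdi: Optional[float] = None,
                pdi: Optional[float] = None) -> bool:
    """Classify an infant as burdened: died, or MDI or PDI below 70.

    Assumption (not from the article): a survivor with a missing
    score is rejected rather than silently classified.
    """
    if died:
        return True
    if mdi is None or pdi is None:
        raise ValueError("survivors need both MDI and PDI scores")
    return mdi < BAYLEY_CUTOFF or pdi < BAYLEY_CUTOFF


# Illustrative calls with made-up scores (not study data).
print(is_burdened(died=True))                   # True
print(is_burdened(died=False, mdi=65, pdi=90))  # True
print(is_burdened(died=False, mdi=70, pdi=85))  # False
```

Note that a score of exactly 70 counts as unburdened under this rule, because the text specifies a strict inequality.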
Comparisons among neonatologists' responses were performed by using the χ² test. Comparisons of algorithmic predictions between survivors and nonsurvivors or between burdened and unburdened infants were performed by using Student's t test or analysis of variance. The Bonferroni correction was applied for multiple pairwise analyses where appropriate. In addition, multivariate logistic regression analysis was applied to distinguish covariates among the following possible predictors of either death or morbidity: BW, GA, race, gender, Apgar score at 1 minute, Apgar score at 5 minutes, HR at 1 minute, and HR at 5 minutes. Both studies were performed with the approval of the institutional review board of the University of Chicago.
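For readers less familiar with these tests, the sketch below shows how a Pearson chi-square statistic for a table of counts and a Bonferroni-adjusted significance threshold could be computed. It illustrates the general techniques only. The counts are hypothetical and are not study data, and the figure of 6 pairwise comparisons (all pairs among 4 scenarios) is an assumption made for the example, not something the article reports.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_comparisons


# Hypothetical 2 x 2 counts (NOT study data), e.g. two survey years by two choices.
toy_table = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(toy_table)
print(round(stat, 3), dof)  # 6.667 1

# Assumed for illustration: 6 pairwise comparisons among 4 scenarios.
print(bonferroni_alpha(0.05, 6))
```

The statistic would then be compared against a chi-square distribution with the returned degrees of freedom, using the Bonferroni-adjusted threshold when several pairwise tests are run.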
Study 1: Neonatologists' Responses to Delivery Room Scenarios
We received 304 responses from 500 surveys mailed to neonatologists in 2003 (61% response rate) and 362 responses from 550 surveys in 1996 (66% response rate). The survey respondents represented an experienced group of neonatologists with balanced national distribution in community and academic practices. They had an average of 18.2 years of experience (range: 1–55 years; median: 18 years; mode: 18 years). A slight majority of respondents practiced in a community setting (54%), whereas others practiced in a university (36%) or mixed community/university (4%) environment. Responses were obtained from neonatologists practicing in 43 states and Puerto Rico. Interestingly, there were no significant differences for any BW/GA scenario between responses received in 1996 and those received in 2003. Therefore, responses from the 2 surveys were combined in subsequent analyses.
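The response rates follow directly from the counts reported in this paragraph. The short sketch below recomputes them from those counts. The variable and function names are illustrative, and the combined rate is a derived figure that the article itself does not state.

```python
# year: (responses received, surveys mailed), as reported in the text
surveys = {1996: (362, 550), 2003: (304, 500)}


def response_rate(responses, mailed):
    """Response rate as a percentage."""
    return 100.0 * responses / mailed


rates = {year: response_rate(r, m) for year, (r, m) in surveys.items()}
total_responses = sum(r for r, _ in surveys.values())
total_mailed = sum(m for _, m in surveys.values())
combined_rate = response_rate(total_responses, total_mailed)

for year in sorted(rates):
    print(year, round(rates[year], 1))
print("combined", total_responses, total_mailed, round(combined_rate, 1))
```

The combined total of 666 responses matches the number of neonatologists analyzed in the pooled results.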
Figure 1 presents the distribution of delivery room responses as a function of BW/GA scenario for 666 neonatologists. For the smallest infants (BW of <500 g and GA of 23 weeks), 4% of respondents would provide full resuscitation, 57% would provide comfort care only, and 36% would defer to the parents' wishes. For infants of >750 g/26 weeks (at the other end of the surveyed continuum), 99% of respondents would provide full resuscitation. For infants with BWs of 601 to 750 g and GAs of 25 weeks, 91% of respondents would provide full resuscitation, whereas 8% would defer to the parents' wishes and 1% would provide comfort care. For infants with BWs of 500 to 600 g and GAs of 24 weeks, 59% of respondents would opt for full resuscitation, 2% would choose comfort care, and 37% would defer to the parents' wishes.
Forty-five percent of respondents chose to add comments about their delivery room decisions. The vast majority of these comments reflected the sentiment that “I would see how the infant looked” or “I would see how the infant responded” before deciding to offer, to continue, or to withhold resuscitative efforts.
Figure 2 presents the rationales chosen for delivery room resuscitation decisions, as a function of BW group. For all BW groups and for all decisions, patient-oriented considerations (viability, futility, and quality of life) were deemed very important. In contrast, for all BW groups and for all decisions, considerations of non–patient-related issues (resource allocation, fear of litigation, and the physician's own religious beliefs) were viewed consistently as less relevant to the delivery room decision.
Study 2: Predictive Power of Apgar Scores and HRs for Death and Morbidity
Figure 3 presents the distribution of Apgar scores at 1 and 5 minutes as a function of survival for 102 infants admitted to our NICU (BW: 640 ± 86 g; GA: 24.9 ± 1.5 weeks). Apgar scores at 1 minute for 65 survivors did not differ significantly from Apgar scores at 1 minute for 37 nonsurvivors (4.1 ± 2.0 vs 3.5 ± 1.9; P = .14). Moreover, as Fig 3 reveals, there was substantial overlap of Apgar scores between survivors and nonsurvivors. At every Apgar score, there were at least as many survivors as nonsurvivors. Consequently, we could identify no 1-minute Apgar score cutoff value below which resuscitation seemed futile or above which survival was ensured. For Apgar scores at 5 minutes, there was no statistically significant difference between survivors and nonsurvivors (6.3 ± 2.0 vs 6.2 ± 1.7; not significant). Again, the distributions of survivors and nonsurvivors overlapped considerably.
Figure 4 presents the distribution of Apgar scores at 1 and 5 minutes as a function of morbidity. At 1 minute, Apgar scores for 70 burdened infants were slightly but significantly lower than scores for 32 unburdened infants (3.6 ± 1.9 vs 4.5 ± 2.1; P = .03). However, this difference disappeared by 5 minutes (6.2 ± 1.8 vs 6.5 ± 2.0; not significant). It is apparent from Fig 4 that the overlap between burdened and unburdened infants was substantial and there was no Apgar score cutoff value below which a burdened outcome was ensured or above which an unburdened outcome was likely.
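The reported 1-minute comparisons can be checked approximately from the summary statistics alone. The sketch below computes a pooled-variance (Student's) t statistic from group means, standard deviations, and sizes. It assumes that the "±" values are standard deviations, which the article does not state explicitly, and the function name is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic (equal-variance form) from summary statistics."""
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / dof
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, dof


# 1-minute Apgar, survivors (n = 65) vs nonsurvivors (n = 37), from the text.
t_survival, dof_survival = pooled_t(4.1, 2.0, 65, 3.5, 1.9, 37)

# 1-minute Apgar, burdened (n = 70) vs unburdened (n = 32), from the text.
t_morbidity, dof_morbidity = pooled_t(3.6, 1.9, 70, 4.5, 2.1, 32)

print(round(t_survival, 2), dof_survival)    # 1.48 100
print(round(t_morbidity, 2), dof_morbidity)  # -2.15 100
```

With 100 degrees of freedom, the two-sided 5% critical value is roughly 1.98, so a statistic of about 1.48 falls short of significance and one of about -2.15 exceeds it. This is in line with the reported P values of .14 and .03, under the stated assumption about the "±" values.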
As noted for Apgar scores, HR values at 1 and 5 minutes provided little predictive power for either mortal or morbid outcomes for these infants. At each HR, we found substantial overlaps between nonsurviving and surviving infants and between burdened and unburdened newborns. Figures 5 and 6 present receiver operating characteristic curves for each of the 4 algorithmic attempts (Apgar scores and HR values at 1 and 5 minutes) to balance predictive value and sensitivity, quantifying the impact of how the infant looked as a predictor of death (Fig 5) or burdensome outcome (Fig 6) after initial resuscitation in the delivery room. For neither Apgar scores nor HRs at 1 or 5 minutes was the area under the receiver operating characteristic curve >0.64, which indicates the dismal sensitivity and specificity of these tests for either survival or burdensome outcome.
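To make the area-under-the-curve figure concrete, the sketch below computes it by pairwise comparison: the area equals the probability that a randomly chosen infant with the outcome has a higher score than a randomly chosen infant without it, with ties counted as one half. The scores are toy values chosen only to show the two extremes and are not study data, and the function name is illustrative.

```python
def roc_auc(scores_with_outcome, scores_without_outcome):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for a in scores_with_outcome:
        for b in scores_without_outcome:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_with_outcome) * len(scores_without_outcome))


# Toy values (NOT study data): identical distributions carry no information.
print(roc_auc([2, 4, 6], [2, 4, 6]))  # 0.5

# Toy values (NOT study data): fully separated distributions discriminate perfectly.
print(roc_auc([7, 8, 9], [1, 2, 3]))  # 1.0
```

An area of 0.5 corresponds to chance and 1.0 to perfect discrimination, so a maximum of 0.64 sits much closer to the chance end of that range.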
Table 1 presents values derived from multivariate logistic regression analyses of the effects of GA, BW, gender, race, Apgar scores at 1 and 5 minutes, and HRs at 1 and 5 minutes on both death and morbidity for the 102 infants in our study. Within the restricted GA/BW range of our study population, no variable associated with how the infant looked in the delivery room predicted either increased or decreased likelihood of overall survival. Moreover, whereas 1-minute Apgar scores of ≥7 were associated significantly with less morbidity, this association accounted for only 14 (14%) of all 102 cases, and the effect disappeared by 5 minutes of age.
The concept of a limit of viability, that is, a numerical answer to the question of how small is too small, has been with neonatology since its inception. Sixty years ago, infants weighing <1 kg were classified as stillborn.3 After Apgar scores were introduced in the 1950s, these infants were recognized to be live-born but “previable” and were allowed to die without technologic intervention. This 1-kg limit persisted until the widespread adoption of mechanical ventilation in the late 1960s. In subsequent decades, the limit of viability has been rolled back inexorably. By the 1980s, reports of survivors with GAs of 24 weeks and BWs of 500 g were not rare. Follow-up studies were published, and the traditional practice of classifying infants with BWs of <500 g as stillborn was abandoned.3,9
We report here our survey of several hundred neonatologists across the United States regarding their delivery room decisions for infants at the edge of viability, in 1996 and in 2003. Several important conclusions are apparent. First, and most reassuring from the standpoint of traditional medical bioethics, the process of decision-making seems well motivated. When contemplating delivery room resuscitation of extremely premature infants, neonatologists place great emphasis on patient-oriented outcome variables (futility, viability, and morbidity), deemphasizing societal or personal concerns (resources, religion, and lawsuits).
Second, the lower edge of viability, as determined functionally by the self-reported behavior of neonatologists in the delivery room, is ∼500 g/∼23 weeks and has remained there for at least the past several years. The large majority of neonatologists in 1996 and in 2003 would not resuscitate infants below that BW/GA limit, regardless of the expressed wishes of the parents. In contrast, for infants with BWs of >600 g and GAs of 25 weeks, obligatory resuscitation, justified by the best interests of the infant, would be provided by >90% of neonatologists. The gray zone for delivery room resuscitation seems to be between 500 and 600 g (∼23–24 weeks of gestation). There, approximately one third of neonatologists responded that parental preference would influence their decision regarding whether to resuscitate the infant in the delivery room.
Some may view this observation as troubling; only a minority of neonatologists responded that parental preference was decisive for infants in this category. In this view, patient autonomy, which is considered central to modern bioethical decision-making, is inadequately respected. More than one half of the infants in this gray zone will die, all of the survivors will have extended NICU courses marked by expensive, invasive, technologic interventions, and almost one half of the survivors will have permanent morbid handicaps. For most other patients (adults or older children) faced with comparable prognoses in other ICUs, their preferences, or those of their surrogates, would be the determining factors in decisions about continuing or withholding intensive intervention. That is apparently not the case in the NICU.
Alternatively, one can view these data as reflecting the considered opinions of neonatologists who are well intentioned, if conflicted, in their attempts to balance the best interests of the infant with the possible concerns of the family. Generally, neonatologists refuse, on futility grounds, to initiate resuscitation for infants who as a group have minimal chance of survival (<500 g/23 weeks). Conversely, neonatologists insist, on viability grounds, on resuscitating infants who as a group have substantial likelihood of survival (>600 g/25 weeks). The middle zone is difficult.
Respondents for all BW categories, but particularly for infants in the gray zone of uncertainty, frequently noted that their behavior might be modified for individual cases depending on how the infant looked after birth, as a predictive marker of the future success of NICU care. These comments suggest that many doctors think that the wisest course for some infants is to allow them to “declare” themselves. Do they?
We reformulated our respondents' comments into testable hypotheses by determining the predictive value of 4 possible proxy measures for how the infant looked in the delivery room (Apgar scores at 1 and 5 minutes and HR values at 1 and 5 minutes) for the outcomes of either death or permanent neurologic morbidity. We found that each of these proxy measures seemed statistically inadequate as a basis for a systematic response regarding the initiation or withholding of resuscitative efforts. None of these proxy measures was successful in distinguishing infants who were going to die before discharge, survive with a burdensome outcome, or survive unburdened.
Three methodologic caveats must be addressed explicitly. First, the proxy measures for how the infant looked (Apgar scores and HRs) were all derived from 1 NICU (ours). How typical is our NICU? The distribution of Apgar scores reported here for infants in this BW/GA range is comparable to those in previous reports from other institutions. In addition, we reported previously that our BW-specific mortality rates and rates of complications of NICU care (intraventricular hemorrhage, periventricular leukomalacia, and retinopathy of prematurity) are comparable to published outcomes from many centers.13,14 A review of the medical literature reveals a remarkable consistency among most tertiary NICUs for infants in this GA range; survival rates below 500 g/23 weeks are consistently minimal, whereas survival rates are consistently good for infants born larger than 750 g/25 weeks.
A second caveat is more subtle. Some may argue that these data are incomplete, that is, that neonatal resuscitation is a continuum and that decisions made in the delivery room can and should be revisited during the infant's NICU stay. In this view, NICU intervention functions as a time-limited trial and infants declare themselves by their physiologic responses over time (if not in the first few minutes, then in the first few hours or days). However intuitive this view may be, there are few data to support it. We and others have demonstrated that serial assessments of physiologic stability (with the Score for Neonatal Acute Physiology, Score for Neonatal Acute Physiology II, and Score for Neonatal Acute Physiology II Perinatal Extension) lose, rather than gain, predictive power as a function of time.17–19 Nonsurviving infants seem to cloak themselves, not declare themselves, over days in the NICU.
Finally, we used Bayley Scale scores at 2 years of age as markers of neurologic morbidity for our surviving population. These markers have been widely used in the neonatal literature10–14 and represent our best balance between the importance of obtaining neurologic follow-up data and the practical difficulties of maintaining continued contact with a large proportion of our original cohort over several years. However, recent data have suggested that the predictive value of Bayley Scale scores at 2 years for performance at school age is imperfect.20 We note that these observations serve to underscore our major conclusion, namely, that how the infant looks in the delivery room does not readily discriminate infants in this BW/GA range into survivors, nonsurvivors, and impaired survivors at 2 years (or 8 years) of age.
The gray zone for delivery room resuscitation in the United States seems to be between 500 and 600 g and between 23 and 24 weeks of gestation. Neonatologists report that they often provide or withhold resuscitation for infants in this gray zone on the basis of how the infant looks. Our data suggest that neonatologists' confidence in their ability to predict subsequent death or morbidity for these infants on the basis of their appearance in the delivery room may be unfounded.
- Accepted April 10, 2007.
- Address correspondence to William Meadow, MD, PhD, Department of Pediatrics, MC6060, University of Chicago, 5815 South Maryland Ave, Chicago, IL 60637. E-mail:
The authors have indicated they have no financial relationships relevant to this article to disclose.
- Todres ID, Krane D, Howell MC, Shannon DC. Pediatricians' attitudes affecting decision making in defective newborns. Pediatrics. 1977;60:197–201
- Silverman WA. Overtreatment of neonates? A personal retrospective. Pediatrics. 1992;90:971–976
- Kopelman LM. Conceptual and moral disputes about futile and useful treatments. J Med Philos. 1995;20:109–121
- Rhoden NK. Treating Baby Doe: the ethics of uncertainty. Hastings Cent Rep. 1986;16:34–42
- Lantos JD, Meadow WL. Neonatal Bioethics: The Moral Challenges of Medical Innovation. Baltimore, MD: Johns Hopkins University Press; 2006
- Vohr BR, Wright LL, Dusick AM, et al. Neurodevelopmental and functional outcomes of extremely low birth weight infants in the National Institute of Child Health and Human Development Research Network 1993–1994. Pediatrics. 2000;105:1216–1226
- Hack M, Fanaroff AA. Outcomes of children of extremely low-birth weight premature infants in the 1990s. Early Hum Dev. 1999;53:195–218
- Bayley N. Manual for the Bayley Scales of Infant Development. 2nd ed. San Antonio, TX: Psychological Corp; 1993
- Meadow WL, Frain L, Ren Y, Soneji S, Lantos JD. Intimations of mortality in the NICU: certainty, uncertainty, and informed consent? Pediatrics. 2002;109:878–886
- Richardson DK, Gray JE, McCormick MC, Workman K, Goldmann DA. Score for Neonatal Acute Physiology: a physiologic severity index for neonatal intensive care. Pediatrics. 1993;91:617–623
- Hack M, Taylor HG, Drotar D, et al. Poor predictive validity of the Bayley Scales of Infant Development for cognitive function of extremely low birth weight children at school age. Pediatrics. 2005;116:333–341
- Copyright © 2007 by the American Academy of Pediatrics