Methods of Mortality Risk Adjustment in the NICU: A 20-Year Review
- Stephen W. Patrick, MD, MPH, MSc (a,b,c),
- Robert E. Schumacher, MD (a,c), and
- Matthew M. Davis, MD, MAPP (a,b,d,e,f)
- (a) Department of Pediatrics and Communicable Diseases,
- (b) Robert Wood Johnson Foundation Clinical Scholars Program,
- (c) Division of Neonatal-Perinatal Medicine, and
- (d) Child Health Evaluation and Research (CHEAR) Unit, Department of Pediatrics and Communicable Diseases, University of Michigan Health System, Ann Arbor, Michigan; and
- (e) Gerald R. Ford School of Public Policy, and
- (f) Internal Medicine, University of Michigan, Ann Arbor, Michigan
BACKGROUND AND OBJECTIVES: Improving the quality of care delivered in NICUs relies on the ability to use risk adjustment to separate the variation associated with patient characteristics from the variation attributable to processes of care delivery. Multiple methods of mortality risk adjustment have been proposed for NICU populations. We review the existing literature on mortality risk adjustment in the NICU.
METHODS: PubMed and Scopus were searched to identify unique methods of mortality risk adjustment in the NICU and to track how often each has been cited since its original publication. Additional online searches were performed to identify methods of mortality risk adjustment for NICU patients used by organizations and government agencies.
RESULTS: Among the 10 unique neonatal mortality risk adjustment scores identified by this review, there are >3 dozen different measurement components. No score includes >28 components; no score contains <6. The scores differ substantively in their intended purposes, component parameters, and intensity of data collection. The Clinical Risk Index for Babies (CRIB) has been referenced most frequently in other research articles (447 citations), whereas the National Institute of Child Health and Human Development “calculator” has the greatest rate of citations per year since initial publication (37). The scores are notably inconsistent in their timing of data collection and in their inclusion of comorbidity indicators.
CONCLUSIONS: Rigorous means of risk adjustment in the NICU are essential to enhancing the quality of care delivered to neonates, by facilitating more meaningful comparisons in quality improvement. Building on the first 20 years of neonatal mortality risk adjustment will ultimately allow researchers and quality improvement teams to apply measures that facilitate cross-institutional comparisons thoroughly and fairly.
- Risk adjustment
- neonatal mortality
- AHRQ: Agency for Healthcare Research and Quality
- CRIB: Clinical Risk Index for Babies
- NICHD: National Institute of Child Health and Human Development
- NQI: neonatal quality indicator
- PE: Perinatal Extension
- SNAP: Score for Neonatal Acute Physiology
- VON-RA: Vermont Oxford Network Risk Adjustment
Variations in the quality of processes and outcomes, including mortality, have been demonstrated in NICUs.1–4 The presence of variation suggests an opportunity both to improve the delivery of care to neonates and to give institutions and health care providers information about their performance relative to peers.
Neonates vary in gestational age, birth weight for gestational age, and clinical comorbidities. Using raw (unadjusted) results to evaluate process measures and outcomes, including complications and mortality, may therefore not yield meaningful comparisons across institutions. Advancing the quality of care delivered in NICUs must begin with accurate and rigorous comparisons of patients and institutions. Those comparisons depend on risk adjustment, which accounts for patient-associated factors when institutions are compared.5
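To make this concrete, the sketch below compares 2 hypothetical NICUs using indirect standardization. A risk model (not specified here) supplies each infant's expected probability of death. Each unit's observed deaths are then divided by the sum of those probabilities to give an observed-to-expected (O/E) ratio. The units, infants, and risk values are invented for illustration and are not drawn from any score reviewed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Infant:
    died: bool            # observed outcome: in-hospital death
    expected_risk: float  # predicted probability of death from some risk model (hypothetical)


def crude_mortality(infants):
    """Unadjusted mortality: deaths divided by admissions."""
    return sum(i.died for i in infants) / len(infants)


def observed_to_expected(infants):
    """Observed deaths divided by the sum of predicted probabilities.
    A ratio above 1 means more deaths than the unit's case mix predicts."""
    observed = sum(i.died for i in infants)
    expected = sum(i.expected_risk for i in infants)
    return observed / expected


# Hypothetical units of 20 infants each: A admits low-risk infants, B high-risk infants.
unit_a = [Infant(False, 0.02)] * 18 + [Infant(True, 0.02)] * 2
unit_b = [Infant(False, 0.30)] * 15 + [Infant(True, 0.30)] * 5

for name, unit in (("A", unit_a), ("B", unit_b)):
    print(f"Unit {name}: crude {crude_mortality(unit):.2f}, O/E {observed_to_expected(unit):.2f}")
```

Unit B looks worse on crude mortality (0.25 vs 0.10). Relative to its case mix, however, Unit A has far more deaths than expected (O/E about 5.0 vs about 0.83). This reversal is the kind that risk adjustment is meant to reveal.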
Risk adjustment helps distinguish intrinsic heterogeneity among patients (eg, comorbid conditions) from differences among institutions (eg, available hospital personnel and resources). After risk adjustment, differences in outcomes can more reliably be attributed to variations in practice and institutional circumstances rather than to case mix. Initial attempts at risk adjustment relied almost exclusively on birth weight. However, other conditions, such as birth defects, are also associated with increased mortality. Over the past 20 years, several mortality scores have therefore been developed for the NICU setting and used to evaluate interinstitutional variation in mortality. Dorling et al6 reviewed scores published through 2004, but additional approaches to neonatal mortality risk stratification have since been published and widely used. In this review, our aims were to (1) expand on previous summaries of existing methods of mortality risk adjustment in the NICU and (2) identify opportunities for future innovation in NICU risk adjustment.
Data Sources and Article Selection
We performed a literature review using PubMed to identify unique methods of mortality risk adjustment in the NICU. Search terms included “neonatal mortality” and “risk adjustment,” and MeSH terms included “severity of illness index” and “infant mortality.” We then sought additional methods of risk adjustment used by government agencies and hospital accrediting organizations. To find them, we used online search tools (Google) and reviewed government and clinical organization Web sites.
Articles were retained for the review if they met 3 criteria:

1. They described a novel approach to risk adjustment in the NICU, distinct from previously published work.
2. The approach was validated in neonatal populations in the United States or United Kingdom.
3. The authors published coefficients that would enable future use by researchers.

As noted by Dorling et al,6 the vast majority of such scores come from the United States and United Kingdom. We therefore wished to characterize iterative changes in scores from these countries that reflect improvements over time in quality of care and in the science of score development. We also reviewed the reference list of each included article for scores that our original search might have missed. Overall, we identified 10 unique neonatal mortality risk scores.
We then used Scopus to evaluate how frequently each method was cited in research studies after publication of the original article.
Data elements of each method of risk adjustment were extracted by the authors to create a detailed comparison (Table 1). Elements of comparison included (1) birth characteristics (eg, birth weight, gestational age, mode and site of delivery), (2) clinical physiologic characteristics (eg, blood pressure, heart rate), (3) source of data (clinical and/or administrative), (4) administrative data obtained (eg, major diagnostic classification), (5) entry criteria (eg, all NICU admissions, birth weight <1500 g), and (6) timing of data collection (eg, before birth, 24 hours after admission).
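At its core, a comparison like Table 1 is a set of operations over each score's components. The sketch below encodes the predictors that this review names in its prose for 3 scores and finds what they share and what is unique to each. It is not a reproduction of Table 1, which may count components differently.

```python
# Predictors as named in this review's text; Table 1 may count components differently.
SCORE_COMPONENTS = {
    "NICHD 1993": {"birth weight", "small for gestational age", "race",
                   "gender", "1-minute Apgar score"},
    "NICHD 2008": {"gestational age", "birth weight", "gender",
                   "antenatal steroids", "single vs multiple birth"},
    "AHRQ NQI #2": {"gender", "birth weight", "birth defects", "transfer status",
                    "modified diagnosis-related group",
                    "major diagnostic classification"},
}


def shared_components(names):
    """Components that appear in every named score."""
    return set.intersection(*(SCORE_COMPONENTS[n] for n in names))


def unique_components(name):
    """Components of one score that no other listed score uses."""
    others = set().union(*(c for n, c in SCORE_COMPONENTS.items() if n != name))
    return SCORE_COMPONENTS[name] - others


print(sorted(shared_components(SCORE_COMPONENTS)))
print(sorted(unique_components("AHRQ NQI #2")))
```

Keeping components as sets makes it easy to add the remaining scores and to list their overlaps programmatically rather than by hand.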
Methods of Mortality Risk Adjustment in the NICU
The 10 neonatal mortality risk adjustment scores from the United States and United Kingdom identified for this review contain >3 dozen distinct components (Table 1). No score includes >28 components; no score contains <6. Most scores were explicitly designed to outperform birth weight alone as a risk adjustment approach. We present the scores in chronologic order, by the birth years of the neonates on which each score was based and by publication date. Along the way, we highlight how the scores built on and contrasted with one another.
National Institute of Child Health and Human Development 1993
In 1993, Horbar et al7 developed a method of risk adjusting neonatal mortality using data from institutions participating in the National Institute of Child Health and Human Development (NICHD) Research Network. Data were obtained from newborns weighing between 501 and 1500 g at the 7 participating centers between 1987 and 1989. The authors assessed multiple candidate predictors of mortality, and the final model included birth weight, small for gestational age, race, gender, and 1-minute Apgar score. The authors limited the data to those available at the time of admission so that NICU clinical management would not influence the model. This model proved more predictive of neonatal mortality than birth weight alone.
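Admission-time models of this kind are typically logistic regressions. The sketch below shows how such a model turns a few admission-time predictors into a predicted probability of death. The intercept and coefficients are hypothetical placeholders, not the published NICHD estimates. For brevity, the sketch uses only a subset of the predictors (race is omitted).

```python
import math

# HYPOTHETICAL coefficients for illustration; not the published NICHD estimates.
INTERCEPT = 4.0
COEF = {
    "birth_weight_per_100g": -0.45,
    "small_for_gestational_age": 0.6,
    "male": 0.4,
    "apgar_1min_per_point": -0.25,
}


def predicted_mortality(birth_weight_g, sga, male, apgar_1min):
    """Logistic model restricted to information available at NICU admission."""
    logit = (INTERCEPT
             + COEF["birth_weight_per_100g"] * birth_weight_g / 100
             + COEF["small_for_gestational_age"] * int(sga)
             + COEF["male"] * int(male)
             + COEF["apgar_1min_per_point"] * apgar_1min)
    return 1 / (1 + math.exp(-logit))


print(round(predicted_mortality(750, sga=False, male=True, apgar_1min=3), 2))
print(round(predicted_mortality(1400, sga=False, male=False, apgar_1min=8), 3))
```

Every input is known at admission, so nothing that happens during NICU treatment can feed back into the predicted risk. This is the design goal the text describes.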
Score for Neonatal Acute Physiology and Score for Neonatal Acute Physiology Perinatal Extension
The Score for Neonatal Acute Physiology (SNAP) was first published in 1993 by Richardson et al.8 SNAP was modeled after the Acute Physiology and Chronic Health Evaluation9 for critically ill adults and the Physiologic Stability Index.10 For each of several physiologic measurements (eg, blood gas pH, mean arterial pressure), SNAP uses the least favorable value recorded within the first 24 hours after admission. The score was developed from a cohort of newborns born in 1989 and 1990 in 3 Boston NICUs. SNAP includes >2 dozen data components and requires 5 to 15 minutes to complete for each patient, depending on patient complexity.8 The score predicted in-hospital death among neonates with strong accuracy. The Score for Neonatal Acute Physiology Perinatal Extension (SNAP-PE) includes the physiologic variables of SNAP and adds birth weight, 5-minute Apgar score, and small for gestational age (<5th percentile).11 SNAP and SNAP-PE were clearly superior to birth weight alone; their major drawback was the intensive data collection they required.
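The worst-value-in-a-window rule can be sketched as 2 steps. First, keep only measurements taken within the first N hours and take the least favorable one. Then convert that value to points. The point bands and observations below are hypothetical and are not SNAP's published cutoffs or weights.

```python
# Each observation is (hours since admission, value). All data are hypothetical.
def worst_value(observations, window_hours, lower_is_worse=True):
    """Least favorable value recorded within the first window_hours after admission."""
    values = [v for t, v in observations if 0 <= t <= window_hours]
    if not values:
        return None  # item not measured in the window
    return min(values) if lower_is_worse else max(values)


def band_points(value, bands):
    """bands: (upper_bound, points) pairs in ascending order of upper_bound.
    Returns the points of the first band whose upper bound exceeds value, else 0."""
    for upper, pts in bands:
        if value < upper:
            return pts
    return 0


# HYPOTHETICAL bands; not SNAP's published cutoffs or weights.
PH_BANDS = [(7.10, 2), (7.20, 1)]
MEAN_BP_BANDS = [(20, 2), (30, 1)]  # mm Hg

ph_obs = [(1.0, 7.25), (6.0, 7.15), (18.0, 7.05), (30.0, 6.95)]
mean_bp_obs = [(0.5, 28), (10.0, 35), (20.0, 18)]

score_24h = (band_points(worst_value(ph_obs, 24), PH_BANDS)
             + band_points(worst_value(mean_bp_obs, 24), MEAN_BP_BANDS))
print(score_24h)
```

The pH recorded at 30 hours falls outside the window and is ignored. Changing `window_hours` is enough to mimic a shorter collection window, such as the 12-hour window adopted by SNAP II.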
Clinical Risk Index for Babies
The Clinical Risk Index for Babies (CRIB) was published in 1993 by the International Neonatal Network. The score was developed from a cohort of infants born in 4 UK NICUs from 1988 to 1990. The authors sought to create a simple score for routine clinical use, based exclusively on data collected within the first 12 hours after birth. The score includes 6 variables that combine birth and clinical characteristics. Importantly, CRIB includes nonlethal birth defects, partitioned into 3 broad categories: (1) not present, (2) not acutely life-threatening, and (3) acutely life-threatening.12 By incorporating birth defects, CRIB captures the intrinsic risk present in a population in a way that systems using only physiologic measures do not. In other words, CRIB was an effort to build a comorbidity adjustment into the score. This helps clarify differences in outcomes. For example, an infant with congenital heart disease who has a given oxygen requirement or blood gas value faces a different risk than an infant without heart disease who has the same values. CRIB may also be less time-intensive to complete than other early scores. Together with its inclusion of birth defects, this is a possible advantage, and CRIB remains more predictive than birth weight alone.
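CRIB's handling of birth defects can be pictured as an extra comorbidity term in an additive score. In the sketch below, the 3 categories come from the text. The point values are hypothetical placeholders, not the published CRIB weights, and the physiologic points are invented. The example shows only that 2 infants with identical physiology can receive different scores.

```python
from enum import Enum


class BirthDefect(Enum):
    NONE = "not present"
    NOT_ACUTELY_LIFE_THREATENING = "not acutely life-threatening"
    ACUTELY_LIFE_THREATENING = "acutely life-threatening"


# HYPOTHETICAL points per category; not the published CRIB weights.
DEFECT_POINTS = {
    BirthDefect.NONE: 0,
    BirthDefect.NOT_ACUTELY_LIFE_THREATENING: 1,
    BirthDefect.ACUTELY_LIFE_THREATENING: 3,
}


def additive_score(physiology_points, defect):
    """Sum of points from physiologic items plus a birth-defect (comorbidity) term."""
    return sum(physiology_points) + DEFECT_POINTS[defect]


same_physiology = [2, 1, 0]  # invented points from the physiologic items
print(additive_score(same_physiology, BirthDefect.NONE))                      # 3
print(additive_score(same_physiology, BirthDefect.ACUTELY_LIFE_THREATENING))  # 6
```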
SNAP II and SNAP-PE II
In 2001, Richardson et al13 published updates to both SNAP and SNAP-PE. The updated scores, named SNAP II and SNAP-PE II, were designed to be more parsimonious. The authors developed and validated them in a large population of neonates born in 1996 and 1997 in New England, California, and Canada. SNAP components were pared down to 6 items, reducing scoring time to 4 minutes or less for data abstractors. In addition, data were collected within the first 12 rather than 24 hours after admission, to reduce the impact of initial treatments on the score. The updated scores were highly predictive of neonatal mortality and were well calibrated among both high- and low-risk populations.
CRIB II
The CRIB II was published in 2003 by Parry et al14 to update the original CRIB. Data were obtained from a UK-wide cohort born in 1998 and 1999. In this more recent cohort, the original CRIB showed poor calibration. The authors speculated that surfactant and antenatal steroids had become standard practice since the original cohort was assembled, potentially improving outcomes and altering mortality risk. CRIB II is based on data obtained within the first hour of life and does not include any adjustment for birth defects. Despite this absence of comorbidity adjustment, CRIB II showed better discrimination than the original CRIB.
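Calibration, the property the original CRIB lost in the newer cohort, asks whether predicted risks match observed death rates. The sketch below shows one simple way to check it with invented data. Infants are grouped by predicted risk, and each group's mean predicted probability is compared with its observed mortality. This is a generic illustration, not Parry et al's analysis; formal tests such as Hosmer-Lemeshow build on the same grouping idea.

```python
def calibration_by_group(predicted, observed, n_groups=2):
    """Sort infants by predicted risk, split them into n_groups groups of
    (roughly) equal size, and return (mean predicted, observed rate) per group."""
    pairs = sorted(zip(predicted, observed))
    size = len(pairs) // n_groups
    table = []
    for g in range(n_groups):
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[g * size:end]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(o for _, o in chunk) / len(chunk)
        table.append((round(mean_pred, 3), round(obs_rate, 3)))
    return table


# Invented example: an older score applied to a newer cohort in which
# high-risk infants now die less often than the score predicts.
predicted = [0.10] * 10 + [0.50] * 10
observed = [0] * 9 + [1] + [0] * 8 + [1] * 2   # 1 = died

print(calibration_by_group(predicted, observed))
```

The low-risk group matches (0.1 predicted, 0.1 observed), but the high-risk group is overpredicted (0.5 vs 0.2). A score can still rank infants well while being miscalibrated in this way, which is why recalibration matters.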
Vermont Oxford Network-Risk Adjustment and Revalidation of SNAP II/SNAP-PE II
The Vermont Oxford Network uses a proprietary method of risk adjustment (VON-RA) to compare outcomes among the network’s >500 centers. The primary study introducing the VON-RA was published in 2007 by Zupancic et al.15 The study used data collected in 2002 at 58 participating centers and had 3 aims: to revalidate the SNAP II and SNAP-PE II, to compare the performance of the VON-RA with these scores, and to determine the relative contribution of birth defects to the performance of the VON-RA. Importantly, the study found that adding birth defects to the SNAP-PE II significantly improved the score’s performance. The study also demonstrated that the VON-RA performed similarly to SNAP-PE II plus birth defects. This finding is particularly important because the VON-RA does not use physiologic measurements (eg, blood pressure, pH), which are labor-intensive to collect.
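Zupancic et al's analysis is not reproduced here. However, one common way to quantify whether a risk model "performs" better is discrimination, summarized by the c-statistic (area under the ROC curve). Assuming that measure, the sketch below computes the c-statistic from its definition for 6 invented infants. It compares predicted risks with and without a hypothetical birth-defect adjustment that raises 1 infant's predicted risk.

```python
from itertools import product


def c_statistic(risk, died):
    """Share of (death, survivor) pairs in which the infant who died had the
    higher predicted risk; tied pairs count one half."""
    deaths = [r for r, d in zip(risk, died) if d]
    survivors = [r for r, d in zip(risk, died) if not d]
    concordant = 0.0
    for r_death, r_survivor in product(deaths, survivors):
        if r_death > r_survivor:
            concordant += 1
        elif r_death == r_survivor:
            concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


died = [1, 1, 0, 0, 0, 0]
# Invented predicted risks: physiology only, then with a birth-defect term that
# raises the second infant's risk (that infant had a serious birth defect).
risk_physiology = [0.40, 0.20, 0.30, 0.20, 0.10, 0.05]
risk_with_defects = [0.40, 0.45, 0.30, 0.20, 0.10, 0.05]

print(c_statistic(risk_physiology, died))    # 0.8125
print(c_statistic(risk_with_defects, died))  # 1.0
```

With physiology alone, the infant who died with a birth defect looks no riskier than some survivors. The defect term repairs that ranking and raises the c-statistic.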
Agency for Healthcare Research and Quality Neonatal Quality Indicator #2
The Agency for Healthcare Research and Quality (AHRQ) created a neonatal mortality quality measure that compares institutional outcomes using hospital administrative data and no physiologic data. Data are obtained from hospital discharge abstracts, which the agency compiles through a collaborative national effort with most states. AHRQ excludes neonates who weigh <500 g, who have trisomy 13 or 18, or who have a diagnosis of polycystic kidney disease or anencephaly. Its risk adjustment scheme includes the following:

- gender
- birth weight
- birth defects
- transfer status
- modified diagnosis-related group (mechanical ventilation >96 hours, major cardiac procedures)
- major diagnostic classification (disorders of the nervous system, disorders of the respiratory system, disorders of the cardiovascular system, disorders of the digestive system, and conditions originating in the neonatal period)

Both major diagnostic classifications and modified diagnosis-related groups are broad categories that encompass procedures as well as diagnoses. To date, this method of risk adjustment has not been validated in the literature.16
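The AHRQ exclusions translate directly into a denominator filter applied to discharge records. The sketch below encodes the exclusions listed above. The record layout and plain-text diagnosis labels are hypothetical stand-ins for the coded fields that a real discharge abstract would carry. The sketch does not attempt the risk adjustment scheme itself.

```python
from dataclasses import dataclass, field

# Exclusions as described in this review. Diagnosis labels are plain-text
# placeholders; the actual AHRQ specification uses coded diagnoses.
EXCLUDED_DIAGNOSES = {"trisomy 13", "trisomy 18",
                      "polycystic kidney disease", "anencephaly"}


@dataclass
class DischargeRecord:
    birth_weight_g: int
    diagnoses: set = field(default_factory=set)


def in_denominator(rec):
    """True if the record remains after the NQI #2 exclusions described in the text."""
    if rec.birth_weight_g < 500:
        return False
    return not (rec.diagnoses & EXCLUDED_DIAGNOSES)


records = [
    DischargeRecord(480),
    DischargeRecord(900, {"trisomy 18"}),
    DischargeRecord(1200, {"respiratory distress syndrome"}),
    DischargeRecord(2500),
]
eligible = [r for r in records if in_denominator(r)]
print(len(eligible))
```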
NICHD 2008 and the “Calculator”
The developers of each previous method acknowledged that it was imprecise for individual patients and cautioned against using it to guide their care. Tyson et al,17 by contrast, sought to give clinicians and parents data that might inform decisions about resuscitating extremely premature newborns. The authors evaluated outcomes at the extremes of gestational age, including infants born at 22 weeks’ gestation. Data were obtained from 19 NICHD Neonatal Research Network centers for infants born at 22 to 25 weeks’ gestation between 1998 and 2003. Infants at the extremes of birth weight (<401 g and >1000 g) were excluded, as were newborns with birth defects. The cohort was followed to 18 to 22 months of age, when neurodevelopmental assessments were performed. Gestational age, birth weight, gender, exposure to antenatal steroids, and single versus multiple birth were chosen a priori as predictors of death or poor neurodevelopmental outcome. The resulting model predicted outcomes at 18 to 22 months better than gestational age alone. The authors published an interactive “calculator” (http://www.nichd.nih.gov/about/org/cdbpm/pp/prog_epbo/epbo_case.cfm) to aid clinicians and parents in their decision-making.17 Note, however, that antenatal steroid administration is itself a process of care. Including it as a predictor complicates comparisons of institutional variation in care, although this does not diminish the calculator’s usefulness as a clinical tool.
Patterns of Citation in the Literature
Approaches to mortality risk adjustment in the NICU have facilitated widespread research in neonatal care and quality improvement over the past 2 decades. Since the mid-1990s, nearly all of the approaches described above have been cited, with CRIB (447 citations) and SNAP (325) cited most frequently overall (Fig 1). Adjusted for the number of years since publication, the NICHD calculator has been most widely referenced, with a mean of 37 citations per year (Fig 2).
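Normalizing by time since publication is what lets a recent score outrank older, more heavily cited ones. The sketch below uses invented citation counts, not the counts in Figs 1 and 2, to show the calculation. It makes the denominator convention an explicit parameter because whether the publication year itself is counted changes the result noticeably for recent papers.

```python
def citations_per_year(total_citations, pub_year, through_year, count_pub_year=False):
    """Mean citations per year since publication.

    count_pub_year=False divides by full years elapsed (through_year - pub_year);
    True also counts the publication year itself. Which convention a given
    analysis uses is an assumption here."""
    years = through_year - pub_year + (1 if count_pub_year else 0)
    if years <= 0:
        raise ValueError("citation window must cover at least one year")
    return total_citations / years


# Hypothetical scores: an older, heavily cited one and a newer one.
older = citations_per_year(400, 1993, 2012)  # 400 / 19
newer = citations_per_year(120, 2008, 2012)  # 120 / 4
print(round(older, 1), round(newer, 1))
print(round(citations_per_year(120, 2008, 2012, count_pub_year=True), 1))  # 120 / 5
```

The newer score has fewer than a third of the total citations but the higher annual rate. Switching conventions moves its rate from 30 to 24 per year.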
Robust methods of risk adjustment are vital to efforts to measure variation in care delivery. Most scores reviewed here are used to compare patients in research studies, but some aim to provide institutional feedback on outcomes. The NICHD calculator aims to guide treatment decisions in new ways that have garnered widespread attention in the clinical literature and in practice. Each score provided comparisons superior both to unadjusted institutional data and to birth weight alone in separating variance associated with patient characteristics from variance attributed to care delivery processes.
Because these scores were created and used for different purposes, their impact also differs. Some have been used to direct subsequent care and have therefore “become” part of the process of care. Others have been designed or used to ensure homogeneity among infants enrolled in clinical trials, to help isolate the effects of a single process of care. However, if the goal of these measurements is to identify true heterogeneity in practice and lead to improvement in care, then scores should describe both the population at risk (including birth defects) and the environment in which care is rendered. That environment includes the availability of specific technologies, such as extracorporeal membrane oxygenation, and of specific personnel, such as pediatric neurosurgeons. Adding elements must not overcomplicate the scores or create multicollinearity, however; indeed, the trend over time has been toward more parsimonious measures that are straightforward to apply.
Importantly, the evolution of the SNAP, SNAP-PE, and CRIB scores over time is a reminder that risk adjustment must change as the needs and circumstances of the clinical population change. For example, as innovations such as surfactant improve treatment, the underlying risk of mortality changes. Risk adjustment approaches must then be recalibrated to reflect higher expectations for outcomes of care. This iterative approach is evident in the VON-RA. The recent development of the NICHD calculator, and the attention it has received, illustrate other directions of score evolution: from research to the bedside, and from in-hospital mortality to clinical outcomes within the first 2 years of life.
One of the most salient differences among the scores in this review is the substantial variation in the timing of data collection. For example, SNAP includes data obtained within the first 24 hours after admission, whereas VON-RA includes postdischarge data. As a result, each score serves a distinct purpose and assesses risk at a different point in the treatment course. Current risk scores focus chiefly on initial risk at the time of admission and on a summary of risk at discharge. Future risk scores could perhaps include estimates of day-specific risk (eg, after initial clinical stabilization) and summary risk scores, much as the Charlson comorbidity index functions for adults.18 Developing risk adjustment in this fashion would refine the ability to account for patient variation in research and quality improvement efforts. It might also permit risk stratification of NICU graduates to guide their later care in high-risk NICU follow-up clinics.
The literature has not yet provided guidance on whether data gathered at different time points yield consistent or divergent risk adjustment within a single cohort. Stated another way, there is a need for rigorous comparative effectiveness analysis of mortality risk adjustment in the NICU setting. NICU care has not, to date, been a focus of the comparative effectiveness efforts led by the Institute of Medicine.19
As medicine and technology advance, a maximally parsimonious score may no longer be as necessary as it was once perceived to be. Future risk adjustment may be integrated into electronic medical record systems, bringing together available data to provide timely and robust assessments of mortality risk. Simplicity for the sake of reducing the resources required for measurement may no longer be needed; in fact, limiting data collection may exclude predictive parameters.
Last, recent literature indicates that including birth defects is important in risk adjustment.15 For institutional comparisons, this might allow fairer comparisons between NICUs with relatively homogeneous preterm populations and those with more heterogeneous populations. Accounting for birth defects might also allow infants with birth defects to be included in research studies. Such infants would then be part of the derivation and validation of future score iterations rather than excluded from them.
As a review of the literature, our article is limited by the potential errors and biases of each cited study. This review is also limited by publication bias, although we attempted to account for this by searching nonpublished sources (eg, Google). Although we attempted to use relevant search and MeSH terms, errors of omission are also possible.
In addition, we wished to emphasize how easily scores can be applied across institutions. We therefore included studies only if they published coefficients that would enable future use by researchers. This criterion excluded some studies that address challenging issues in risk adjustment, such as racial disparities in neonatal mortality.20 We also limited our review to studies validated in US and UK populations. Generalizing these scores to populations in other industrialized nations, and to developing country settings, must be done with caution.
Continuing to develop and enhance rigorous means of risk adjustment in the NICU is critical to improving care delivered to neonates, by facilitating meaningful comparisons in quality improvement. Building on the first 20 years of neonatal mortality risk adjustment will permit future health care providers to serve patients’ needs better and ultimately allow researchers and quality improvement teams to develop and use measures that facilitate cross-institutional comparisons thoroughly and fairly.
The authors would like to thank Michelle Housey, MPH, for her contributions to this project.
- Accepted December 20, 2012.
- Address correspondence to Stephen Patrick, MD, MPH, MSc, 8-621 C&W Mott Hospital, 1540 E. Medical Center Dr, Ann Arbor, MI 48109-4254; E-mail:
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
- Copyright © 2013 by the American Academy of Pediatrics