American Academy of Pediatrics
Article

A Comparison of Neonatal Mortality Risk Prediction Models in Very Low Birth Weight Infants

Murray M. Pollack, Matthew A. Koch, Doris A. Bartel, Irina Rapoport, Ramasubbareddy Dhanireddy, Ayman A. E. El-Mohandes, Kenneth Harkavy, K. N. Siva Subramanian and the District of Columbia Neonatal Network
Pediatrics May 2000, 105 (5) 1051-1057; DOI: https://doi.org/10.1542/peds.105.5.1051

Abstract

Background. Risk-adjusted severity of illness is frequently used in clinical research and quality assessments. Although there are multiple methods designed for neonates, they have been infrequently compared and some have not been assessed in large samples of very low birth weight (VLBW; <1500 g) infants.

Objectives. To test and compare published neonatal mortality prediction models, including Clinical Risk Index for Babies (CRIB), Score for Neonatal Acute Physiology (SNAP), SNAP-Perinatal Extension (SNAP-PE), Neonatal Therapeutic Interventions Scoring System, the National Institute of Child Health and Human Development (NICHD) network model, and other individual admission factors such as birth weight, low Apgar score (<7 at 5 minutes), and small for gestational age status in a cohort of VLBW infants from the Washington, DC area.

Methods. Data were collected on 476 VLBW infants admitted to 8 neonatal intensive care units between October 1994 and February 1997. The calibration (closeness of total observed deaths to the predicted total) of models with published coefficients (SNAP-PE, CRIB, and NICHD) was assessed using the standardized mortality ratio. Discrimination was quantified as the area under the curve (AUC) for the receiver operating characteristic curves. Calibrated models were derived for the current database using logistic regression techniques. Goodness-of-fit of predicted to observed probabilities of death was assessed with the Hosmer-Lemeshow goodness-of-fit test.

Results. The calibration of published algorithms applied to our data was poor. The standardized mortality ratios for the NICHD, CRIB, and SNAP-PE models were .65, .56, and .82, respectively. Discrimination of all the models was excellent (range: .863–.930). Surprisingly, birth weight performed much better than in previous analyses, with an AUC of .869. The best models using both 12- and 24-hour postadmission data significantly outperformed the best model based on birth data only but were not significantly different from each other. The variables in the best model were birth weight, birth weight squared, low 5-minute Apgar score, and SNAP (AUC = .930).

Conclusion. Published models for severity of illness overpredicted hospital mortality in this set of VLBW infants, indicating a need for frequent recalibration. Discrimination for these severity of illness scores remains excellent. Birth variables should be reevaluated as a method to control for severity of illness in predicting mortality.

  • severity of illness index
  • neonatal mortality
  • risk adjustment
  • infant
  • very low birth weight
  • intensive care
  • neonatal

Severity of illness assessment has been important for a wide range of pediatric, neonatal, and adult uses including quality assessments, controlling for severity of illness in clinical studies, and studies of resource utilization and management.1–3 National programs such as the ORYX initiative by the Joint Commission on Accreditation of Hospitals and Related Organizations are premised on the condition of controlling evaluations for severity of illness.4 Although severity of illness is a familiar medical concept, it is sometimes difficult to assess. In the context of intensive care, a rational and objective way to define and quantify severity of illness is through the development of probabilistic models predicting mortality risk.5 Such predictive models have been developed for all age groups.6–11

Severity measurements in neonatal intensive care have traditionally used birth weight and Apgar scores, but the relationship between mortality and these parameters has been insufficiently precise to use for quality assessment. Further, the rapid evolution of neonatal care has made the relationships between mortality and these variables unstable. In 1993, the Score for Neonatal Acute Physiology (SNAP),9 the SNAP-Perinatal Extension (SNAP-PE),10 and the Clinical Risk Index for Babies (CRIB)11 scores were proposed for use in assessing severity with sufficient precision to allow an expansion of the applications to include quality assessment. SNAP uses the worst recorded values of more than 2 dozen routinely measured physiologic variables during the first 24 hours of stay; SNAP-PE supplements the SNAP with additional scoring for birth weight, small for gestational age (SGA) status, and low Apgar score (<7 at 5 minutes). CRIB uses information on base excess and oxygen requirements during the first 12 hours of life, as well as birth weight, gestational age, and congenital malformations. Although these models have been shown to perform better than birth weight alone in predicting hospital mortality, each suffers from certain limitations. SNAP and SNAP-PE were developed with a relatively small representation of very low birth weight (VLBW; <1500 g) infants, the group in which most deaths occur and most medical advancements have been made. CRIB was developed before widespread use of surfactant and is highly dependent on respiratory status. The Neonatal Therapeutic Intervention Scoring System (NTISS) assesses severity based on the intensity of therapeutic intervention during the first 24 hours of stay, using information on 62 specific therapeutic interventions. 
Finally, a model dependent on variables collected before admission to the neonatal intensive care unit (NICU) has been proposed by the National Institute of Child Health and Human Development (NICHD) neonatal research network.8 Although this NICHD model, based on birth weight, SGA status, gender, race (black vs other), and 1-minute Apgar ≤3, has the advantage of better separating therapy from severity of illness measurement, the published performance has been substantially worse than the CRIB or SNAP models.

We evaluated the relative ability of severity measures to discriminate between hospital deaths and survivors in a large group of VLBW infants admitted to 8 NICUs in Washington, DC. Because most neonatal mortality occurs in this group, assessment of quality using measures such as standardized mortality ratios (SMRs) must depend on reliable predictors. The measures, examined singly or in combination using models developed from our data, included birth weight, low Apgar score (<7 at 5 minutes), the NICHD neonatal network's preadmission score, CRIB, SNAP, SNAP-PE, and NTISS. Because separating severity assessment from therapy as much as possible is beneficial in quality assessment models, we also tested the performance of the SNAP and SNAP-PE scores derived from the first 12 hours of care. These analyses address the tradeoffs in discriminatory power with respect to issues of complexity and duration of scoring. We also compared our mortality experience to that predicted by published logistic regression equations for the NICHD neonatal network's preadmission model, SNAP-PE, and CRIB to test whether they are sufficiently calibrated in VLBW infants.

METHODS

In 1992, NICHD, in conjunction with the National Institutes of Health, Office of Research on Minority Health and the National Institute for Nursing Research, sponsored the cooperative agreement: “The National Institutes of Health DC Initiative to Reduce Infant Mortality in Minority Populations in the District of Columbia.” The purpose of the 5-year initiative was to develop coordinated projects designed to better understand the reasons for the high rate of infant mortality in the District of Columbia and to design and evaluate intervention projects aimed at reducing the number of infants in the District of Columbia at increased risk of dying in their first year of life. As part of this initiative, 9 of the 10 NICUs in the District of Columbia caring for low birth weight infants agreed to study the various aspects of neonatal intensive care. Eight of the NICUs provided acute care, and the ninth provided chronic care only. All protocols were approved by the institutional review boards of each site, as well as by the National Institutes of Health.

Eligibility criteria for inclusion in the study included the following: 1) live birth, with birth weight from 500 to 1499 g; 2) birth to a resident of the District of Columbia; 3) care in a participating NICU or intermediate/step-down unit; and 4) date of birth and date of first admission to a participating NICU during the enrollment period, October 1, 1994 to February 19, 1997. These criteria implicitly excluded delivery room deaths. Exclusion criteria included: 1) neonates not expected to survive because of extreme prematurity (only comfort care given) or inevitably lethal malformations; 2) neonates admitted after discharge to home; 3) infants who died before 2 hours of age; and 4) infants transferred from a nonparticipating hospital after 24 hours of age.

Data were collected prospectively from November 13, 1995 through April 30, 1997, and retrospectively from October 1, 1994 through November 12, 1995. All prospective data were abstracted from the clinical records using a process designated for each site to ensure complete data collection. Retrospective data collection used the existing medical records. Clinical data relevant to this report included maternal information (age, socioeconomic information, prenatal care, antenatal steroids, and multiple infants per gestation), infant data (gender, race, gestational age, and congenital malformations), delivery room status (birth weight, SGA status, Apgar scores, and mode of delivery), transportation status, and physiologic data required for the SNAP and CRIB (mean blood pressure, heart rate, respiratory rate and temperature, blood gas data, blood chemistries and blood counts, urine output, presence of seizures, apnea, stool guaiac, base excess, and minimum and maximum fraction of inspired oxygen [Fio2] values). In addition, therapeutic and monitoring data for the NTISS were collected for the first 24 hours.12 Physiologic and NTISS data were collected separately for the first and second 12 hours after admission. Physiologic measurements in the 2 hours before death were excluded from data collection for the SNAP score. Transfers between participating NICUs were tracked to provide a complete record of the birth hospitalization until the occurrence of either in-hospital death or discharge to home, non-NICU chronic care, or a nonparticipating hospital. If infants reached boarder status (ie, retained in the hospital for nonmedical reasons), they were considered discharged. Both daily and outcome data collection for the entire project were more extensive.

Before data collection, each site was visited and general and site-specific data collection procedures were developed, including a detailed, site-specific operations manual. In addition to the project's principal investigator, project coordinator, field manager, and data collectors, each site had a staff nurse assigned as a facilitator to help with data collection and a staff physician to serve as a site liaison to the principal investigator. Data collectors were registered nurses who were oriented in depth to the data-recording methods at each site. Each research nurse covered multiple sites to ensure consistency in data collection during periods of absence of the other data collectors. Data were collected directly onto computers, which applied specific immediate error checks and transmitted via modem to the data management center each evening. The completeness of the sample was assessed using the NICU and delivery room logs.

Excluded from this analysis were infants who were transferred to or among participating hospitals during the first 24 hours of life, because the protocol's data collection forms precluded assigning components of the severity of illness scores in less than 12- or 24-hour blocks. This effectively excluded outborn infants and reduced the number of acute care NICUs where infants originated to 7. The characteristics of these 7 NICUs are shown in Table 1. Excluded from the analysis of hospital mortality were infants who were missing any of the severity of illness measures. Infants who were transferred to a nonparticipating hospital were assumed to have survived the birth hospitalization.

Table 1.

NICUs (n = 7)

The different methods proposed for calculating neonatal severity of illness, including the NICHD network score based on preadmission variables,8 the SNAP score in the form of the SNAP-PE (best-fit model),10 and the CRIB score13 have published formulas that predict mortality. To assess how well each of these formulas actually predicted total mortality (ie, calibration) in our sample, we computed the ratio of the observed to the predicted total number of hospital deaths and, using a z test,14,15 examined whether this standardized mortality ratio (SMR) was significantly different from 1. It is also important to assess the ability of the individual severity of illness scores to distinguish neonates who are likely to die in the hospital from those likely to survive (ie, discrimination). To do this, one can calculate for each score the true-positive rate (of the total number of hospital deaths, the proportion identified to be at high risk) and the false-positive rate (of the total number of survivors, the proportion identified to be at high risk). The magnitude of these quantities depends on a threshold value for the scores that define high risk. A less stringent threshold value would increase sensitivity but decrease specificity. This is illustrated by receiver operating characteristic (ROC) curves, which plot the true-positive rate against the false-positive rate for different threshold values. Points on the diagonal line from (0,0) to (1,1) indicate an equal chance of being labeled high risk for both deaths and survivors. The higher the ROC curve is from this chance line, the better the discrimination. The area under the ROC curve (AUC) can be viewed as a quantitative measure of this discrimination, which ranges from .5 for a chance discrimination to 1.0 for perfect discrimination. The AUC can also be interpreted as the probability that a randomly selected hospital death has a higher severity score than a randomly selected hospital survivor.
We assessed the discriminatory ability of individual scores by calculating their AUCs, a measurement that is model-free and depends only on the ability of the scores to rank infants in increasing likelihood of hospital death. We also calibrated each score to our data by fitting a logistic regression with the score as the single predictor variable. The estimation of the best intercept and slope in the logistic regression ensures that the predicted total number of deaths in the sample is the same as the observed total.
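The probabilistic reading of the AUC and the construction of ROC points can be illustrated with a short Python sketch. The function names and the toy scores below are assumptions made for illustration; they are not the study's data or software. Ties between a death and a survivor are counted as one half, which is a common convention and not one stated in the text.

```python
from itertools import product


def auc_by_ranking(scores_deaths, scores_survivors):
    """AUC as the probability that a randomly chosen death has a higher
    score than a randomly chosen survivor (ties count as 1/2)."""
    if not scores_deaths or not scores_survivors:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for d, s in product(scores_deaths, scores_survivors):
        if d > s:
            wins += 1.0
        elif d == s:
            wins += 0.5
    return wins / (len(scores_deaths) * len(scores_survivors))


def roc_points(scores_deaths, scores_survivors):
    """(false-positive rate, true-positive rate) pairs, one per threshold.
    An infant is labeled high risk when score >= threshold."""
    thresholds = sorted(set(scores_deaths) | set(scores_survivors), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(d >= t for d in scores_deaths) / len(scores_deaths)
        fpr = sum(s >= t for s in scores_survivors) / len(scores_survivors)
        points.append((fpr, tpr))
    return points


# Hypothetical severity scores, for illustration only.
deaths = [30, 22, 18, 9]
survivors = [20, 12, 9, 7, 5, 3]

print("AUC:", auc_by_ranking(deaths, survivors))
for fpr, tpr in roc_points(deaths, survivors):
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Lowering the threshold moves along the curve toward (1,1): more deaths are flagged as high risk, but so are more survivors.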

However, the regression model may still systematically overpredict for some illness severities and underpredict for others. We applied the Hosmer-Lemeshow goodness-of-fit test (PROC LOGISTIC, SAS Institute, Cary, NC)16,17 to determine whether there was systematic underestimation or overestimation of mortality depending on the severity of illness. The Hosmer-Lemeshow test does this by comparing the observed number of deaths with the predicted number of deaths within each of 10 approximately equally sized subgroups of infants, as ordered by the predicted probabilities of death. Ruttimann18 provides a useful review of the concepts of logistic regression, calibration, discrimination, ROC analysis and AUCs, and predictive accuracy/goodness-of-fit.
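The subgroup comparison behind the Hosmer-Lemeshow test can be sketched in Python as follows. This is a minimal illustration of the usual textbook statistic, not a reproduction of PROC LOGISTIC: the grouping rule for tied probabilities, the function name, and the toy data are assumptions. The statistic is conventionally referred to a chi-square distribution with (number of groups − 2) degrees of freedom; the P value lookup is omitted here.

```python
def hosmer_lemeshow_statistic(probs, outcomes, groups=10):
    """Return (statistic, degrees_of_freedom) for predicted death
    probabilities `probs` and 0/1 outcomes `outcomes` (1 = death).

    Infants are ordered by predicted probability and split into `groups`
    approximately equal-sized subgroups; within each subgroup the observed
    and expected numbers of deaths are compared.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    statistic = 0.0
    groups_used = 0
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        if not members:
            continue
        n_g = len(members)
        expected = sum(probs[i] for i in members)
        observed = sum(outcomes[i] for i in members)
        mean_p = expected / n_g
        variance = n_g * mean_p * (1.0 - mean_p)
        if variance <= 0.0:
            # Degenerate subgroup (all predictions 0 or 1); skipped in this sketch.
            continue
        statistic += (observed - expected) ** 2 / variance
        groups_used += 1
    return statistic, groups_used - 2


# Hypothetical predictions and outcomes, for illustration only.
example_probs = [0.02, 0.03, 0.05, 0.05, 0.08, 0.10, 0.15, 0.20, 0.25, 0.30,
                 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.70, 0.80, 0.85, 0.90]
example_outcomes = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
                    0, 1, 0, 1, 1, 0, 1, 1, 1, 1]
print(hosmer_lemeshow_statistic(example_probs, example_outcomes, groups=4))
```

A large statistic relative to its degrees of freedom indicates that observed deaths deviate from predicted deaths in at least some severity strata, even when the overall totals agree.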

To explore the performance of combinations of severity measures, we assessed their discrimination (AUC) and predictive accuracy (Hosmer-Lemeshow goodness-of-fit test) using logistic regression models. Because the sample size was insufficient to validate the various models on an independent sample, the analysis focused on predictors and their combinations that could be used to control for severity of illness in this sample. A model for each predictor (birth weight, NICHD preadmission score, CRIB, SNAP, SNAP-PE, and NTISS) was fit using an intercept and a linear term; nonlinearity was tested through the addition of a quadratic term. Significant (P < .05) quadratic terms were retained unless they became nonsignificant in subsequent modeling. Terms for birth weight, low Apgar score (<7 at 5 minutes, after the formulation in the SNAP-PE), and SGA were added to each model unless their information was already contained in the score (ie, NICHD preadmission score, CRIB, and SNAP-PE). These additional terms were then removed as indicated by backward elimination using P < .05 to remain in the model. We compared the AUCs from the different models by using a test that makes few model assumptions and is valid for correlated AUCs.19
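The form of these models can be written out explicitly. The expression below is a sketch in notation introduced here for illustration (the coefficients β and the indicator I(·) are not symbols used by the authors, and no fitted coefficient values are given in this section). It shows the combination that the abstract reports as the best model: birth weight (BW), birth weight squared, low 5-minute Apgar score, and SNAP.

```latex
\operatorname{logit}(p) \;=\; \ln\frac{p}{1-p}
  \;=\; \beta_0 \;+\; \beta_1\,\mathrm{BW} \;+\; \beta_2\,\mathrm{BW}^2
  \;+\; \beta_3\, I(\text{5-minute Apgar} < 7) \;+\; \beta_4\,\mathrm{SNAP},
\qquad p = \Pr(\text{hospital death})
```

Dropping the β2 term gives the linear model, and testing whether β2 differs from zero corresponds to the check for nonlinearity described above.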

Descriptive statistics are reported as means ± standard deviation, medians, and proportions. All P values for single parameters are 2-sided.

RESULTS

The cohort consisted of 552 infants. Exclusions for inevitably lethal malformations (n = 0), infants given comfort care only (n = 10), deaths before 2 hours of life (n = 2), transfers from nonparticipating sites after 24 hours of age (n = 10), and missing charts (n = 8; no deaths) left a total study population of 522 infants. In addition, 46 infants who were transferred into or among the participating hospitals in the first 24 hours of life were excluded from the data analysis, leaving 476 infants born to 450 women. Of these, 250 infants were in the prospective cohort and 226 in the retrospective cohort. All infants were admitted to the NICU within 2.4 hours of birth (mean: .37 ± .30 hours). Complete follow-up data were available for 465 infants; 6 infants discharged to nonparticipating hospitals at ages ranging from 6 to 88 days of life were assumed to have survived, as were 5 infants with ages ranging from 85 to 159 days who were still in the hospital when the study ended. Eight infants (1 death) were excluded from the mortality analysis because missing values for 1 or more Apgar scores precluded the calculation of the NICHD score or the SNAP-PE. For the construction of the CRIB score, minimum and maximum appropriate Fio2 values were unavailable for 52 infants because none of the blood gases had a partial pressure of arterial oxygen in the appropriate range (50–80 mm Hg).

One of the investigators (M.M.P.) assigned point scores for minimum and maximum appropriate Fio2 based on clinical assessment of the pattern of partial pressure of arterial oxygen and Fio2 values for available blood gas measurements. Alternative analyses (data not shown) that excluded infants with missing CRIB components gave qualitatively similar results to those reported below. Race was not recorded for 7 infants; given the racial composition of our population, we computed the NICHD score for these infants as if they were black.
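The cohort accounting reported above can be checked with simple arithmetic. The Python sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Counts taken from the text; names are illustrative.
initial_cohort = 552
exclusions = {
    "inevitably lethal malformations": 0,
    "comfort care only": 10,
    "death before 2 hours of life": 2,
    "transfer from nonparticipating site after 24 hours": 10,
    "missing chart": 8,
}

study_population = initial_cohort - sum(exclusions.values())
analysis_sample = study_population - 46    # transfers in the first 24 hours of life
mortality_sample = analysis_sample - 8     # missing Apgar scores

print("Study population:", study_population)
print("Analysis sample:", analysis_sample)
print("Mortality analysis sample:", mortality_sample)

# Internal consistency of the reported subtotals.
print("Prospective + retrospective:", 250 + 226)
print("Complete follow-up + assumed survivors:", 465 + 6 + 5)
```

The final figure matches the n = 468 used in the tables that report mortality analyses.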

The mothers were mostly black (90.3%) and single (78.9%); 13.4% were <20 years old, and 19.5% had received no prenatal care. Antenatal steroids were received by 71.2% of mothers. Infant characteristics are shown in Table 2. The infants had a mean birth weight of 1048 ± 285 g with a mean gestational age of 28.6 ± 2.8 weeks, and 20.4% were categorized as SGA. The distributions of weight and gestational age are also shown in Table 2. Almost half of the neonates were delivered via cesarean section. Low Apgar scores were common, with 27.9% having Apgar scores ≤3 at 1 minute, and 24.8% with Apgar scores <7 at 5 minutes.

Table 2.

Infant Descriptive Characteristics (n = 476)

The severity of illness data are shown in Table 3. The mean 1-minute and 5-minute Apgar scores were 4.9 ± 2.3 and 7.3 ± 1.8, respectively. The mean CRIB score was 5.2 ± 5.1. The intensity of therapy and monitoring as measured by the NTISS was 19.5 ± 7.4. By definition, most of the components of SNAP and SNAP-PE scores for the first 12 hours cannot exceed their 24-hour counterparts, and this is reflected in the mean difference between the 12-hour and 24-hour scores (SNAP: 11.8 ± 7.6 vs 13.5 ± 8.3; SNAP-PE: 23.6 ± 19.5 vs 25.3 ± 20.2). The mean (24-hour) SNAP and SNAP-PE scores (13.5 and 25.3) were similar to previously reported mean scores (13.9 and 24.6) for infants weighing <1500 g.20

Table 3.

Severity of Illness Measures (n = 476)

The preestablished predictive equations significantly overpredicted hospital mortality in this sample (Table 4). There were 66 observed hospital deaths. In contrast, the preadmission model proposed by the NICHD network predicted 102.1 deaths (SMR = .65; 95% confidence interval [CI] = .50,.79; P < .0001), the CRIB model predicted 117.3 deaths (SMR = .56; 95% CI = .46,.67; P < .0001), and the SNAP-PE model predicted 80.2 deaths (SMR = .82; 95% CI = .68,.97; P = .017).

Table 4.

Performance of Published Neonatal Mortality Risk Predictors (n = 468; Observed Deaths = 66)
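The reported SMRs follow directly from the observed and predicted death counts. The Python sketch below reproduces that arithmetic from the figures in the text; the function name is illustrative, and the confidence intervals and z tests are not reproduced because their variance formula is not given in this section.

```python
OBSERVED_DEATHS = 66
# Predicted deaths from each published equation, as reported in the text.
PREDICTED_DEATHS = {"NICHD": 102.1, "CRIB": 117.3, "SNAP-PE": 80.2}


def standardized_mortality_ratio(observed, predicted):
    """Observed deaths divided by predicted deaths; values below 1 mean
    the model predicted more deaths than actually occurred."""
    if predicted <= 0:
        raise ValueError("predicted deaths must be positive")
    return observed / predicted


for model, predicted in PREDICTED_DEATHS.items():
    smr = standardized_mortality_ratio(OBSERVED_DEATHS, predicted)
    overprediction = predicted / OBSERVED_DEATHS - 1.0
    print(f"{model}: SMR = {smr:.2f}, "
          f"predicted deaths exceed observed by {overprediction:.0%}")
```

An SMR of 1 would indicate that the total number of predicted deaths equals the number observed.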

Selected results for discrimination and goodness of fit are shown in Table 5. Birth weight alone was a very good discriminator (AUC = .869) but the goodness-of-fit test was poor, with numbers of observed deaths deviating significantly from predicted probabilities (P = .0005). However, the addition of a quadratic birth weight term achieved an excellent fit at the cost of a slight decrease in discrimination. No other quadratic terms were retained, and SGA was not predictive in this sample. The term for Apgar score <7 at 5 minutes was significant in every model for which it was included. Adding this term improved the performance of the quadratic birth weight model to an AUC of .892, also with an excellent goodness-of-fit. This AUC for birth weight (its quadratic term and Apgar score <7 at 5 minutes) was equivalent to the performance of the CRIB score and was higher than the AUC for the preadmission score proposed by the NICHD network, even when enhanced by adding the 5-minute Apgar score term.

Table 5.

Discriminatory Ability and Goodness-of-Fit for Fitted Models (n = 468)

Figure 1 illustrates the fit of the linear (birth weight alone) and quadratic (birth weight and birth weight squared) models to the observed mortality, by deciles of birth weight. The model with birth weight alone overpredicted mortality for the second through fifth deciles (birth weights between 631 and 1070 g) and tended to underpredict for the remainder. The improvement in fit attributable to the inclusion of the birth weight-squared term is clear, especially for the lower birth weights where most of the deaths occur.

Fig. 1.

Observed and predicted mortality by birth weight decile groups.

Within the group of models employing 12-hour data, ∼.01 increments in the AUC were achieved between CRIB, CRIB enhanced with the low 5-minute Apgar term, the 12-hour SNAP-PE, and the best discriminating 12-hour model that included quadratic birth weight, low Apgar, and 12-hour SNAP (AUC = .920). All models displayed satisfactory goodness-of-fit, as did the models using 24-hour data. The AUC for 24-hour SNAP-PE (.916) was similar to that of the best 12-hour model, and the best 24-hour model (quadratic birth weight, low Apgar, and 24-hour SNAP) achieved an AUC of .930. Models containing NTISS performed slightly less well than did those with 24-hour SNAP.

There were no significant differences among AUCs within the groups (birth data, birth plus 12-hour data, and birth plus 24-hour data) shown in Table 5, although the difference in AUCs for CRIB versus the best discriminating 12-hour model approached significance at P = .056. Table 6 displays P values for the comparison of AUCs among predefined severity of illness measures. The AUCs for the 12-hour and 24-hour SNAP-PE scores were significantly different from those for birth weight (P = .016 and .0050) and the NICHD score (P = .0043 and .0019), but were not significantly different from the AUC for CRIB (P = .30 and .088) or each other (P = .21). The AUC for CRIB did not significantly differ from that of birth weight (P = .29) or the NICHD score (P = .35).

Table 6.

P Values for Comparison of AUCs Among Severity of Illness Measures (n = 468)

Figure 2 shows the empirical ROC curves for birth weight alone and for the best discriminating birth, 12-hour, and 24-hour models. The curves are similar for the best 12- and 24-hour models, which use SNAP in addition to quadratic birth weight and Apgar score <7 at 5 minutes. Both dominate the curves for the models based on birth data only. Table 7 provides P values for the comparison of AUCs among these fitted models. The best discriminating 12- and 24-hour models had significantly better AUCs than the models using birth weight alone (P = .010 and .0056), or quadratic birth weight and Apgar score <7 at 5 minutes (P = .058 and .035), but did not differ significantly from each other (P = .20).

Fig. 2.

ROC curves for hospital mortality for birth weight and best discriminatory models using birth, 12-hour, and 24-hour data (n = 468).

Table 7.

P Values for Comparison of AUCs Among BW and Best Discriminatory Models Using Birth, 12-Hour, and 24-Hour Data (n = 468)

DISCUSSION

Neonatal severity of illness measures have several important uses. First, they are valuable in comparing outcomes across hospitals or NICUs. Accurate and reliable measures of severity of illness are required for unbiased and reliable comparisons such as those involving benchmarking or comparative quality of care studies. Second, they can serve to control for population differences when performing studies such as clinical trials, outcome evaluations, and evaluations of resource utilization. Severity measures must perform satisfactorily if they are to fulfill these roles. Interinstitutional comparisons and benchmarking are especially sensitive to the performance of the severity of illness measures because comparisons of the number of observed outcomes to the number of predicted outcomes are central to the interpretation of the institutional performance. Mortality is generally the outcome used to validate the severity of illness measure because mortality is clearly defined and occurs with sufficient frequency for use in prediction models.

Models developed to measure neonatal severity of illness have improved in calibration and discriminatory power. Yet questions remain concerning the applicability and validity of these methods. Are they applicable to all neonatal populations? For example, scores have been developed from specific geographical regions. The proportion of low birth weight infants and extremely low birth weight infants differed in each of the populations used to develop the scores. The rapid medical evolution in perinatal and neonatal care could have altered the relationship between mortality and the score's predictor variables. Measurement bias may be present because most scores require the collection of a broad spectrum of physiologic measurements that may not be routinely collected in each NICU at the same level of intensity. Lead-time bias issues may be present attributable to transport issues and different practice patterns of care given in the delivery room. Each of these issues has the potential to limit the utility of these instruments.21

We compared the calibration and discriminatory power of existing severity of illness scales with respect to hospital mortality in VLBW infants in a contemporary urban population. Our assessments were substantially more detailed than other assessments of their performance,22–24 providing comparisons of discrimination and goodness-of-fit for all well-established indices of neonatal severity of illness and important birth variables, as well as calibration of published prediction equations. Of note, all existing prediction models were poorly calibrated for this population. Each substantially overpredicted the number of deaths by at least 20%, with standardized mortality ratios ranging from .56 to .82. Some of the issues detailed above could account for this poor performance. Specifically, this study population received care in an environment of widespread surfactant (88%) and antenatal steroid (71%) use, both of which have significantly reduced neonatal mortality in VLBW infants.25,26 Such differences may have challenged the calibration of previously developed illness severity models for hospital death. Additionally, the population was predominantly urban black (90.3%) infants, who have lower neonatal mortality rates at each individual low birth weight category.27 Despite this, the NICHD score, which includes race as a component, did not perform significantly better than other models. Its development among an inclusive population of VLBW newborns using preadmission factors may have contributed to its poor performance in this population, which was intentionally limited to NICU admissions, and excluded infants who were only given comfort care and early (<2 hour) deaths.

Surprisingly, relatively simple predictors discriminated extremely well in this population. The single variable of birth weight achieved very good discrimination (AUC = .869). Previous studies8,10,13,22,24 have not found discrimination this high; their AUCs for birth weight alone ranged from .72 to .83. Goodness-of-fit for the associated logistic model, however, was poor (P = .0005). Similar lack of fit for birth weight alone has been found previously.8 The addition of a quadratic birth weight term achieved satisfactory goodness-of-fit (P = .89) with little loss in discrimination (AUC = .863). Previous research conducted in the context of all birth weights also indicated that the logistic relationship for birth weight is nonlinear.10 Including Apgar score <7 at 5 minutes as an additional predictor increased the AUC to .892, a value comparable to that of the CRIB score and better than that of the score developed by the NICHD network. Similar discriminatory power (AUC = .87) was previously observed for a birth data model incorporating birth weight, gestational age, and 5-minute Apgar score.24 However, none of the differences in AUCs among birth data models reached statistical significance.
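
Why a quadratic term can repair goodness-of-fit while barely moving the AUC may be easier to see in code. The AUC depends only on how infants are ranked: it equals the probability that a randomly chosen infant who died has a higher predicted risk than a randomly chosen survivor. The Python sketch below computes it by direct pair counting and applies a quadratic logistic curve to birth weight. The birth weights, outcomes, and coefficients are hypothetical values chosen for illustration; they are not the fitted model from this study.

```python
import math


def auc(scores, died):
    """Area under the ROC curve by pair counting (ties count as 1/2)."""
    deaths = [s for s, d in zip(scores, died) if d == 1]
    survivors = [s for s, d in zip(scores, died) if d == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for s_death in deaths:
        for s_surv in survivors:
            if s_death > s_surv:
                wins += 1.0
            elif s_death == s_surv:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


def quadratic_logit_risk(bw_g, b0, b1, b2):
    """Predicted risk from logit(p) = b0 + b1*bw + b2*bw^2 (bw in grams)."""
    z = b0 + b1 * bw_g + b2 * bw_g ** 2
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical data, for illustration only (not study data).
birth_weight_g = [600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1450]
died = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

# Birth weight alone: lower weight means higher risk, so score = -weight.
auc_weight = auc([-bw for bw in birth_weight_g], died)

# Hypothetical coefficients; the curve stays decreasing below 2000 g.
risks = [quadratic_logit_risk(bw, 5.0, -0.008, 2e-6) for bw in birth_weight_g]
auc_quadratic = auc(risks, died)

print(f"AUC, birth weight alone: {auc_weight:.3f}")
print(f"AUC, quadratic model:    {auc_quadratic:.3f}")
```

In this sketch the quadratic curve is monotone over the VLBW range, so it ranks infants exactly as birth weight does and the two AUCs coincide; only the predicted probabilities, and hence calibration, change. A fitted curve that is not strictly monotone over the observed range could reorder a few infants, which is one possible reason for a small change in AUC.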

Little discriminatory ability was lost when using the 12-hour version of SNAP-PE, compared with its 24-hour counterpart, and the difference was not statistically significant. Using different criteria, a qualitatively similar conclusion has recently been published.23 This small loss should be contrasted with the potential benefit of reducing the influence of therapy on the score. Among 12-hour predictors, the AUC for CRIB was somewhat lower than that of the 12-hour SNAP-PE but did not differ significantly. This loss should be contrasted with the advantage in simplicity of the CRIB. The discriminatory performance of CRIB, however, did not differ significantly from that of birth weight or the NICHD score. Confirmation of the intermediate performance of CRIB, as opposed to a true performance more similar to birth scores or to 12-hour SNAP-PE, would require a larger sample.

Similar patterns were seen for the best discriminating models: 12-hour and 24-hour best models (quadratic birth weight, low Apgar at 5 minutes, and relevant version of SNAP) had significantly higher AUCs than the best birth data model (SNAP removed) but did not differ significantly from each other. Because these models use coefficients derived from the current data to weight the contributions of individual terms, AUCs for these models may be somewhat overestimated compared with their performance in a validation sample.28 Consequently, comparisons between these models and the performance of the predefined severity of illness measures may be overstated to some extent and should be interpreted with caution.
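
One common way to gauge how much an apparent AUC is overestimated is a bootstrap optimism correction. The formulation below is a general sketch of that technique and is not an analysis reported in this study. Here B is the number of bootstrap resamples, AUC_b^boot is the AUC of the model refit on resample b and evaluated on that same resample, and AUC_b^orig is the AUC of that refit model evaluated on the original data.

```latex
\[
\widehat{O} \;=\; \frac{1}{B}\sum_{b=1}^{B}
  \left[\mathrm{AUC}_{b}^{\mathrm{boot}} - \mathrm{AUC}_{b}^{\mathrm{orig}}\right],
\qquad
\mathrm{AUC}_{\mathrm{corrected}} \;=\; \mathrm{AUC}_{\mathrm{apparent}} - \widehat{O}
\]
```

Predefined scores such as CRIB or SNAP-PE have no coefficients estimated from the current data, so this correction would apply only to the newly fitted models.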

Severity of illness methods are now commonly used in intensive care for children and adults. Adjustments for severity of illness as well as other case-mix variables are required for comparisons of mortality rates and efficiency of care measures. Our results suggest that use of physiologic variables may not be necessary in adjusting for severity of illness in VLBW infants. Relatively simple and commonly available data including birth weight and Apgar scores should be reexamined for their utility in these endeavors.

ACKNOWLEDGMENTS

This work was supported in part by a grant from the National Institute of Child Health and Human Development and the National Institutes of Health, Office of Research on Minority Health.

The study could not have been completed without the collaboration of the District of Columbia Neonatal Network, whose members include: Murray M. Pollack, MD; Billie Short, MD; Kantilal M. Patel, PhD; Julie Ziegler, MA; Joyce Williams, RN; Doris Bartel, MSN, from Children's National Medical Center; Kenneth Harkavy, MD, from the Columbia Hospital for Women; Michal Young, MD, from DC General Hospital; Annette Heiser Ficker, MD, from Hospital for Sick Children; Fariborz Rahbar, MD, and Davene White, RN, NNP, from Howard University Hospital; Ayman A. E. El-Mohandes, MD, from George Washington University Medical Center; K. N. Siva Subramanian, MD, and Ramasubbareddy Dhanireddy, MD, from Georgetown University Medical Center; Maria Paz Ruiz, MD, from Providence Hospital; John P. Grausz, MD, from Washington Hospital Center; Nancy Taplin McCall, ScD, from Health Economics Research, Inc; and Vijaya Rao, PhD, and Matthew A. Koch, MD, PhD, from Research Triangle Institute.

We gratefully acknowledge the significant contributions of the following people: LaSonji Holman, Margaret Crosby, Gloria Seymour, Priscilla Johnson, Linda Font, Karol Duffy, and Kate Collins Wooddell for their work on abstracting the records; Sheilia O'Brien for field management; Patricia Higdon, Jean Gilroy, Eva M. Bell, and Linda L. Ivey as site coordinators; Pamela A. Angelus, Elizabeth Estrada Jarosz, Susan Novosel, Lisa Wright, Chita Taylor, Maria J. Floyd, Mary Ellen Lynch, Judith Stark, and Jane Devine as nurse facilitators; Connie L. Hobbs and Donna Hewitt for data form design and the manual of operations; Scott E. Schaefer and Margo F. Brinkley for data entry, data management, and reporting systems; and Arthur Macaraeg, Mhairi MacDonald, Antoine Fomufod, and Susan McCabe for their help with the design of the protocol.

This study was part of the National Institutes of Health DC Initiative to Reduce Infant Mortality in Minority Populations in the District of Columbia and was funded by The National Institutes of Health Office of Research on Minority Health and the National Institute of Child Health and Human Development. The following institutions were participants in the National Institutes of Health DC Initiative to Reduce Infant Mortality in Minority Populations in the District of Columbia: Children's National Medical Center: P. Scheidt and M. Pollack (principal investigators); DC Department of Public Health: B. Hatcher (principal investigator); DC General Hospital: L. Johnson (principal investigator); Georgetown University Medical Center: K. N. Sivasubramanian (principal investigator); Howard University: B. Wesley (principal investigator); University of the District of Columbia: V. Melnick (principal investigator); Research Triangle Institute: V. Rao (principal investigator); and National Institute of Child Health and Human Development: H. Berendes (program officer), A. Herman (scientific coordinator), and B. Wingrove (program coordinator).

Footnotes

  • Received January 19, 1999.
  • Accepted August 9, 1999.
  • Reprint requests to (M.M.P.) Department of Critical Care Medicine, Children's National Medical Center, 111 Michigan Ave NW, Washington, DC 20010. E-mail: mpollack@cnmc.org

  • SNAP = Score for Neonatal Acute Physiology
  • SNAP-PE = SNAP Perinatal Extension
  • CRIB = Clinical Risk Index for Babies
  • SGA = small for gestational age
  • VLBW = very low birth weight
  • NTISS = Neonatal Therapeutic Intervention Scoring System
  • NICU = neonatal intensive care unit
  • NICHD = National Institute of Child Health and Human Development
  • Fio2 = fraction of inspired oxygen
  • SMR = standardized mortality ratio
  • ROC = receiver operating characteristic
  • AUC = area under the ROC curve
  • CI = confidence interval

REFERENCES

  1. Pollack MM, Cuerdon TT, Patel KM, et al. (1994) Impact of quality of care factors on pediatric intensive care unit mortality. JAMA 272:941–946.
  2. Panniers TL (1987) Severity of illness, quality of care, and physician practice as determinants of hospital resource consumption. Qual Rev Bull 13:158–165.
  3. Ruttimann UE (1996) Variability in duration of stay in pediatric intensive care units: a multinational study. J Pediatr 128:35–44.
  4. The Joint Commission on Accreditation of Healthcare Organizations (1997) ORYX: the next evolution in accreditation. Nurs Manage 28:49–54.
  5. Lemeshow S, Le Gall J-R (1994) Modeling the severity of illness of ICU patients: a systems update. JAMA 272:1049–1055.
  6. Pollack MM, Patel K, Ruttimann UE (1996) PRISM III: an updated pediatric risk of mortality score. Crit Care Med 24:743–752.
  7. Knaus WA, Wagner DP, Draper EA, et al. (1991) The APACHE III prognostic system: risk prediction of hospital mortality for critically ill hospitalized adults. Chest 100:1619–1636.
  8. Horbar JD, Onstad L, Wright E, et al. (1993) Predicting mortality risk for infants weighing 501 to 1500 grams at birth: a National Institutes of Health Neonatal Research Network report. Crit Care Med 21:12–18.
  9. Richardson DK, Gray JE, McCormick MC, Workman-Daniels K, Goldman DA (1993) Score for neonatal acute physiology: a physiologic severity index for neonatal intensive care. Pediatrics 91:617–623.
  10. Richardson DK, Phibbs CS, Gray JE, et al. (1993) Birth weight and illness severity: independent predictors of neonatal mortality. Pediatrics 91:969–975.
  11. The International Neonatal Network (1993) The CRIB (clinical risk index for babies) score: a tool for assessing initial neonatal risk and comparing performance of neonatal intensive care units. Lancet 342:193–198.
  12. Gray JE, Richardson DK, McCormick MC, Workman-Daniels K, Goldman DA (1992) Neonatal Therapeutic Intervention Scoring System: a therapy-based severity of illness index. Pediatrics 90:561–567.
  13. Scottish Neonatal Consultants' Collaborative Study Group, and the International Neonatal Network (1995) CRIB (clinical risk index for babies), mortality, and impairment after neonatal intensive care. Lancet 345:1020–1022.
  14. Flora JD (1978) A method for comparing survival of burn patients to a standard survival curve. J Trauma 18:701–705.
  15. Hosmer D, Lemeshow S (1995) Confidence interval estimates of an index of quality performance based on logistic regression models. Stat Med 14:2161–2172.
  16. SAS Institute. SAS/STAT Software: Changes and Enhancements Through Release 6.12. Cary, NC: SAS Institute; 1997.
  17. Hosmer D, Lemeshow S. Applied Logistic Regression. New York, NY: John Wiley and Sons; 1989.
  18. Ruttimann UE (1994) Statistical approaches to development and validation of predictive instruments. Crit Care Clin 10:19–35.
  19. DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44:837–845.
  20. Richardson DK, McCormick MC, Gray JE, Goldmann DA (1994) CRIB and SNAP. Lancet 344:124–125.
  21. Richardson DK, Tarnow-Mordi WO (1994) Measuring illness severity in newborn intensive care. J Intensive Care Med 9:20–33.
  22. Kaaresen PI, Dohlen G, Fundingsrud HP, Dahl LB (1998) The use of CRIB (clinical risk index for babies) score in auditing the performance of one neonatal intensive care unit. Acta Paediatr 87:195–200.
  23. Petridou E, Richardson DK, Dessypris N, et al. (1998) Outcome prediction in Greek neonatal intensive care units using a score for neonatal acute physiology. Pediatrics 101:1037–1044.
  24. Baumer JH, Wright D (1997) Illness severity measured by CRIB score: a product of changes in perinatal care. Arch Dis Child 77:F211–F215.
  25. Liechty EA, Donovan E, Purohit D, et al. (1991) Reduction of neonatal mortality after multiple doses of bovine surfactant in low birth weight neonates with respiratory distress syndrome. Pediatrics 88:19–28.
  26. Doyle LW, Kitchen WH, Ford GW, et al. (1986) Effects of antenatal steroid therapy on mortality and morbidity in very low birth weight infants. J Pediatr 108:287–292.
  27. Guyer B, Martin JA, MacDorman MF, Anderson RN, Strobino DM (1997) Annual summary of vital statistics—1996. Pediatrics 100:905–918.
  28. Harrell FE, Lee LK, Mark DB (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15:361–387.
  • Copyright © 2000 American Academy of Pediatrics

Pediatrics
Vol. 105, Issue 5
1 May 2000
A Comparison of Neonatal Mortality Risk Prediction Models in Very Low Birth Weight Infants
Murray M. Pollack, Matthew A. Koch, Doris A. Bartel, Irina Rapoport, Ramasubbareddy Dhanireddy, Ayman A. E. El-Mohandes, Kenneth Harkavy, K. N. Siva Subramanian, the District of Columbia Neonatal Network
Pediatrics May 2000, 105 (5) 1051-1057; DOI: 10.1542/peds.105.5.1051



© 2021 American Academy of Pediatrics