BACKGROUND: Regionalized care delivery purportedly optimizes care to vulnerable very low birth weight (VLBW; <1500 g) infants. However, a comprehensive assessment of quality of care delivery across different levels of NICUs has not been done.
METHODS: We conducted a cross-sectional analysis of 21 051 VLBW infants in 134 California NICUs. NICUs designated their level of care according to 2012 American Academy of Pediatrics guidelines. We assessed quality of care delivery via the Baby-MONITOR, a composite indicator that combines 9 risk-adjusted measures of quality. Baby-MONITOR scores are measured as observed minus expected performance, expressed in standard units with a mean of 0 and an SD of 1.
RESULTS: Wide variation in Baby-MONITOR scores exists across California (mean [SD] 0.18 [1.14], range –2.26 to 3.39). However, level of care was not associated with overall quality scores. Subcomponent analysis revealed trends toward higher performance of Level IV NICUs on several process measures, including antenatal steroids and any human milk feeding at discharge, but lower scores on several outcomes, including any health care–associated infection, pneumothorax, and growth velocity. No other health system or organizational factors, including hospital ownership, neonatologist coverage, urban or rural location, and hospital teaching status, were significantly associated with Baby-MONITOR scores.
CONCLUSIONS: A comprehensive assessment of the effect of level of care on quality reveals differential opportunities for improvement and allows monitoring of efforts to ensure that fragile VLBW infants receive care in appropriate facilities.
- AAP — American Academy of Pediatrics
- CPQCC — California Perinatal Quality Care Collaborative
- VLBW — very low birth weight
What’s Known on This Subject:
Regionalized NICU care delivery and birth at perinatal centers minimize mortality in very low birth weight infants. A more comprehensive assessment of quality and outcomes of care across different levels of care is lacking.
What This Study Adds:
Using the Baby-MONITOR, we found wide differences in quality of care provided to very low birth weight infants across NICUs. Level of care was not associated with Baby-MONITOR scores, but subcomponents highlighted opportunities for improvement at all levels.
Delivery of neonatal intensive care in regionalized systems has long been regarded as critical to providing high-quality health care to vulnerable very low birth weight (VLBW; <1500 g) infants. However, over the past decades the regionalized care systems for sick newborns may have been weakened by financial rewards under fee-for-service arrangements and demand from community hospitals and families seeking to deliver close to home.1
Lower mortality of VLBW infants has been observed after birth in a perinatal center.2 Phibbs and colleagues showed higher mortality in lower-level and lower-volume NICUs.3,4 A meta-analysis indicated 62% higher odds of mortality during the birth hospitalization with birth outside a high-level NICU.5 NICU volume may be an even more important predictor of mortality.6,7
These studies imply that quality of care delivery for vulnerable VLBW infants at lower-level NICUs may be suboptimal. However, mortality as a sole measure of quality is limited. In isolation, it provides little information about the care provided to the 85% of infants8 who survive to discharge.9 Yet a comprehensive assessment of care and outcomes across different levels of NICUs does not exist.
Neonatal intensive care is a complex and multidimensional activity, and the measurement of its quality should reflect this fact. Although individual measures contain important information, there is also value in summarizing performance by combining the information from multiple measures because such a summary can convey quality from many different perspectives.10 In previous work, we created a composite indicator, the Baby-MONITOR, as a comprehensive measure of the care and outcomes for VLBW infants.11,12 In this article, we used the Baby-MONITOR and its individual components to examine whether care and outcomes differ across NICU levels.
We conducted a cross-sectional population-based analysis of clinical data obtained from the California Perinatal Quality Care Collaborative (CPQCC).13 More than 90% of California NICUs are members of the CPQCC. Data for this study are derived from the CPQCC clinical data sets, which include several quality assurance mechanisms. Annual training sessions for local NICU personnel help to promote accuracy and uniformity in data abstraction. In addition, each record has range and logic checks, both at the time of data collection and data closeout, with auditing of records with excessive missing data.
The sample included live-born infants with a birth weight of 401 to 1500 g or a gestational age between 25 0/7 and 31 6/7 weeks. We used multiyear analyses (January 1, 2008, to December 31, 2012) because of the small number of VLBW infants cared for in some institutions.
We used previously published selection criteria aimed at creating a relatively homogeneous sample of VLBW infants.11 To ensure that patient outcomes reflected NICU quality of care, we excluded infants who died before 12 hours of life and those with severe congenital anomalies (see Supplemental Information). We also excluded infants born before 25 weeks of gestation to minimize bias at the threshold of viability.14
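The inclusion and exclusion rules above can be summarized as a single eligibility check. The sketch below is illustrative only: the `Infant` record and its field names are hypothetical, gestational age is assumed to be recorded as completed weeks plus days, and the list of severe congenital anomalies (defined in the Supplemental Information) is reduced to a single boolean flag.

```python
from dataclasses import dataclass


@dataclass
class Infant:
    # Hypothetical record layout; field names are illustrative only.
    birth_weight_g: int
    ga_weeks: int          # completed weeks of gestation
    ga_days: int           # additional days (0-6)
    died_before_12h: bool
    severe_anomaly: bool   # stands in for the Supplemental Information list


def is_eligible(infant: Infant) -> bool:
    """Apply the study's inclusion and exclusion rules to one live-born infant."""
    ga_total_days = infant.ga_weeks * 7 + infant.ga_days
    # Inclusion: birth weight 401-1500 g OR gestational age 25 0/7 to 31 6/7 weeks
    in_weight_range = 401 <= infant.birth_weight_g <= 1500
    in_ga_range = 25 * 7 <= ga_total_days <= 31 * 7 + 6
    if not (in_weight_range or in_ga_range):
        return False
    # Exclusion: born before 25 0/7 weeks (threshold of viability)
    if ga_total_days < 25 * 7:
        return False
    # Exclusion: death before 12 hours of life or severe congenital anomaly
    return not (infant.died_before_12h or infant.severe_anomaly)
```

For example, an infant of 1700 g born at 30 0/7 weeks qualifies through the gestational-age criterion even though it exceeds the birth weight limit, whereas a 700-g infant born at 24 6/7 weeks is excluded by the viability threshold.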
Data for individual infants are linked such that they can be followed if transferred between CPQCC NICUs. Because patient transfers may bias NICU performance assessments, we developed detailed algorithms to avoid unduly crediting or penalizing NICUs for care delivered elsewhere. Guiding principles for these algorithms were as follows:
Only infants with at most 3 admission records from 2 hospitals are included.
If the birth hospital transferred an infant by 3 days of age (day 1 being the day of birth), subsequent relevant outcomes (eg, chronic lung disease) accrue to the receiving hospital (counted as missing for the birth hospital).
If the birth hospital transferred an infant after 3 days of age, subsequent relevant outcomes accrue to the birth hospital (counted as missing for the receiving hospital).
See also Supplemental Table 3.
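The three guiding principles can be expressed as a small attribution rule. This is a simplified sketch, not the detailed algorithm in Supplemental Table 3: the `Admission` record and `outcome_hospital` function are hypothetical, admissions are assumed to be listed chronologically starting at the birth hospital, the admission day at the receiving hospital is treated as the transfer day, and "at most 3 admission records from 2 hospitals" is read as "no more than 2 distinct hospitals".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Admission:
    hospital_id: str
    admit_day: int  # day of life at admission; day 1 = day of birth


def outcome_hospital(admissions: List[Admission], cutoff_day: int = 3) -> Optional[str]:
    """Return the hospital credited with post-transfer outcomes, or None if excluded.

    `admissions` is assumed to be in chronological order, starting at the birth hospital.
    """
    # Inclusion rule: at most 3 admission records from no more than 2 hospitals
    if len(admissions) > 3 or len({a.hospital_id for a in admissions}) > 2:
        return None
    birth_hospital = admissions[0].hospital_id
    for adm in admissions[1:]:
        if adm.hospital_id != birth_hospital:
            # The first transfer out of the birth hospital decides attribution
            if adm.admit_day <= cutoff_day:
                return adm.hospital_id   # outcomes counted as missing for the birth hospital
            return birth_hospital        # outcomes counted as missing for the receiving hospital
    return birth_hospital  # never transferred
```

Exposing the cutoff as a parameter makes it straightforward to rerun attribution with the alternative transfer days (2 and 4) examined in the sensitivity analyses.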
Baby-MONITOR: measures for the composite scale were selected by an expert panel15 and affirmed by practicing neonatologists.16 Measure definitions used standard CPQCC algorithms. The measures were expressed as binary variables at the patient level and as proportions at the unit level. They include (1) any antenatal steroid administration; (2) moderate hypothermia (<36°C) on admission; (3) non–surgically induced pneumothorax; (4) hospital-acquired bacterial or fungal infection; (5) oxygen requirement at 36 weeks’ gestational age; (6) retinopathy of prematurity screening at the age recommended by the American Academy of Pediatrics (AAP); (7) discharge on any human milk; (8) mortality during the birth hospitalization; and (9) growth velocity (below or above the median of 12.9 g/kg/day), calculated by using a logarithmic function.17
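The growth velocity measure and the patient-to-unit aggregation can be sketched as follows. This assumes the exponential-model form GV = 1000 · ln(W_n / W_1) / (D_n − D_1), with weights in grams and D as day of life, as in the exponential infant growth model acknowledged at the end of the article; the start and end points of the interval and the direction of dichotomization (below the median treated as poor growth) are assumptions, and all function names are hypothetical.

```python
import math
from typing import Iterable, Optional

GV_MEDIAN = 12.9  # g/kg/day, median cut point stated in the text


def growth_velocity(w1_g: float, wn_g: float, d1: int, dn: int) -> float:
    """Exponential-model growth velocity in g/kg/day (assumed form)."""
    return 1000 * math.log(wn_g / w1_g) / (dn - d1)


def poor_growth(gv: float) -> bool:
    """Patient-level binary indicator: growth below the median (assumed direction)."""
    return gv < GV_MEDIAN


def unit_proportion(indicators: Iterable[Optional[bool]]) -> float:
    """Unit-level proportion of a binary measure, ignoring missing (None) values."""
    observed = [x for x in indicators if x is not None]
    return sum(observed) / len(observed) if observed else float("nan")
```

For example, an infant growing from 1000 g on day 7 to 1500 g on day 35 has GV = 1000 · ln(1.5) / 28, or about 14.5 g/kg/day, which lies above the median. Missing values (eg, outcomes assigned to another hospital after a transfer) are dropped before computing the unit-level proportion.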
Variable of Interest: Level of Care
NICU level of care was a self-reported variable obtained from the 2012 Vermont Oxford Network Annual Survey of NICU directors. Designations follow the 2012 definitions set forth by the AAP.18 This study included Level (L) II through IV NICUs.
Missing AAP levels and discrepancies were checked and confirmed with the NICUs. Four centers provided only the older AAP levels (eg, IIA, IIB); for these, we determined the new AAP level based on ventilation duration, the number of cardiac surgeries, and care levels as designated by the California Children’s Services.19
Organizational variables: hospital ownership (government, not-for-profit, for-profit, other) and neonatologist coverage (in-house or at home) were obtained from the 2012 Vermont Oxford Network Annual Survey of NICU directors. Hospital volume was calculated from the number of eligible infants in the study cohort in the CPQCC data. Hospital teaching status was derived from the Regional Perinatal Programs of California.20
Clinical variables: these data were obtained from the CPQCC data set and included prenatal care, gender, weight for gestational age below the 10th percentile, outborn status, multiple birth, 5-minute Apgar score, and Cesarean delivery. Gestational age at birth was categorized into 3 groups (25 weeks 0 days to 27 weeks 6 days, 28 weeks 0 days to 29 weeks 6 days, and ≥30 weeks), chosen to yield similar patient numbers among groups. The 5-minute Apgar score was categorized as ≤3, 4 to 6, and >6. Prenatal care was defined as receipt of any prenatal obstetrical care before the admission during which birth occurred.
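The category boundaries for gestational age and Apgar score can be made explicit in code. A minimal sketch; the function names and label strings are hypothetical, and gestational age is assumed to be recorded as completed weeks plus days.

```python
def ga_group(weeks: int, days: int = 0) -> str:
    """Gestational-age category used as a clinical covariate."""
    total = weeks * 7 + days
    if total < 25 * 7:
        raise ValueError("gestational age below the study range")
    if total <= 27 * 7 + 6:
        return "25w0d-27w6d"
    if total <= 29 * 7 + 6:
        return "28w0d-29w6d"
    return ">=30w"


def apgar_group(score: int) -> str:
    """5-minute Apgar category: <=3, 4-6, or >6."""
    if not 0 <= score <= 10:
        raise ValueError("Apgar score must be between 0 and 10")
    if score <= 3:
        return "<=3"
    if score <= 6:
        return "4-6"
    return ">6"
```

Comparing total days rather than weeks alone keeps an infant born at 27 6/7 weeks in the youngest group and one born at 28 0/7 weeks in the middle group.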
Computation of Baby-MONITOR scores requires that its subcomponents be aligned according to valence (higher score = better performance), risk adjusted, and standardized using the Draper-Gittoes method.12,21 With this method, a standardized observed-minus-expected z score was computed for each subcomponent, with an expected mean of 0 and an SD of 1. The z scores were equally weighted and averaged to derive a Baby-MONITOR score for each NICU. We used bootstrapping (a simulation in which each NICU’s patients were resampled with replacement 500 times22) to compute 95% confidence intervals.
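The scoring pipeline described above (valence alignment, observed-minus-expected standardization, equal weighting, and bootstrap intervals) can be sketched for a single NICU. This is a simplified illustration under stated assumptions, not the published implementation: each patient's predicted probability is assumed to come from a previously fitted risk model, the z score is assumed to take the form (O − E) / sqrt(Σ p(1 − p)), only 3 hypothetical measures stand in for the 9 subcomponents, and the 95% interval is a percentile bootstrap.

```python
import math
import random
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Each patient maps measure name -> (observed 0/1, predicted probability) or None if missing.
Patient = Dict[str, Optional[Tuple[int, float]]]

# Hypothetical subset of measures; +1 = event is desirable, -1 = event is undesirable.
VALENCE = {"antenatal_steroids": +1, "pneumothorax": -1, "mortality": -1}


def oe_z(pairs: List[Tuple[int, float]]) -> float:
    """Standardized observed-minus-expected z score for one measure (assumed form)."""
    observed = sum(y for y, _ in pairs)
    expected = sum(p for _, p in pairs)
    variance = sum(p * (1 - p) for _, p in pairs)
    return (observed - expected) / math.sqrt(variance)


def composite_score(patients: List[Patient], valence: Dict[str, int] = VALENCE) -> float:
    """Align each z score by valence and average with equal weights."""
    zs = []
    for measure, sign in valence.items():
        pairs = [pt[measure] for pt in patients if pt.get(measure) is not None]
        if pairs:
            zs.append(sign * oe_z(pairs))
    return mean(zs)


def bootstrap_ci(patients: List[Patient], n_boot: int = 500,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample one NICU's patients with replacement."""
    rng = random.Random(seed)
    draws = sorted(
        composite_score(rng.choices(patients, k=len(patients)))
        for _ in range(n_boot)
    )
    lower = draws[int(n_boot * alpha / 2)]
    upper = draws[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper
```

Multiplying each z score by its valence sign means that fewer adverse events than expected (eg, fewer pneumothoraces) raises the composite, matching the convention that higher scores indicate better performance. Resampling whole patients keeps each infant's measures together across subcomponents.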
Association of Baby-MONITOR Scores With Level of NICU Care
We grouped NICUs according to their level of care and calculated Baby-MONITOR scores for each level, weighted by number of infants. We used F and t tests to assess differences in composite scores between NICU levels. To examine the effect of patient volume on quality of care delivery, we stratified the analyses according to VLBW volume, using high- and low-volume cutoffs based on median annual volumes to achieve balance of NICUs within high- and low-volume groups (ie, L II: 1–6 = low, >6 = high; L III: 1–29 = low, >29 = high; L IV: 1–61 = low, >61 = high). These cut points are broadly consistent with empirically based cut points used in the literature.5,8
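The infant-weighted level means and the volume strata can be sketched as below. The cut points are those listed in the text; treating "annual volume" as a NICU's eligible VLBW infants per year averaged over the study period is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Upper bound of the "low-volume" stratum by AAP level, from the cut points in the text
LOW_VOLUME_MAX = {"II": 6, "III": 29, "IV": 61}


def volume_stratum(level: str, annual_vlbw_volume: float) -> str:
    """Classify a NICU as low or high volume within its level of care."""
    return "low" if annual_vlbw_volume <= LOW_VOLUME_MAX[level] else "high"


def weighted_group_means(nicus: Iterable[Tuple[str, float, int]]) -> Dict[str, float]:
    """Infant-weighted mean Baby-MONITOR score per group.

    Each element is (group key, NICU score, number of infants).
    """
    weighted_sum: Dict[str, float] = defaultdict(float)
    total_n: Dict[str, int] = defaultdict(int)
    for group, score, n_infants in nicus:
        weighted_sum[group] += score * n_infants
        total_n[group] += n_infants
    return {g: weighted_sum[g] / total_n[g] for g in weighted_sum}
```

The group key can be the level alone or a combined key such as `f"{level}-{volume_stratum(level, volume)}"` to obtain the stratified means; weighting by infants means that a large NICU contributes proportionally more to its level's average than a small one.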
Controlling for the Effects of Organizational Variables
We performed a multivariable analysis regressing Baby-MONITOR scores on NICU level, controlling for other covariates. To choose the covariates for the final model, we used backward selection with a P value criterion of <.15.
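Backward selection with a P < .15 retention criterion can be sketched as repeated ordinary least squares fits that drop the least significant candidate covariate until all remaining candidates meet the criterion. The article does not state which software was used; this sketch assumes NumPy and SciPy, t-test P values for each coefficient, and one-at-a-time removal, and it forces the NICU level terms to stay in the model. Covariate names and functions are hypothetical.

```python
import numpy as np
from scipy import stats
from typing import Dict, List


def ols_pvalues(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Two-sided t-test P values for OLS coefficients; X must include an intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stat = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t_stat), df=n - k)


def backward_select(y: np.ndarray, candidates: Dict[str, np.ndarray],
                    forced: Dict[str, np.ndarray], threshold: float = 0.15) -> List[str]:
    """Drop the least significant candidate until all remaining have P < threshold.

    Forced terms (eg, NICU level indicators) are never removed.
    """
    remaining = dict(candidates)
    while True:
        columns = {**forced, **remaining}
        names = list(columns)
        X = np.column_stack([np.ones(len(y))] + [columns[nm] for nm in names])
        pvals = dict(zip(names, ols_pvalues(y, X)[1:]))  # skip the intercept
        cand_p = {nm: pvals[nm] for nm in remaining}
        if not cand_p:
            return names
        worst = max(cand_p, key=cand_p.get)
        if cand_p[worst] < threshold:
            return names
        del remaining[worst]
```

With 3 levels of care, NICU level would enter as 2 indicator columns in `forced`; candidates such as ownership or teaching status would be dummy-coded the same way.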
Differences in Baby-MONITOR Subcomponents by Level of Care
We used analysis of variance to test for differences in performance on risk-adjusted Baby-MONITOR subcomponent scores across levels of care, with Bonferroni adjustment to correct for multiple testing.
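A one-way analysis of variance per subcomponent, followed by a Bonferroni adjustment, can be sketched with SciPy's `f_oneway`. The input layout and function name are hypothetical, and the size of the family of tests used for the adjustment is an assumption (here it defaults to the number of subcomponents tested).

```python
from typing import Dict, List, Tuple

from scipy import stats


def anova_by_level(scores: Dict[str, Dict[str, List[float]]],
                   n_tests: int = 0) -> Dict[str, Tuple[float, float, float]]:
    """One-way ANOVA across levels for each subcomponent, with Bonferroni adjustment.

    `scores` maps subcomponent -> level -> list of NICU risk-adjusted z scores.
    Returns subcomponent -> (F statistic, raw P, Bonferroni-adjusted P).
    """
    m = n_tests or len(scores)  # number of tests in the family
    results = {}
    for measure, by_level in scores.items():
        f_stat, p_raw = stats.f_oneway(*by_level.values())
        results[measure] = (float(f_stat), float(p_raw), min(1.0, float(p_raw) * m))
    return results
```

The Bonferroni step simply multiplies each raw P value by the number of tests and caps it at 1, so a result significant at P = .04 across 9 subcomponent tests would no longer be significant.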
Human Subjects Compliance
This study was approved by the Stanford Institutional Review Board.
The sample included 21 051 VLBW infants born between January 1, 2008, and December 31, 2012, who met the inclusion criteria, with 22 984 hospital records (transfers included) in 134 NICUs. Of these NICUs, 25 are designated as L II, 89 as L III, and 20 as L IV.18 Approximately 4% of infants were born at L I hospitals, other outpatient settings, out of state, or in military hospitals. Excluded from the analysis were 1194 infants (∼5%) who were transferred to ≥3 institutions; nearly 70% of these received care at L IV NICUs. Table 1 shows the unadjusted population and NICU characteristics for the combined sample. Approximately 5% of infants were born at an L II NICU (1012 of 21 051). L IV NICUs cared for a higher proportion of high-risk infants: on average, infants in L IV NICUs were of lower gestational age, and their mothers were more likely to be of advanced maternal age and carrying multiples. In unadjusted analyses, L II NICUs exhibited significant opportunities for process improvement. They had lower rates of antenatal steroid administration, eye examinations, and any human milk feeding at discharge, and higher rates of hypothermia on admission. On the other hand, they had lower rates of several adverse outcomes, including pneumothoraces, health care–associated infections, chronic lung disease, and mortality (P < .05 for all comparisons).
Baby-MONITOR Scores Across NICUs
We found significant variation in Baby-MONITOR scores across California (mean [SD] 0.18 [1.14], range –2.26 to 3.39). Figure 1 shows a caterpillar plot of Baby-MONITOR scores, with NICUs ordered by ascending composite score for the clinical measures. We show both a figure based on standard units (Fig 1A) and a conversion to percentiles (Fig 1B). The variation in performance among these NICUs was large in practical terms, as indicated by the 5.65 standard units of difference between the top and bottom performers. These results were robust to changing the transfer cutoff from a baseline of day 3 to day 2 or day 4 of age, to assigning outcomes for all transfers to the birth hospital, and to including all deaths before 12 hours of age in the analysis. The correlation in Baby-MONITOR scores between these scenarios was high, ranging from 0.94 to 0.99, consistent with our previous work23 (see online Supplemental Table 4).
Level of Care and Baby-MONITOR Scores
On average, L III NICUs achieved the highest Baby-MONITOR scores (L III mean [SD (range)] 0.43 [1.35 (–2.26 to 2.64)], L IV 0.37 [1.39 (–1.61 to 3.39)], L II –0.22 [0.89 (–1.82 to 1.23)]), but these differences were not statistically significant (P = .53). Stratification (Fig 2) revealed a VLBW volume effect that widened with increasing level of care (L II low 0.15 [0.5] versus high –0.3 [0.93]; L III low 0.15 [1.02] versus high 0.52 [1.43]; L IV low –0.08 [1.08] versus high 0.52 [1.45]). Neither these differences nor any of the associations of Baby-MONITOR scores with organizational variables, including hospital ownership, neonatologist coverage, and hospital teaching status, reached statistical significance (see Supplemental Information, Sensitivity Analysis).
Level of Care and Baby-MONITOR Subcomponents
Figure 3 shows differences across levels of care for several subcomponents, with L IV NICUs scoring higher on process measures of care, including antenatal steroids (P = .002) and, as a trend, any human milk at discharge (P = .092), but lower on several outcome measures, such as health care–associated infections (P = .040), pneumothorax (P < .001), and growth velocity (P = .006).
Table 2 also shows pairwise comparisons using L IV NICUs as a reference. Compared with L III NICUs, they had higher rates of antenatal steroids (P = .040), any human milk at discharge (P = .030), and survival (P = .045), but also of health care–associated infections (P = .014) and poor growth (P = .002). After Bonferroni adjustment, survival and human milk at discharge were no longer significant.
Compared with L II NICUs, L IV NICUs had higher rates of antenatal steroids (P = .036) and pneumothorax (P = .012) and a trend toward more frequent retinopathy of prematurity examinations (P = .099). After Bonferroni adjustment, only pneumothorax remained significant.
Using population-based data, we present a multidimensional, nuanced assessment of the relation between quality of NICU care provided to VLBW infants and NICU level of care. We found significant variation in Baby-MONITOR scores across NICUs but no statistically significant association with level of care. Subcomponent analysis revealed interesting differences, with L IV NICUs performing better on process measures, as well as marginally on survival, and L II NICUs better on other outcome measures.
We consider 4 potential causes to explain our findings. First, previous literature and general advances in high-risk maternal care, including greater use of antenatal corticosteroids24 and imaging, may have fostered more appropriate utilization and regionalization patterns. Compared with previous studies,3,4 we found an inconsistent association between level of care and survival. The proportion of infants born in L II NICUs is low (5%), and their case mix is favorable to survival. Thus, selection bias may have impeded our ability to demonstrate significant differences in survival of infants in L II compared with L IV NICUs. Consistent with previous research, we found a borderline survival benefit of L IV compared with L III NICUs. However, this difference was not significant after adjustment for multiple comparisons. Because providers and parents differ in their inclinations toward life-sustaining treatment, survival may not accurately reflect actual quality of care delivery. We think our results should be viewed as supporting continued national efforts to limit VLBW births in L II NICUs and to promote regionalized care delivery.25
Second, the current approach to defining level of care, as well as self-designation of this variable, may lead to misclassification and dilute the association with measures of quality. However, using California-specific NICU designations assigned by the state also did not result in significant associations with Baby-MONITOR scores (see Supplemental Information).
Third, L II NICUs did achieve lower scores on many process measures, indicating opportunity for quality improvement, yet they also achieved higher scores on many outcome measures. These findings might be the result of selection bias not adequately mitigated by risk adjustment. For example, growth velocity is difficult to predict using patient characteristics from the immediate peripartum period. Future ability to extract additional data from the electronic record may help refine risk models. In addition, pseudo-randomization methods, such as an instrumental variable approach, may address some of the unobserved selection bias. This requires additional study, but previous applications of these methods to NICU outcomes have demonstrated that the benefits of care at higher-volume and/or higher-level NICUs are larger than those estimated with traditional risk-adjustment methods such as those that we used.26
Fourth, transfer bias may have depressed scores for higher-level NICUs. Outcomes for L II NICUs are measured not according to birth at such a facility but according to the intent to keep such infants at an L II NICU for treatment. However, we took care to mitigate transfer bias by including inborn status in risk adjustment models and by assigning negative outcomes of care for infants transferred after day 3 of age to the sending NICU (the outcome is missing for the receiving NICU, yet a positive outcome is assigned to both NICUs). We did assign negative outcomes of infants transferred on or before day 3 of age to the receiving NICU. In addition, assigning all outcomes of transferred infants back to the birth hospital did not significantly influence our results. Finally, there is a known inverse relation between the volume of high-risk deliveries and in-hospital fetal death rates that may be associated with the ability to perform rapid cesarean deliveries.3 This can cause bias because fetal deaths are not included in our data, and many of the infants whose fetal death is “averted” in high-volume hospitals will have elevated risks not captured by our data.
This study provides a good example of the usefulness of composite indicators. The composite provides a global picture of differences in quality of care and of the association with important predictors of quality. Conversely, drawing inferences about overall care from a single measure, such as mortality, is hazardous because individual measures contain biases that make them nonrepresentative. In addition, we have previously shown that NICUs that perform well in 1 area of care may not perform well in others.
Equally important is the process of drilling down into individual subcomponents of the composite because averaging across the measures may hide important differences. This study exemplifies this by revealing important and modifiable differences between NICUs.27,28
This study must be viewed within the context of its design. Observational studies allow for the establishment of associations and the generation of hypotheses but not causal inference. In addition, as mentioned above, incomplete risk adjustment, transfer bias, and confounding from unobserved variables (eg, patient-to-nurse ratios) might have affected our findings. Nevertheless, these methods have been previously published, and inclusion of institutional confounding variables may not be appropriate for quality of care comparisons. This study included nearly all of the NICUs in California, the country’s most populous state, with diverse geography. Given our objective to study the effect of care organization on quality, our findings may have broad relevance to other regions in the United States and abroad. Finally, we used only a 1-time designation of NICUs’ level of care in 2012 and applied this designation to the entire study period. Because the AAP designation changed in 2012, we do not have earlier designations based on this classification scheme. However, when we examined changes in classification over previous years, we found them to be highly stable. Because changes in level of care usually occur toward a higher level, this limitation would bias our results toward the null.
In this population-based study, we found wide variation in overall quality of care provided to VLBW infants by using the Baby-MONITOR, but no significant associations with NICU level of care. We did, however, find important associations with its subcomponents, with L IV NICUs receiving higher-quality scores for measures of care process, and L II NICUs receiving higher scores for several care outcomes. These findings highlight opportunities for further improvements that can be addressed through targeted interventions.
We thank the CPQCC member NICUs for contributing data to this study. We also thank Aloka Patel and Rush University Medical Center for granting Dr Profit a nonexclusive license to use Rush’s exponential infant growth model for noncommercial research purposes.
- Accepted December 1, 2015.
- Address correspondence to Jochen Profit, MD, MPH, Perinatal Epidemiology and Health Outcomes Research Unit, Division of Neonatology, Department of Pediatrics, Stanford University School of Medicine, MSOB Room x115, 1265 Welch Rd, Stanford, CA 94305. E-mail:
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: Dr Profit’s contribution was supported, in part, by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (1 R01 HD083368-01) and by the Stanford Child Health Research Institute. Dr Lee’s contribution was supported, in part, by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (K23HD068400). Dr Goldstein’s effort was supported by a career development award from the National Institute of Diabetes and Digestive and Kidney Diseases (K25 DK097279). Funded by the National Institutes of Health (NIH).
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Cifuentes J, Bronstein J, Phibbs CS, Phibbs RH, Schmitt SK, Carlo WA.
- Composite Measure Evaluation Framework and National Voluntary Consensus Standards for Mortality and Safety–Composite Measures: A Consensus Report. Washington, DC: National Quality Forum; 2009.
- Profit J, Kowalkowski MA, Zupancic JA, et al.
- Patel AL, Engstrom JL, Meier PP, Kimura RE.
- American Academy of Pediatrics Committee on Fetus and Newborn.
- California Department of Healthcare Services.
- Regional Perinatal Programs of California: California Department of Public Health; 2014. Available at: http://www.cdph.ca.gov/programs/rppc/Pages/default.aspx
- Draper D, Gittoes M.
- Efron BT, Tibshirani RJ.
- Freeman VA. Very Low Birth Weight Babies Delivered at Facilities for High-Risk Neonates: A Review of Title V National Performance Measure 17. Washington, DC: Maternal and Child Health Bureau, Health Resources and Services Administration; 2010. Available at: http://mchb.hrsa.gov/grants/natlperformmeasure17rpt.pdf
- Lee HC, Kurtin PS, Wight NE, et al.
- Copyright © 2016 by the American Academy of Pediatrics