OBJECTIVE: To develop a risk-adjustment method for evaluation of in-hospital mortality after noncardiac neonatal surgery regardless of gestational age.
METHODS: Infants ≤30 days old undergoing noncardiac surgical procedures were identified by using the Kids’ Inpatient Database (KID) 2000 + 2003. Neonates were included regardless of gestational age. International Classification of Disease, Ninth Revision, Clinical Modification codes were used to assign procedures to 1 of 4 previously derived risk categories. Prematurity and other clinical variables were assessed in logistic regression analysis. The final multivariable model was validated in 3 independent data sets: KID 2006, Pediatric Health Information System (PHIS) 2001–2003, and PHIS 2006–2008. The model was applied to generate standardized mortality ratios for institutions within PHIS 2006–2008.
RESULTS: Among 18 437 eligible cases in KID 2000 + 2003, 15 278 (83%) had 1 of 66 procedure codes assigned to a risk category and were eligible for analysis. In-hospital mortality for premature infants was 10.5% compared with 2.0% for full-term neonates. In addition to risk category, the clinical variables improving prediction of in-hospital death were prematurity, serious respiratory conditions, necrotizing enterocolitis, neonatal sepsis, and congenital heart disease. Area under the receiver-operator characteristic curve for the final model was 0.90. The model also showed excellent discrimination in the 3 validation data sets (0.90, 0.89, and 0.89). Within 41 institutions in PHIS, standardized mortality ratios ranged from 0.37 to 1.91.
CONCLUSIONS: This validated method provides a tool for risk adjustment of neonates undergoing noncardiac surgery to allow comparative analyses of in-hospital mortality.
- CHD: congenital heart disease
- CI: confidence interval
- ICD-9-CM: International Classification of Disease, Ninth Revision, Clinical Modification
- KID: Kids’ Inpatient Database
- NEC: necrotizing enterocolitis
- OR: odds ratio
- PHIS: Pediatric Health Information System
- ROC: receiver-operator characteristic
- SMR: standardized mortality ratio
What’s Known on This Subject:
Evaluation of neonatal surgical outcomes is necessary to guide improvements in the quality of care. Meaningful comparisons must adjust for factors that alter outcomes independent of the surgical procedures.
What This Study Adds:
Herein is described a method that permits risk adjustment for the broad range of noncardiac neonatal surgery, regardless of gestational age, to permit useful comparisons for quality improvement.
Risk-adjustment methods are essential to understand important variation in clinical outcomes among patients.1 These methods account for intrinsic differences in patient populations that influence risk for certain outcomes. We previously developed a risk-adjustment method for in-hospital mortality in newborns ≤30 days undergoing noncardiac surgery. However, at that time we chose to exclude premature infants to maintain a more homogeneous population to assess surgical risk. To widen the applicability of our analysis and identify additional factors that might affect surgical risk, we elected to broaden our method and incorporate all newborns regardless of their gestational age.
To develop the risk-adjustment model, data from the Kids’ Inpatient Database (KID) 2000 and KID 2003 were combined. The KID was created by the Agency for Healthcare Research and Quality as part of the Healthcare Cost and Utilization Project and is the only nationally representative, all-payer, discharge database for children. These databases allow up to 15 diagnosis and procedure codes per case and rely on the International Classification of Disease, Ninth Revision, Clinical Modification (ICD-9-CM) system for coding. They contain a 10% random sample of normal newborn births and an 80% sample of all other pediatric discharges for patients up to 20 years of age. KID 2000 contains cases from 2784 institutions (short-term, nonfederal, general, and specialty hospitals) in 27 states*; KID 2003 contains cases from 3438 institutions in 36 states.†
Data from KID 2006 were used for model validation. KID 2006 contains cases from 3739 institutions in 38 states (additional states are Arkansas and Oklahoma). Additional validation was performed by using 2 data sets derived from the Pediatric Health Information System (PHIS). The PHIS is a large, detailed clinical and financial database managed by the Performance Improvement Division of the Child Health Corporation of America. The database contains information from 41 free-standing children’s hospitals in the United States and records diagnoses and procedures for each admission using ICD-9-CM codes. PHIS data from 2001–2003 and 2006–2008 were used to be contemporaneous with the KID data sets. Approval from the Children’s Hospital Boston Institutional Review Board was obtained before analysis.
Case Selection for Model Development
Patients >30 days old at the time of surgery were excluded. Other patients excluded were those undergoing cardiac surgery, endoscopic or closed procedures, catheterization, circumcision or repair of superficial lacerations. A complete list of excluded procedure types appears in Table 1. In the KID databases, cases from states that did not record age in days or number of days from admission to procedure were also excluded.
Definition of Risk Categories
The risk categories used were those derived in our initial analysis as outlined in Table 2.2 They were empirically constructed by grouping procedures (ICD-9-CM codes) with similar in-hospital mortality rates and at least 20 cases for each code into 4 distinct risk categories. Those who underwent multiple procedures were assigned to the risk category corresponding to the single highest-risk procedure. The categories were refined by inspection; in a few instances, similar procedures (eg, colostomies, ileostomies, and repairs of diaphragmatic hernias) were combined together in the same risk groups to improve face validity and interpretation. Within risk category 3, there were 3 additional codes for repair of diaphragmatic hernia that were added since our previous analysis (53.71, 53.72, and 53.75), which increased the total number of ICD-9-CM procedure codes from 63 to 66.
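The highest-risk rule for infants with multiple procedures can be sketched as follows. This is a minimal illustration, not the authors' code: the three diaphragmatic hernia codes and their placement in risk category 3 come from the text, while the `DEMO-*` entries are invented placeholders standing in for the rest of Table 2, and the function name is hypothetical.

```python
from typing import Iterable, Optional

# Partial, illustrative lookup of ICD-9-CM procedure codes to risk categories.
# 53.71, 53.72, and 53.75 are placed in category 3 as stated in the text.
# The DEMO-* keys are placeholders (NOT real ICD-9-CM codes) for Table 2.
RISK_CATEGORY = {
    "53.71": 3,
    "53.72": 3,
    "53.75": 3,
    "DEMO-A": 1,
    "DEMO-B": 2,
    "DEMO-D": 4,
}


def assign_risk_category(procedure_codes: Iterable[str]) -> Optional[int]:
    """Return the highest risk category among a case's procedure codes.

    Codes absent from the lookup are ignored; a case with no categorized
    procedure returns None (not eligible for analysis).
    """
    categories = [RISK_CATEGORY[c] for c in procedure_codes if c in RISK_CATEGORY]
    return max(categories) if categories else None
```

A case coded with both a category 1 procedure and a diaphragmatic hernia repair would therefore be analyzed in category 3.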
Inclusion of Premature Neonates
Prematurity was defined by the presence of ICD-9-CM diagnosis codes 765.0x, 765.1x, and/or 765.21–765.28. By using the previously derived procedure risk categories, in-hospital mortality rates were examined for cases with and without a code for prematurity within each category. An empirical approach using logistic regression was used to determine the most appropriate way to include premature cases in the risk-adjustment model; options included: (1) assigning premature infants to the same risk category in which they would be included if not premature, and adding a binary covariate representing prematurity into the model, and (2) assigning premature infants to a higher-risk category than would be assigned to a nonpremature child with the same surgical procedure, such as risk category plus 1. (As an example, a neonate undergoing gastrostomy without a code for prematurity would be placed in risk category 2; a premature neonate undergoing gastrostomy might instead be placed in risk category 3.) Models were first examined for patients undergoing a single surgical procedure only, then for all eligible patients placing infants with >1 surgical procedure into the risk category corresponding to the highest risk procedure.
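The prematurity definition and the two candidate encodings can be sketched in a few lines. The sketch assumes diagnosis codes are stored as strings with a decimal point (for example `"765.14"`); the function names are hypothetical, and option 2 is shown without any cap because the text does not say how a premature infant already in risk category 4 would be handled.

```python
from typing import Iterable, Tuple


def is_premature(diagnosis_codes: Iterable[str]) -> bool:
    """True if any code is 765.0x, 765.1x, or 765.21 through 765.28."""
    late_codes = {f"765.2{d}" for d in range(1, 9)}  # 765.21 .. 765.28
    for raw in diagnosis_codes:
        code = raw.strip()
        if code.startswith(("765.0", "765.1")) or code in late_codes:
            return True
    return False


def encode_option_1(risk_category: int, premature: bool) -> Tuple[int, int]:
    """Option 1: keep the procedure's category, add a binary covariate."""
    return risk_category, int(premature)


def encode_option_2(risk_category: int, premature: bool) -> int:
    """Option 2: move premature infants up one category (no cap assumed)."""
    return risk_category + 1 if premature else risk_category
```

For the gastrostomy example in the text, `encode_option_1(2, True)` keeps category 2 with a prematurity flag of 1, whereas `encode_option_2(2, True)` yields category 3.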
A model incorporating premature cases was chosen based on the highest area under the receiver-operator characteristic (ROC) curve. The area under the ROC curve reflects a model’s ability to discriminate between subjects who do and do not have the outcome of interest; a value of 0.5 suggests that the model is no better at predicting outcome than a random coin toss, while a value of 1.0 means that the model predicts the outcome perfectly.
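To make the interpretation concrete, the area under the ROC curve can be computed as the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The following is a small pairwise sketch for illustration (the function name is an assumption, and it is quadratic in the number of cases, so it suits small examples rather than full data sets).

```python
from typing import Sequence


def roc_auc(outcomes: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    outcomes: 1 for in-hospital death, 0 for survival.
    scores:   predicted probability (or any risk score) per case.
    """
    deaths = [s for y, s in zip(outcomes, scores) if y == 1]
    survivors = [s for y, s in zip(outcomes, scores) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one case with and without the outcome")
    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0      # concordant pair
            elif d == s:
                wins += 0.5      # tie counts as half
    return wins / (len(deaths) * len(survivors))
```

A score that is identical for every case gives 0.5 (the coin toss), and a score that ranks every death above every survivor gives 1.0.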
Assessment of Clinical Variables
Any serious respiratory condition (ICD-9-CM diagnosis codes 769, 770.0–770.3, and 770.8) and necrotizing enterocolitis (NEC) (777.5) were included in the previous risk-adjustment model. Additional clinical variables that were not statistically significantly associated with mortality in our previous analyses but that might become more important with the inclusion of premature infants were reexamined. These included medical conditions and diagnoses that were presurgical or present at birth, such as neonatal sepsis, peripheral vascular anomalies, late metabolic acidosis, convulsions or seizures, serious hematologic disorders, and diagnosis of congenital heart disease (CHD).
Clinical variables were introduced one at a time as binary covariates in a logistic regression model already containing procedure risk category, prematurity, serious respiratory condition, and NEC. The variable with the highest statistically significant improvement in the area under the ROC curve, as determined by the likelihood ratio test, was retained in the model; the remaining variables were then reassessed. Once the final clinical risk factors were selected, interaction terms between clinical factors and prematurity were examined. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated for the final model.
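The likelihood ratio test used at each step compares the log-likelihood of the model with and without the candidate covariate. A minimal sketch for a single added binary covariate (1 degree of freedom) is shown below; the function name and the idea of passing in pre-computed log-likelihoods are assumptions for illustration, and the fitting software used by the authors is not stated in the text.

```python
import math
from typing import Tuple


def likelihood_ratio_test(loglik_base: float, loglik_extended: float) -> Tuple[float, float]:
    """Likelihood ratio test for one added covariate (1 degree of freedom).

    Returns (statistic, p_value). For a chi-square variable with 1 df the
    upper-tail probability equals erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (loglik_extended - loglik_base), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

In a forward-selection loop, each candidate variable would be added in turn to the current model, and the candidate giving the best statistically significant improvement would be retained before the remaining candidates are reassessed.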
Validation of Risk-Adjustment Method
Validation was performed in 3 independent data sets: KID 2006, PHIS 2001–2003, and PHIS 2006–2008. The same inclusion and exclusion criteria used for model development were applied. Discrimination of each model was assessed using the area under the ROC curve. Calibration was evaluated using the Hosmer-Lemeshow goodness-of-fit test, and calibration plots of observed versus expected in-hospital mortality were constructed.
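For readers unfamiliar with the Hosmer-Lemeshow test, the statistic can be sketched as follows: cases are ordered by predicted risk, split into groups, and observed deaths are compared with expected deaths in each group. This is a generic textbook form under stated assumptions (equal-sized groups formed by position after a stable sort, function name hypothetical); it is not the authors' implementation and it returns only the statistic, not a P value.

```python
from typing import Sequence


def hosmer_lemeshow_statistic(outcomes: Sequence[int],
                              probabilities: Sequence[float],
                              groups: int = 10) -> float:
    """Sum over risk groups of (O - E)^2 / (n * p_bar * (1 - p_bar))."""
    # Sort by predicted probability only, so tied cases keep their input order.
    pairs = sorted(zip(probabilities, outcomes), key=lambda t: t[0])
    n = len(pairs)
    if n < groups:
        raise ValueError("fewer cases than groups")
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)      # deaths in the group
        expected = sum(p for p, _ in chunk)      # sum of predicted risks
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic
```

When a model has only a handful of distinct predicted probabilities, many cases share the same value and the group boundaries become arbitrary, which is one reason the test can behave poorly for models built entirely from categorical predictors.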
Application of Risk-Adjustment Method
Once validated, the risk-adjustment method was applied to evaluate in-hospital mortality across institutions included in the PHIS 2006–2008 data set. An observed mortality rate was calculated for each institution by dividing the number of cases at the center that resulted in in-hospital death by the total number of cases at the center. The final risk-adjustment model was then used to predict the probability of in-hospital death for each case in the data set based on the characteristics in the model. The expected mortality rate for an institution was calculated by summing the probabilities of death (generated from the model) for all cases performed within the institution and dividing by the total number of cases. The standardized mortality ratio (SMR) for each center was generated by dividing its observed mortality rate by the expected mortality rate; 95% CIs were calculated. An SMR of 1.0 indicates that the observed mortality rate is equal to the expected rate. If the SMR is >1.0, the observed rate is higher than expected; if it is <1.0, the observed rate is lower than expected given the institution’s case mix complexity.
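The SMR calculation described above can be sketched directly. The input layout (one tuple per case holding an institution identifier, a death indicator, and the model's predicted probability) and the function name are assumptions for illustration; the method used for the 95% CIs is not specified in the text, so none is computed here.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def smr_by_institution(
    cases: Iterable[Tuple[Hashable, int, float]]
) -> Dict[Hashable, float]:
    """Observed / expected in-hospital mortality rate per institution.

    Each case is (institution, died, predicted_probability), where died is
    1 or 0 and predicted_probability comes from the risk-adjustment model.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # deaths, summed risk, cases
    for institution, died, probability in cases:
        entry = totals[institution]
        entry[0] += int(died)
        entry[1] += probability
        entry[2] += 1
    smr = {}
    for institution, (deaths, summed_risk, n) in totals.items():
        observed_rate = deaths / n
        expected_rate = summed_risk / n
        smr[institution] = observed_rate / expected_rate
    return smr
```

Because both rates share the same denominator, the SMR is equivalently the number of observed deaths divided by the sum of predicted probabilities at that institution.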
From the KID 2000 + 2003 derivation data set, a total of 18 437 neonatal surgeries met inclusion criteria. Among these, 15 278 (83%) corresponded to 1 of the 66 procedure codes used for assignment to a risk category and therefore were eligible for analysis. The number of cases in each risk category with in-hospital mortality rates is shown in Table 3 for premature and full-term neonates. Within each risk category, the mortality rates are substantially higher for premature infants compared with full-term neonates.
Risk category was the first variable included in the logistic regression model predicting in-hospital mortality. Initially, premature neonates were assigned to the same risk category in which they would be included if not premature, and a binary covariate representing prematurity was added into the logistic regression model. The area under the ROC curve for the model containing risk category plus this binary covariate was 0.86. Next, premature infants were assigned to the risk category in which they would be included if not premature plus 1; the area under the ROC curve for this model was also 0.86. Because the first method is simpler and allows us to retain the 4 originally defined risk categories, we chose to model prematurity using a binary covariate.
With risk category and prematurity in the model, 4 other clinical variables contributed significantly to the prediction of in-hospital mortality. NEC and any serious respiratory condition, both of which had been used in our previous risk-adjustment model, remained significant predictors of in-hospital mortality. Neonatal sepsis (771.8x) and diagnosis of CHD (745.0–747.49, except 746.86) were the only other clinical variables that significantly improved the ability to predict in-hospital death and were therefore included in the model. The final risk-adjustment model is shown in Table 4. The area under the ROC curve for this model is 0.90.
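For orientation, the general form of a logistic model with these predictors can be written as below. The coding of risk category as indicator variables relative to category 1 is an assumption (the text does not state the coding), and the coefficient symbols are placeholders: the fitted values are reported in Table 4, which is not reproduced here.

```latex
\operatorname{logit}\,\Pr(\text{death})
  = \beta_0
  + \sum_{k=2}^{4} \beta_k \,\mathbf{1}[\text{risk category} = k]
  + \beta_{\text{prem}}\, x_{\text{prem}}
  + \beta_{\text{resp}}\, x_{\text{resp}}
  + \beta_{\text{NEC}}\, x_{\text{NEC}}
  + \beta_{\text{sepsis}}\, x_{\text{sepsis}}
  + \beta_{\text{CHD}}\, x_{\text{CHD}}

\hat{p} = \frac{1}{1 + e^{-\eta}}, \qquad
\mathrm{OR}_j = e^{\beta_j}
```

Here each $x$ is a binary indicator (1 if the condition is coded, 0 otherwise), $\eta$ is the right-hand side of the first equation, $\hat{p}$ is the predicted probability of in-hospital death for a case, and $\mathrm{OR}_j$ is the odds ratio for covariate $j$.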
Three separate data sets were used for validation: 1 derived from PHIS 2001–2003 and 2 later contemporaneous data sets from KID 2006 and PHIS 2006–2008. The numbers of cases assigned to a risk category in each data set are 12 993 for PHIS 2001–2003, 9839 for KID 2006, and 17 524 for PHIS 2006–2008.
The in-hospital mortality rates for each data set are depicted by risk category in Fig 1. The risk categories had nonoverlapping 95% CIs in all circumstances. The full risk-adjustment model applied to the 3 validation data sets is shown in Table 5. It is noteworthy that the areas under the ROC curves are 0.89 to 0.90 in each of the separate validation data sets. The magnitude of the effect of the 4 clinical variables was different for each data set, but NEC and serious respiratory conditions remained the strongest predictors of in-hospital mortality. Calibration plots for the derivation and validation data sets are shown in Fig 2.
Figure 3 demonstrates the standardized mortality ratios generated from this risk-adjustment model allowing comparisons of in-hospital mortality for noncardiac neonatal surgery in the 41 free-standing children’s hospitals within the PHIS 2006–2008 data set.
It is increasingly recognized that to make improvements in the quality of health care, appropriate risk-adjusted metrics are necessary for valid comparisons.
Attempts to stratify risks for individual congenital disorders such as NEC, congenital diaphragmatic hernia, or gastroschisis have been described.3–17 Efforts have also been made to identify perioperative risk factors for major complications or death after pediatric surgery.15,16 Mortality rates for pediatric surgical procedures are well known but not risk adjusted.17 Our goal was to develop a more comprehensive risk-adjustment method that could be applied to the wide range of noncardiac neonatal surgeries. A major challenge is the rarity of these events. Our approach uses an empirically derived method from very large national administrative data sets to generate and validate the risk adjustment for in-hospital mortality after neonatal surgery. Our previous analysis was restricted to full-term infants and excluded those with major structural CHD.2 Procedures were divided among 4 distinct risk categories. It proved to be a highly predictive method encompassing the most common procedures in this population. The majority of risk proved attributable to the procedure. However, 2 clinical variables that significantly increased risk were also incorporated into our model: NEC and serious respiratory conditions.
In an effort to broaden the scope of our analysis, this method was extended to include premature infants. Of the 15 278 cases within the derivation data set from KID 2000 + 2003 that could be assigned to a risk category and were therefore eligible for analysis, >25% occurred in premature infants. As such, premature infants are clearly a large and important cohort. We also no longer exclude children with major CHD, who represent ∼4% of cases. We have shown that the same 4 risk categories of selected ICD-9-CM codes again proved applicable. It is clear that the mortality rates are higher for each risk category when premature infants are compared with their full-term counterparts. However, the best predictive model with the highest area under the ROC curve was achieved by using the same procedure-based risk categories in combination with a binary covariate representing prematurity, the previously identified clinical variables of NEC and serious respiratory conditions, and the additional clinical covariates of neonatal sepsis and any CHD. It is remarkable that only minor adjustments of our original model were required to incorporate premature infants and provide a simple risk-adjustment method for virtually all newborns undergoing noncardiac surgery. It is intriguing that prematurity itself, although a significant clinical factor in most data sets, had a relatively low OR, suggesting that much of the risk associated with prematurity is incorporated in the elements encompassed by risk category and the other clinical variables.
It should be noted that the Hosmer-Lemeshow goodness-of-fit test, which is traditionally used to assess calibration for risk prediction and risk-adjustment models, suggests poor calibration in all 4 data sets. Our models fall into a situation in which the performance of the Hosmer-Lemeshow test is inadequate. The test is performed by dividing cases into deciles of risk and then comparing observed mortality rates within each decile to those predicted by the risk-adjustment model. A P value < .05 would suggest that there are significant differences between observed and predicted mortality rates. However, in our models, which consist of categorical predictor variables exclusively, 10 distinct risk groups cannot be created. In particular, approximately half of all cases (48% for the derivation data set, 42%–53% in each of the validation data sets) are in risk category 1, are not premature, and do not have NEC, serious respiratory conditions, neonatal sepsis, or CHD. In this lowest-risk group, the model tends to overestimate risk. For example, in the derivation data set, the observed in-hospital mortality rate is 0.01%, whereas the mortality rate predicted by the model is 0.3%. Although this is a 30-fold difference in relative terms, the absolute difference is small: in-hospital death is very unlikely in this low-risk group. As can be seen from the calibration plots (Fig 2), overall agreement between observed and expected mortality rates is quite good (R² for each data set ranges from 98.0% to 98.8%). Also, because this model is intended for risk adjustment across groups of patients rather than risk prediction in individual patients, calibration is less important as a measure of model performance. The area under the ROC curve demonstrates excellent discrimination in all data sets and supports the use of this model for risk adjustment.
The shortcomings of administrative data sets are well known and include the potential for coding inaccuracies. However, these data sets do permit analysis of large populations, and our model proved to have excellent discrimination. By design, risk-adjustment methods use elements that place patients at risk and not discretionary aspects of treatment. We have used procedure as a surrogate for diagnosis, which is reasonable when a specific treatment course is generally applied for the specific diagnosis. When different operative approaches are used for the same problem, these approaches were placed within the same risk category to preserve validity of this model. It is important to recognize the limitations of this method: it is not appropriate for predicting the risk of an individual procedure. Furthermore, given that the model was constructed by combining multiple procedures from many different specialties, its utility lies in interrogating institutions or systems of care.
Even among the 41 free-standing children’s hospitals of the PHIS database, the standardized mortality ratios vary approximately fivefold, from 0.37 to 1.91 (Fig 3). It is this variation that could guide exploration into the elements for potential outcome improvements. We applaud the efforts of those responsible for the pediatric module of the National Surgical Quality Improvement Program sponsored by the American College of Surgeons. It is hoped that these data will provide national benchmarks for a variety of different elements that might contribute to outcome. Undoubtedly, these data will be more robust than those available through administrative data sets, but their accumulation and validation will require time and effort. It may be years before prospective data will be sufficient to guide outcome improvements for the relatively uncommon but potentially mortal procedures of neonatal surgery. Our results may facilitate consideration of the National Surgical Quality Improvement Program data in similar categories of procedures to allow analysis before larger bodies of prospective data are available. Our present tool, Risk Adjustment for Neonatal Surgery, allows the use of data that are currently available over a multiple-year span to identify potential areas for quality improvement efforts within institutions. It proved to be quite robust, with excellent performance characteristics in multiple data sets. Quality improvement is a dynamic effort and will certainly continue to evolve as we refine our assessment tools. This platform will also provide a method with which other issues relevant to health care outcomes such as geography, size, type of institution, insurance status, or professional training could be assessed.
- Accepted May 10, 2012.
- Address correspondence to Craig Lillehei, MD, 300 Longwood Ave, Fegan 3, Boston, MA 02115. E-mail:
All authors made substantive intellectual contributions to this study, including substantial contributions to conception and design, acquisition of data, or analysis and interpretation of data; drafted the article or revised it critically for important intellectual content; and gave final approval of the version to be published.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: No external funding.
* The 27 states were AZ, CA, CO, CT, FL, GA, HI, IA, KS, KY, ME, MD, MA, MO, NC, NJ, NY, OR, PA, SC, TN, TX, UT, VA, WA, WI, and WV.
† Additional states were IL, IN, MI, MN, NE, NH, NV, OH, RI, SD, and VT; data from ME and PA were not available.
- Iezzoni LI
- Blakely ML, Lally KP, McDonald S, et al.; NEC Subcommittee of the NICHD Neonatal Research Network
- Abdullah F, Zhang Y, Camp M, et al.
- Blakely ML, Tyson JE, Lally KP, et al.; NICHD Neonatal Research Network
- Tsao K, Allison ND, Harting MT, et al. Congenital diaphragmatic hernia in the preterm infant. Surgery. 2010;148(2):404–410
- Weinberg A, Huang L, Jiang H, et al. Perioperative risk factors for major complications in pediatric surgery: a study in surgical risk assessment for children. J Am Coll Surg. 2011;212(5):768–778
- Copyright © 2012 by the American Academy of Pediatrics