Risk-Adjusted Hospital Outcomes for Children’s Surgery
BACKGROUND: The American College of Surgeons National Surgical Quality Improvement Program-Pediatric was initiated in 2008 to drive quality improvement in children’s surgery. Low mortality and morbidity rates in previous analyses limited differentiation of hospital performance.
METHODS: Participating institutions included children’s units within general hospitals and free-standing children’s hospitals. Cases selected by Current Procedural Terminology codes encompassed procedures within pediatric general, otolaryngologic, orthopedic, urologic, plastic, neurologic, thoracic, and gynecologic surgery. Trained personnel abstracted demographic, surgical profile, preoperative, intraoperative, and postoperative variables. Incorporating procedure-specific risk, hierarchical models for 30-day mortality and morbidities were developed with significant predictors identified by stepwise logistic regression. Reliability was estimated to assess the balance of information versus error within models.
RESULTS: In 2011, 46 281 patients from 43 hospitals were accrued; 1467 codes were aggregated into 226 groupings. Overall mortality was 0.3%, composite morbidity 5.8%, and surgical site infection (SSI) 1.8%. Hierarchical models revealed outlier hospitals with above or below expected performance for composite morbidity in the entire cohort, pediatric abdominal subgroup, and spine subgroup; SSI in the entire cohort and pediatric abdominal subgroup; and urinary tract infection in the entire cohort. Based on reliability estimates, mortality discriminates performance poorly due to very low event rate; however, reliable model construction for composite morbidity and SSI that differentiate institutions is feasible.
CONCLUSIONS: The National Surgical Quality Improvement Program-Pediatric expansion has yielded risk-adjusted models to differentiate hospital performance in composite and specific morbidities. However, mortality has low utility as a children’s surgery performance indicator. Programmatic improvements have resulted in actionable data.
- ACS: American College of Surgeons
- ASA: American Society of Anesthesiologists
- CI: confidence interval
- CPT: Current Procedural Terminology
- NSQIP: National Surgical Quality Improvement Program
- OR: odds ratio
- SSI: surgical site infection
- UTI: urinary tract infection
What’s Known on This Subject:
The American College of Surgeons National Surgical Quality Improvement Program-Pediatric has examined 30-day risk-adjusted outcomes in children’s surgery. Because of low event rates, initial efforts yielded valid models that did not meaningfully discriminate outcomes among more than 20 participating institutions.
What This Study Adds:
Programmatic growth, refinement of the sampling algorithm, and use of hierarchical modeling have resulted in the ability to reliably discriminate performance among hospitals in multiple domains. We report the first actionable, peer-reviewed, risk-adjusted, multi-institutional outcome data in children’s surgery.
Improving quality of care in children’s surgery requires a robust system for measuring surgical outcomes.1 Surgical care involves complex processes, significant costs, and substantial risks. Until recently, methods to assess processes and outcomes and reliably compare performance among hospitals specifically for surgery in childhood have been lacking.1,2
The surgical care of children poses unique challenges to optimizing outcomes and safety. Within adult surgical care, the American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) has successfully developed risk-adjusted methods to compare hospital performance and perform quality improvement at the institutional level.3,4 Analysis of hospital practices has enabled the ACS NSQIP to develop best practice recommendations for multiple adult surgical issues.5,6 However, the diseases, therapeutic options, and relevant surgical outcomes in children differ markedly from those of adults.1,7 Particular challenges associated with assessing children’s surgery outcomes include the following: (1) markedly lower complication and mortality rates, (2) comorbid conditions that occur with less frequency but are more complex to group and categorize, and (3) markedly lower surgical volumes such that available sample sizes at single institutions can be an order of magnitude below those for adults. Consequently, the NSQIP-Pediatric was initiated by the ACS as a pilot program with 4 hospitals in 2008,8,9 and subsequently opened for voluntary subscription by hospitals that treat children.10 This program has incorporated a wide range of surgical specialties and pediatric-specific comorbidities, procedures, and outcomes. In contrast to claims databases that rely on billing or administrative codes with potentially inconsistent clinical meaning,11–13 the NSQIP-Pediatric utilizes trained personnel to prospectively collect comprehensive clinical data from the medical record. After establishing feasibility,8,9 the NSQIP-Pediatric rapidly expanded to include more than 40 hospitals. Data from 2010 yielded the first risk-adjusted assessment of hospital performance in postoperative mortality and morbidity for children’s surgeries.10
We now report the most extensive analyses to date of ACS NSQIP-Pediatric data regarding 30-day postoperative outcomes, and the first analyses reporting specialty-specific outcomes. Programmatic and analytic improvements include addition of more participating hospitals, refinement of pediatric-specific variable definitions, expansion of predictor variables, more comprehensive adjustment of morbidity outcomes for preexisting conditions, and successful hierarchical modeling of multiple outcomes with incorporation of procedure-associated risk.
Hospital participation was voluntary by subscription, with each site contributing up to 1600 cases per year. Children’s units within adult hospitals, as well as free-standing children’s hospitals, were eligible. Cases were selected on an 8-day cycle by Current Procedural Terminology (CPT) codes specified in the program.8 The number of included cases for some common procedures, such as gastrostomy and appendectomy, was limited to promote procedure diversity. Included cases were performed during 2011 in children aged <18 years. Specialties represented in the program include pediatric general and thoracic, otolaryngologic, orthopedic, urologic, plastic, neurologic, and gynecologic surgery. Trauma, transplantation, and cardiac surgical procedures were excluded because these are captured in alternate specialty-specific databases. Where possible, subspecialties were assigned based on the specialty typically associated with the procedure (eg, tympanoplasty with otolaryngology); otherwise, procedures were assigned to a category spanning specialties (spine, hand). At each hospital, data were collected by chart review by abstractors who received standardized training and supervision to ensure uniformity in data collection.14 When necessary, data abstractors contacted hospital personnel for clarification of clinical details regarding cases. A designated surgeon champion at each institution provided local program oversight. Data were submitted electronically to the ACS for processing and analysis. When necessary, changes to case selection and variable definitions were implemented at 6-month intervals to balance data stability with the goal of continual programmatic improvement. For example, to ensure analysis of a sufficient quantity of complications, several high-volume procedures, such as tonsillectomy and inguinal hernia repair, that prior analyses had identified as having an extremely low risk of mortality and morbidity were excluded starting in July 2011.
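The 8-day sampling cycle with caps on common procedures can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's actual sampling algorithm: the cycle start date, the per-cycle cap, the per-group caps, and the function names (`cycle_index`, `select_cases`) are all hypothetical, and the sketch simply takes the first eligible cases in each cycle. One property does follow from the cycle length itself: because 8 days is 1 day longer than a week, each successive cycle begins one weekday later, so over 7 cycles sampling starts once on every day of the week.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def cycle_index(op_date, start=date(2011, 1, 1), cycle_days=8):
    """Return the 0-based sampling cycle containing op_date (hypothetical start date)."""
    return (op_date - start).days // cycle_days


def select_cases(cases, per_cycle_cap, per_group_cap):
    """Select eligible cases in date order, honoring a per-cycle cap and
    optional per-CPT-group caps (eg, for gastrostomy or appendectomy).

    cases: iterable of (case_id, op_date, cpt_group), sorted by op_date.
    per_group_cap: dict mapping cpt_group -> max cases per cycle; groups
    absent from the dict are limited only by per_cycle_cap.
    """
    selected = []
    taken_per_cycle = Counter()
    taken_per_group = defaultdict(Counter)
    for case_id, op_date, group in cases:
        cycle = cycle_index(op_date)
        if taken_per_cycle[cycle] >= per_cycle_cap:
            continue  # this cycle is already full
        cap = per_group_cap.get(group)
        if cap is not None and taken_per_group[cycle][group] >= cap:
            continue  # common procedure already at its limit for this cycle
        selected.append(case_id)
        taken_per_cycle[cycle] += 1
        taken_per_group[cycle][group] += 1
    return selected


# Each 8-day cycle starts one weekday later than the previous one.
start = date(2011, 1, 1)
start_weekdays = [(start + timedelta(days=8 * k)).weekday() for k in range(7)]
```

Capping a group per cycle, rather than per year, spreads the limited common procedures evenly through the year in this sketch; an annual cap would be an equally plausible alternative.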
The roughly 1500 different CPT codes entered into the program during this time period were collapsed into ∼225 CPT groupings by aggregating closely related CPT codes based on a combination of anatomic and surgical relatedness. The general approach to performing this grouping has been previously described.9
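Collapsing CPT codes into groupings and summarizing case volume by grouping (as in Table 2) amounts to a lookup followed by a count. The sketch below assumes a hypothetical code-to-group mapping; the codes and group names are placeholders, not actual CPT codes or NSQIP-Pediatric groupings.

```python
from collections import Counter


def group_volumes(case_cpt_codes, cpt_to_group):
    """Count cases per CPT grouping; raises KeyError for an unmapped code."""
    return Counter(cpt_to_group[code] for code in case_cpt_codes)


def top_k_share(volumes, k):
    """Fraction of all cases that fall in the k highest-volume groupings."""
    total = sum(volumes.values())
    return sum(n for _, n in volumes.most_common(k)) / total


# Placeholder codes and groupings for illustration only.
cpt_to_group = {"A1": "Group 1", "A2": "Group 1", "B1": "Group 2", "C1": "Group 3"}
volumes = group_volumes(["A1", "A2", "A1", "B1", "C1", "B1"], cpt_to_group)
```

Raising on an unmapped code is deliberate in this sketch: silently dropping new codes would distort volume counts.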
A neonate was defined as a child ≤28 days old at the time of procedure for term births (gestational age ≥37 weeks), or a child <50 full “postconception” weeks at operation for preterm births. Patients who did not meet these criteria were designated as “pediatric,” meaning nonneonatal.
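The neonatal definition can be expressed as a small function. This sketch assumes that gestational age at birth is recorded in weeks (possibly fractional) and chronological age in days, and that postconception age is gestational age at birth plus chronological age; the name `is_neonate` is illustrative.

```python
def is_neonate(age_days, gestational_age_weeks):
    """Return True if the patient meets the neonatal definition at operation.

    Term births (gestational age >= 37 weeks): neonate if <= 28 days old.
    Preterm births: neonate if fewer than 50 full postconception weeks,
    where postconception age = gestational age at birth + chronological age.
    Patients for whom this returns False are classed as "pediatric".
    """
    if gestational_age_weeks >= 37:
        return age_days <= 28
    postconception_weeks = gestational_age_weeks + age_days / 7
    # "Fewer than 50 full weeks" is equivalent to a strict comparison with 50.
    return postconception_weeks < 50
```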
Data from the following categories were collected prospectively: demographic, surgical profile, clinical preoperative, laboratory, intraoperative, and postoperative. Specific variables have been previously published.1 Outcomes of interest were evaluated during the first 30 days after surgery, as is characteristic of the ACS NSQIP, and included mortality and several postoperative morbidities (Table 1). Definitions for infectious complications, such as central line-associated bloodstream infection, pneumonia, urinary tract infection (UTI), and surgical site infection (SSI), were based on Centers for Disease Control and Prevention criteria. Bleeding event, a new outcome, was defined as a blood transfusion volume >25 mL/kg of patient weight within 72 hours after the procedure start time. Morbidity was modeled as a composite outcome (any NSQIP-Pediatric-defined morbidity) and as individual morbidities when the event rate and absolute number of events were adequate.
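The bleeding-event and composite-morbidity definitions translate into two short functions. This sketch assumes that transfusion volumes are summed across all transfusions in the window and that the 72-hour window includes both endpoints; neither detail is specified in the text, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def bleeding_event(transfusions, weight_kg, procedure_start, threshold_ml_per_kg=25.0):
    """Flag a bleeding event: transfused blood volume > 25 mL/kg within
    72 hours after the procedure start time (window endpoints inclusive).

    transfusions: iterable of (time, volume_ml) tuples.
    """
    window_end = procedure_start + timedelta(hours=72)
    volume_ml = sum(v for t, v in transfusions if procedure_start <= t <= window_end)
    return volume_ml / weight_kg > threshold_ml_per_kg


def composite_morbidity(morbidity_flags):
    """Composite outcome: True if any individual morbidity occurred."""
    return any(morbidity_flags.values())
```

For a 10-kg child, 200 mL transfused at 2 hours and 100 mL at 80 hours give 20 mL/kg inside the window, so no bleeding event is flagged.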
Descriptive statistics were calculated for patient characteristics, volume by CPT group, and outcomes. In addition to analysis of the entire cohort, pediatric abdominal surgery, neonatal abdominal surgery, and individual surgical specialty subgroups were evaluated independently.
Because of their low individual occurrence, the following pairs of predictor variables were consolidated in modeling on the basis of shared risk or disease category: cerebrovascular accident and coma, chemotherapy and radiotherapy, and bone marrow and solid organ transplant. Renal failure and dialysis were consolidated only when individual incidence was too low for analysis. Five necessary predictor variables had some degree of missing data: race, gender, American Society of Anesthesiologists (ASA) class, case type (elective, urgent, emergent), and transfer status. Missing values for these variables were imputed from 50 available predictors by using the method of Buck.15 For patient weight, age-specific extreme outliers were excluded (Appendix Table 1), and missing and excluded values were replaced with the median weight for age.
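The two imputation steps can be sketched as follows. Buck's method fills a missing value with its prediction from a linear regression fitted on complete cases. This minimal version assumes a single numeric target and fully observed numeric predictors; the full method fits a separate regression for each pattern of missingness, and categorical targets such as race or ASA class would need additional handling. The weight bounds are hypothetical stand-ins for Appendix Table 1, and all function names are illustrative.

```python
from collections import defaultdict
from statistics import median


def _solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def buck_impute(rows, target):
    """Regression imputation in the spirit of Buck's method.

    rows: list of dicts of numeric values; only `target` may be None.
    Fits ordinary least squares of target on all other keys using complete
    cases, then replaces each missing target with its predicted value.
    Returns new dicts; the input rows are not modified.
    """
    predictors = sorted(k for k in rows[0] if k != target)
    complete = [r for r in rows if r[target] is not None]
    X = [[1.0] + [r[k] for k in predictors] for r in complete]
    y = [r[target] for r in complete]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    filled = []
    for r in rows:
        r = dict(r)
        if r[target] is None:
            r[target] = beta[0] + sum(b * r[k] for b, k in zip(beta[1:], predictors))
        filled.append(r)
    return filled


def impute_weight(records, bounds):
    """Exclude age-specific extreme weights, then replace missing or excluded
    weights with the median valid weight for the same age group.

    records: list of (age_group, weight_kg or None); bounds: age_group -> (low, high).
    """
    def valid(age, w):
        return w is not None and bounds[age][0] <= w <= bounds[age][1]

    by_age = defaultdict(list)
    for age, w in records:
        if valid(age, w):
            by_age[age].append(w)
    medians = {age: median(ws) for age, ws in by_age.items()}
    return [w if valid(age, w) else medians[age] for age, w in records]
```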
Procedure-specific risk was assigned to each CPT group for each outcome by preliminary modeling of CPT group as a categorical predictor and linear transformation of the derived odds. This procedure- and outcome-specific linear predicted probability was used as a continuous predictor variable in subsequent outcome models. Forward stepwise logistic regression was performed by using 50 available variables to develop a predictive model for each outcome.9 The variables selected by logistic regression were then used in a hierarchical model to derive risk-adjusted odds ratios (ORs) with 95% confidence intervals (95% CIs) for each hospital for specific outcomes.
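When CPT group is the only (categorical) predictor in a logistic model, the fitted probability for each group equals that group's observed event rate, so the model's linear predictor for a group is the log-odds of that rate. The sketch below computes this quantity for one outcome. Shrinking toward the overall rate with a pseudo-count is an assumption of this sketch, added so that groups with no events (or only events) receive a finite value; the source does not describe how such groups were handled, and `cpt_group_log_odds` is a hypothetical name.

```python
import math
from collections import defaultdict


def cpt_group_log_odds(cases, pseudo_count=1.0):
    """Procedure-specific risk for one outcome.

    cases: iterable of (cpt_group, outcome) with outcome 0 or 1.
    Returns cpt_group -> log-odds of the (optionally smoothed) group event
    rate, for use as a continuous predictor in downstream outcome models.
    """
    n = defaultdict(int)
    events = defaultdict(int)
    for group, y in cases:
        n[group] += 1
        events[group] += y
    overall = sum(events.values()) / sum(n.values())
    risk = {}
    for group in n:
        # Shrink toward the overall rate; pseudo_count=0 gives the raw group rate.
        p = (events[group] + pseudo_count * overall) / (n[group] + pseudo_count)
        risk[group] = math.log(p / (1 - p))
    return risk
```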
In each model, hospitals were assigned outlier status based on the 95% CI about the OR. An OR of 1 indicated that the hospital’s outcome incidence, adjusted for patient and procedure risk, approximated the average for all hospitals. A hospital was designated as “needs improvement” or high outlier if the 95% CI about the OR was entirely above 1.0, “exemplary” or low outlier if it was entirely below 1.0, and “as expected” if it spanned 1.0.
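The outlier rule can be written directly. This sketch assumes that each hospital's estimate is available as a log OR with a standard error and that the 95% CI is formed symmetrically on the log scale (a Wald-type interval); the source does not specify how the hierarchical model's intervals were computed, so the inputs and the name `classify_hospital` are hypothetical.

```python
import math


def classify_hospital(log_or, se, z=1.96):
    """Assign outlier status from a hospital's risk-adjusted OR and its 95% CI."""
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    if lower > 1.0:
        return "needs improvement"  # high outlier: CI entirely above 1
    if upper < 1.0:
        return "exemplary"  # low outlier: CI entirely below 1
    return "as expected"  # CI spans 1
```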
All analyses were performed by using SAS 9.3 (SAS Institute, Inc, Cary, NC).
The concept of “reliability” was used to estimate the relative degree of “signal” versus “noise” in the comparative outcome assessment of individual hospitals, as previously performed by others.16 Estimating the minimum sample size necessary for a hospital to achieve a specified reliability (for future modeling periods) required 3 components: (1) the variability in outcomes across hospitals, (2) the average risk of the outcome across patients within the entire cohort, and (3) the risk-adjusted event rate for the hospital. To mitigate the sampling error associated with using only a specific set of hospitals, the SD of the risk-adjusted hospital event rates in the original model was calculated and then added to or subtracted from the sample mean, yielding new estimates of the risk-adjusted hospital event rate for a sensitivity analysis. This sensitivity analysis provided an interval of “plausible values” for the minimum required sample size.
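One commonly used formulation of reliability for a binary outcome treats the between-hospital variance σ² as signal and the within-hospital binomial variance p(1 − p)/n as noise, so that reliability R = σ² / (σ² + p(1 − p)/n) and the minimum sample size for a target R is n = [R / (1 − R)] × p(1 − p) / σ². The sketch below assumes this formulation, which may differ in detail from the method of reference 16, and evaluates p at the mean risk-adjusted rate plus or minus 1 SD for the sensitivity analysis. The inputs are illustrative, not NSQIP-Pediatric estimates, and the function names are hypothetical.

```python
import math


def reliability(n, between_var, p):
    """Signal / (signal + noise) for a hospital with n cases and event rate p."""
    noise_var = p * (1 - p) / n
    return between_var / (between_var + noise_var)


def min_sample_size(target_r, between_var, p):
    """Smallest n whose reliability reaches target_r (0 < target_r < 1)."""
    return math.ceil(target_r / (1 - target_r) * p * (1 - p) / between_var)


def plausible_range(target_r, between_var, mean_rate, sd_rate):
    """Sensitivity analysis: recompute n with the hospital rate set to the
    mean risk-adjusted rate minus and plus one SD; returns (low, high)."""
    rates = [max(mean_rate - sd_rate, 1e-6), min(mean_rate + sd_rate, 1 - 1e-6)]
    sizes = [min_sample_size(target_r, between_var, p) for p in rates]
    return min(sizes), max(sizes)


# Illustrative inputs only: between-hospital variance 1e-4 (SD of 1 percentage
# point) and a 5% event rate.
n_for_r07 = min_sample_size(0.7, 1e-4, 0.05)
```

Under this formulation, the required n grows as between-hospital variance shrinks relative to p(1 − p), which is why an outcome with little variation among hospitals demands very large case volumes.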
Human Subjects Protection
The ACS NSQIP-Pediatric was conducted at each site either with institutional review board approval or as an operational quality improvement program exempt from such review. These analyses were performed on preexisting and deidentified data.
In 2011, 46 281 patients were prospectively accrued into the program. Participation grew from 30 hospitals in January to 43 by December, representing 28 states and 1 Canadian province (current list available at http://site.acsnsqip.org/participants/). The median number of entered cases per hospital was 1164 (mean 1076, SD 365; range 293 [partial year] to 1541). For this period, 1467 different CPT codes were collapsed into 226 CPT groupings. The top 25 groupings (Table 2) constituted 71.6% of all cases. Patient characteristics are shown in Table 3. Table 4 lists the occurrence of mortality, composite morbidity, and SSI in the overall cohort, and by age and surgical specialty groups. The median institutional rate of complete 30-day follow-up was 92.4% (mean 92.2%, SD 5.7%).
Results of the hierarchical models are summarized in Table 5. The c-index is provided as an estimate of model discrimination.17 Statistically significant predictor variables and ORs for selected models are listed in the Appendix.
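For a binary outcome, the c-index is the probability that a randomly chosen patient with the event has a higher predicted risk than a randomly chosen patient without it, with ties counted as one half; it equals the area under the receiver operating characteristic curve. The minimal O(n²) sketch below is suitable only for illustration on small data, and `c_index` is an illustrative name.

```python
def c_index(scores, outcomes):
    """Concordance between predicted risks and observed binary outcomes."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    concordant = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                concordant += 1.0  # event patient ranked higher: concordant
            elif sp == sn:
                concordant += 0.5  # tied predictions count as half
    return concordant / (len(pos) * len(neg))
```

A value of 0.5 corresponds to discrimination no better than chance, and 1.0 to perfect separation of patients with and without the event.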
Hierarchical models were successfully constructed for mortality in the entire cohort (Fig 1A) and for the neonatal abdominal surgery group, although no hospitals were outliers. The low mortality rate prevented hierarchical model construction for all other subgroups.
For the entire cohort, the composite morbidity model demonstrated 10 low and 10 high outlier hospitals (Fig 1B). Composite morbidity hierarchical models were successful for all specialties except thoracic and gynecologic surgery, due to insufficient case volumes and event rates (Table 5). Spine surgery was the only subspecialty to demonstrate both low (n = 3) and high (n = 5) outliers (Fig 1C).
Surgical Site Infection
As indicated in Table 5, the hierarchical SSI model for the entire cohort demonstrated 3 low and 7 high outlier hospitals (Fig 1D). High SSI outliers were also demonstrated in the general surgery and pediatric abdominal surgery groups. The hierarchical SSI model failed in the neonatal abdominal surgery group. Where SSI was successfully modeled in the remaining surgical specialties, no outliers were identified.
Individual Morbidity Models
Hierarchical modeling of the following outcomes in the entire cohort demonstrated outlier hospitals: UTI (c-index = 0.90; 1 low, 4 high outliers) and reintubation (c-index = 0.91; 1 high outlier). The following outcomes were also successfully modeled for the entire cohort, but had no outliers: pneumonia (c-index = 0.89), cardiac arrest requiring cardiopulmonary resuscitation (c-index = 0.91), deep vein thrombosis or pulmonary embolus (c-index = 0.96), and renal complication (c-index = 0.95). In the pediatric abdominal surgery cohort, hierarchical models were successfully constructed for UTI (c-index = 0.86), reintubation (c-index = 0.90), and pneumonia (c-index = 0.82), all of which had no outliers. Similarly, no hospitals were designated as outliers in the successful hierarchical model for reintubation (c-index = 0.82) in the neonatal abdominal surgery cohort.
Table 6 shows the estimated sample sizes required to reach a reliability of 0.4 and 0.7 in modeling mortality, composite morbidity, and SSI in the entire cohort, as well as in the pediatric abdominal and neonatal abdominal cohorts. Case volume for hospitals participating in the NSQIP for the entire data period is listed for comparison.
These results represent a milestone in the drive toward quality improvement within children’s surgery. We report successful risk-adjusted hierarchical modeling to reliably evaluate performance of individual hospitals that provide children’s surgical care. Risk-adjustment enables comparison of surgical outcomes at hospitals with differing patient acuity and procedure complexity. For the first time, outcome models were generated by specific specialties reflecting the breadth of children’s surgery.
The assessment of relative hospital performance for risk-adjusted outcomes is critical to improvement of children’s surgical care. NSQIP participation has been shown to result in continuous quality improvement; in the more mature adult NSQIP program, Hall et al18 reported a steady and continuous program-wide decline in surgical mortality and morbidity, which likely resulted in thousands of lives saved and complications prevented. Importantly, we have found that 30-day mortality, a broadly accepted indicator of hospital quality in adult surgery, does not effectively differentiate performance in children’s surgery. Rather, stakeholders seeking to evaluate and improve children’s surgical outcomes will need to focus on composite morbidity and specific complications.
These outcome models provide valuable national benchmarks for event rates in children’s surgery. For individual hospitals, performance below the expected level highlights specific targets for institutional efforts to improve outcomes3,19 and reduce health care costs20 in a cost-effective manner.21 Dissemination of practices from hospitals with exemplary performance can contribute to improvement at other hospitals; identification of these best practice elements is essential for evidence-based guideline development.5,6 Data on performance above or below expectation, even in the absence of statistical outlier status, can lead to successful quality improvement efforts.3 Specialty-specific analysis, available for the first time in these models, provides institutions with a more robust set of actionable data. Because individual complication risks and consequences vary across types of children’s surgery, the relevance of improvement targets differs among surgical disciplines.
Since the initiation of the NSQIP-Pediatric in 2008, noteworthy improvements have occurred: increased cohort size and rapid expansion of hospital participation, targeted sampling of cases with an inherently higher risk of complications, incorporation of additional model predictors, and better control of procedure mix through consideration of the baseline risk attributable to CPT code groupings. The α phase of the NSQIP-Pediatric included 4 hospitals and 7287 patients over 14 months in 2008–2009,9 whereas the β phase included 29 hospitals and 37 157 patients over 1 year in 2010.10 Increased numbers and improved selection of cases enabled consideration of more than twice as many predictors in the current analysis as in 2010,10 as well as generation of a wider complement of hierarchical models for mortality, composite morbidity, and specific complications by age and 9 specialty subgroups. Hierarchical analysis, which accounts for inherent clustering of cases (eg, by hospital), is considered standard for performance evaluation by organizations such as the Centers for Medicare & Medicaid Services.22 We have shown that hierarchical analysis of children’s surgery is feasible.
This work provides unique insights regarding the reliability of children’s surgery assessments. The statistical concept of reliability relates the error of measurement associated with an individual institution’s performance to the observed differences in performance between institutions.16 As reliability approaches 1, the assessment is dominated by observed differences between institutions (signal), and the magnitude of error associated with measuring each institution (noise) is comparatively small. However, no clear consensus exists regarding acceptable reliability thresholds. Some have proposed a scale similar to that used for interobserver agreement, in which 0.4 is termed “fair” or “acceptable” and 0.7 is “good” or “strong”; others have proposed that 0.7 be used as a minimum in health care evaluations. Observed baseline event rates and the reliability calculations in Table 6 have provided valuable information to guide the future direction of the program. For mortality, the large sample size required for high reliability reflects the low overall event rate and lack of variation among hospitals, and it is not attainable even in an expanding program such as the NSQIP-Pediatric. In contrast, composite morbidity will likely differentiate institutions, but the relevance of specific morbidities to children’s surgical specialties and specific procedures requires refinement. The program has not matured to the point of generating outcome data specific to individual procedures but recognizes the demand for this information going forward. The reliability calculations estimate the case volumes and event rates necessary to achieve policy goals for performance evaluation.
The data in this report are subject to several limitations. The NSQIP-Pediatric was initially based on an adult surgery platform. Although variable definitions are modified periodically to improve their appropriateness for children, this process is iterative, and changes must be managed to ensure consistent data collection across institutions. Surgical Clinical Reviewers, who collect data at individual institutions, are rigorously trained. A process for validating data collection consistency within and among sites exists14 and is continually being adapted to meet program needs. Collected data largely depend on medical record documentation, which can vary in level of detail. In particular, achieving complete 30-day postoperative outcome assessment is challenging because telephone or written contact with families is sometimes required, depending on the timing of postoperative assessments. Finally, participating sites have expressed concern that reported outcomes reflect patient care that occurred 6 to 18 months before the report. The adult NSQIP faces similar challenges and demands for data that reflect “real-time” care; recent advancements include online modeling of current institutional data based on risk-adjustment algorithms from the previous reporting period.
Current developments in the ACS NSQIP-Pediatric include expanding the profile of participating hospitals and developing procedure-specific modules that include variables relevant to individual procedures. Inclusion of hospitals with diverse patient and procedure profiles, including lower procedure volumes, will help fulfill the program’s mission to improve children’s surgical care in all hospital types. Variables and outcomes specific to certain procedures, such as appendectomy and tracheostomy, are currently being collected to provide insight into specialty and procedure-specific issues. For example, a current appendectomy pilot focuses on variation in care and resource utilization in addition to outcomes. This will enable analysis of value in addition to quality. Each module requires time to establish goals, develop relevant variables and definitions, and accrue and analyze data. Ultimately, these NSQIP-Pediatric modules will facilitate the generation of best practice guidelines to reduce variability in care quality and resource utilization.
This work demonstrates that risk-adjusted children’s surgical outcomes can be accurately determined and used to evaluate performance among institutions. This was the first period in which hierarchical modeling could be successfully applied to both aggregated outcomes and to 9 surgical specialties, and these results represent a significant advancement in assessing outcomes in children’s surgery at both the hospital and surgical specialty levels. Although valuable in assessing adult surgical quality, mortality discriminates hospital performance poorly in children’s surgery. Analysis of ACS NSQIP-Pediatric outcomes serves as a basis for ongoing efforts to optimize the surgical care of children and demonstrates a new ability for surgeons caring for children to develop quality assessment frameworks for public and private stakeholders to consider for national implementation.
- Accepted June 18, 2013.
- Address correspondence to R. Lawrence Moss, MD, Surgeon-in-Chief, Nationwide Children’s Hospital, E. Thomas Boles Jr, Professor of Surgery, The Ohio State University, College of Medicine, 700 Children’s Dr, Columbus, OH 43205. E-mail:
Dr Saito contributed to the study design, data acquisition, and interpretation, and drafted and critically revised the article; Dr Chen contributed to the data interpretation and drafted and critically revised the article; Dr Hall contributed to the study conception and design, data acquisition, analysis, and interpretation, and drafted and critically revised the article; Dr Kraemer contributed to the data analysis and critically revised the article; Dr Barnhart contributed to the study conception and design, data analysis and interpretation, and critically revised the article; Ms Byrd, Dr Cohen, Mr Huffman, Dr Ko, Ms Latus, Ms Richards, and Ms Sutton contributed to the data acquisition, analysis, and interpretation and revised the article; Dr Fei contributed to the data analysis and interpretation and revised the article; Dr Heiss contributed to the study design, data acquisition and interpretation, and critically revised the article; Dr Meara contributed to the data analysis and interpretation and critically revised the article; Dr Oldham contributed to the study conception and design, data acquisition, analysis, and interpretation, and critically revised the article; Dr Raval contributed to the study conception, data acquisition and interpretation, and critically revised the article; Dr Shah contributed to the data interpretation and critically revised the article; Dr Vinocur contributed to the study conception and design, data analysis, and critically revised the article; Dr Moss contributed to the study conception and design, data acquisition, analysis, and interpretation, and critically revised the article; and all authors approved the final manuscript as submitted.
FINANCIAL DISCLOSURE: Dr Hall is a paid consultant for the American College of Surgeons; the other authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: No external funding.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Young PL, Olsen LA, McGinnis JM.
- Campbell DA Jr, Henderson WG, Englesbe MJ, et al.
- Chow WB, Rosenthal RA, Merkow RP, Ko CY, Esnaola NF; American College of Surgeons National Surgical Quality Improvement Program; American Geriatrics Society.
- Raval MV, Dillon PW, Bruny JL, et al; ACS NSQIP Pediatric Steering Committee.
- Cima RR, Lackore KA, Nehring SA, et al.
- Buck SF.
- Tripepi G, Jager KJ, Dekker FW, Zoccali C.
- Hall BL, Hamilton BH, Richards K, Bilimoria KY, Cohen ME, Ko CY.
- Ash AS, Fienberg SE, Louis TA, Normand SLT, Stukel TA, Utts J. Statistical issues in assessing hospital performance. 2012. Available at: www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/HospitalQualityInits/Downloads/Statistical-Issues-in-Assessing-Hospital-Performance.pdf. Accessed March 7, 2013.
- Copyright © 2013 by the American Academy of Pediatrics