BACKGROUND AND OBJECTIVES: Patients with a primary mental health condition account for nearly 10% of pediatric hospitalizations nationally, but little is known about the quality of care provided for them in hospital settings. Our objective was to develop and test medical record–based measures used to assess quality of pediatric mental health care in the emergency department (ED) and inpatient settings.
METHODS: We drafted an evidence-based set of pediatric mental health care quality measures for the ED and inpatient settings. We used the modified Delphi method to prioritize measures; 2 ED and 6 inpatient measures were operationalized and field-tested in 2 community and 3 children’s hospitals. Eligible patients were 5 to 19 years old and diagnosed with psychosis, suicidality, or substance use from January 2012 to December 2013. We used bivariate and multivariate models to examine measure performance by patient characteristics and by hospital.
RESULTS: Eight hundred and seventeen records were abstracted with primary diagnoses of suicidality (n = 446), psychosis (n = 321), and substance use (n = 50). Performance varied across measures. Among patients with suicidality, male patients (adjusted odds ratio: 0.27, P < .001) and African American patients (adjusted odds ratio: 0.31, P = .02) were less likely to have documentation of caregiver counseling on lethal means restriction. Among admitted suicidal patients, 27% had documentation of communication with an outside provider, with variation across hospitals (0%–38%; P < .001). There was low overall performance on screening for comorbid substance abuse in ED patients with psychosis (mean: 30.3).
CONCLUSIONS: These new pediatric mental health care quality measures were used to identify sex and race disparities and substantial hospital variation. These measures may be useful for assessing and improving hospital-based pediatric mental health care quality.
- CI: confidence interval
- COE4CCN: Center of Excellence on Quality of Care Measures for Children with Complex Needs
- ECG: electrocardiogram
- ED: emergency department
- ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification
- ODD: observed difficulty of delivery
- OR: odds ratio
- PABAK: prevalence and rater bias–adjusted κ statistics
- PMCA: Pediatric Medical Complexity Algorithm
- QI: quality improvement
What’s Known on This Subject:
Pediatric mental illness is a substantial public health issue with >4 million United States youth meeting mental health diagnostic criteria. High priority conditions are suicidality, psychosis, and substance use. There is a dearth of measures used to assess pediatric mental health care quality.
What This Study Adds:
New measures of pediatric mental health care quality are feasible to implement and demonstrate substantial variation across hospitals, with some measures varying by race and sex. These measures may be useful for assessing and improving hospital-based pediatric mental health care quality.
Pediatric mental illness is a substantial public health issue in both community and hospital settings. Approximately 20% of youth in the United States (>4 million) meet diagnostic criteria for a mental health disorder,1–3 and nearly 10% of hospitalizations in patients 3 to 17 years old were for primary mental health diagnoses in 2012.4 Inpatient and outpatient costs of treating these patients are estimated at $247 billion annually,2,3 and evidence points to an increasing prevalence of mental health diagnoses and increasing service use in this population.5–8 Hospitalizations among pediatric patients with comorbid mental health diagnoses increased in children’s hospitals by 160% from 2005 to 2014,6 with comorbid mental health diagnoses leading to increased length of stay and cost.8 In recognition of this burden, recent federal health policies have identified pediatric mental health care as a key target area for quality measurement and improvement.9,10
In March 2011, the Centers for Medicare and Medicaid Services and the Agency for Healthcare Research and Quality partnered to fund 7 Centers of Excellence that constitute the Pediatric Quality Measures Program mandated by the 2009 Child Health Insurance Program Reauthorization Act.11,12 The charge to the Pediatric Quality Measures Program was to develop new quality of care measures and/or enhance existing measures for children’s health care across the age spectrum.11,12 The Center of Excellence on Quality of Care Measures for Children with Complex Needs (COE4CCN) was charged with developing measures to assess the quality of pediatric mental health care in both inpatient and emergency department (ED) settings. One pediatric quality measure of hospital-based mental health care had national endorsement at the time (outpatient follow-up after mental health hospitalization).13 Only 8 other measures for pediatric mental health care were nationally endorsed, and all were focused on outpatient care, with 5 of those focused on attention-deficit/hyperactivity disorder or developmental screening.13
Our objectives for this study were to develop a new evidence-based set of pediatric mental health care quality measures for use with medical records data and to field test the new measures in 5 nonpsychiatric hospitals providing mental health care to pediatric patients.
The National Quality Forum is a multistakeholder body tasked by the Centers for Medicare and Medicaid Services to review and endorse quality measures for potential Medicare and Medicaid use. Their criteria for endorsement, including whether the measure is high priority (important population or condition), evidence-based, valid, and has a demonstrated performance gap (provider low performance, variation, or disparities in performance across populations),14 were used to guide the study approach.
To develop and test these measures, we did the following: (1) determined common mental health diagnoses in pediatric visits to the ED and inpatient settings,4 (2) performed targeted evidence and clinical guideline reviews for the treatment and follow-up of the most prevalent conditions, (3) drafted pediatric mental health care quality measures on the basis of the evidence reviews, (4) convened a multistakeholder Delphi panel to prioritize the draft measures for further development, and (5) operationalized and field tested the Delphi panel–endorsed measures (Table 1) at 5 hospitals.
Condition-Specific Quality Measure Development
On the basis of our previous findings that depression, psychosis, and substance use are the most common pediatric inpatient mental health diagnoses,4 we focused on suicidality, psychosis, and substance use for quality measure development. Suicidality may be present in depression or psychosis and has associated guidelines of care.15,24 Anxiety, a common outpatient diagnosis, was less common in the inpatient setting4 and was not chosen as a target condition for this quality measure development effort. We reviewed existing clinical practice guidelines and conducted targeted evidence reviews to identify best practices for the treatment, evaluation, and follow-up of pediatric suicidality, psychosis, and substance use. We used these reviews to guide condition-specific quality measure development. The validity and feasibility of the draft quality measures were then evaluated by a multistakeholder panel (psychology, psychiatry, family member, adolescent medicine, state Medicaid, hospitalist, ED) using the RAND–University of California, Los Angeles modified Delphi method25 (see Supplemental Information for descriptions of the literature reviews and the Delphi method). Measures rated favorably during that process were included for field testing (Table 1).
Measure Operationalization and Field Testing
Detailed measure specifications were used to develop an electronic medical record abstraction tool with automated scoring to ensure efficient, reliable, and feasible data collection (see Supplemental Information and online26).
Three children’s hospitals with inpatient psychiatric units participated in the field testing for ED and inpatient measures. They were tertiary care hospitals and not psychiatric specialty hospitals. Two community hospitals participated only in ED measures field testing because they did not have pediatric psychiatric inpatient units. The children’s hospitals were in different geographic regions of the country and had ∼13 000, ∼15 000, and ∼33 000 admissions in 2016, respectively; the 2 community hospitals were located in the same state but were operationally independent and had 8- and 12-bed pediatric units. All study procedures were approved by the participating institutions’ institutional review boards.
Eligible patients were 5 to 19 years old, and eligible adolescents for the Substance Use measure were 12 to 19 years old. Cases for the field test were selected by using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) and Diagnostic and Statistical Manual, Fourth Edition, Text Revision codes for suicidality, psychosis, and substance use from each hospital’s administrative database of discharges between January 1, 2012, and December 31, 2013 (see Supplemental Tables 6 through 8 for ICD-9-CM codes). Transient psychosis (eg, confusional or delirious states) diagnoses were not included, unless drugs or alcohol were associated. Inpatients who were not discharged from the hospital were excluded from inpatient measures because the study team had no access to information about their subsequent care. All eligible patients from this time period were included in the final sample, with a goal of at least 200 patients per hospital over the 2-year time period.
After a 2-day training, 2 research staff nurses from each of the participating hospitals implemented the data abstraction tool. Each nurse abstracted half of their hospital’s medical record sample, with each chart abstraction taking ∼15 minutes. To assess interrater reliability, a randomly selected subsample of each nurse abstractor’s medical records was reabstracted by the other nurse at 2 of the children’s hospitals (other hospitals were not included because of limited funding). Prevalence and rater bias–adjusted κ statistics (PABAK) were calculated to examine reliability in assessing patient eligibility for each measure and reliability in measure scoring.27
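As an illustration of the reliability statistic, the following minimal Python sketch computes PABAK from two abstractors' ratings of the same records. It assumes the commonly used definition, PABAK = (k × p_o − 1)/(k − 1), where p_o is the observed proportion of agreement and k is the number of rating categories (2p_o − 1 for binary ratings); the function name and the example ratings are hypothetical and are not taken from the study data or tools.

```python
def pabak(ratings_a, ratings_b, n_categories=2):
    """Prevalence and rater bias-adjusted kappa for two raters.

    Assumes the usual definition (k * p_o - 1) / (k - 1), which reduces
    to 2 * p_o - 1 when ratings are binary (k = 2).
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same nonempty set of records")
    # Observed proportion of records on which the two raters agree.
    agreements = sum(a == b for a, b in zip(ratings_a, ratings_b))
    p_o = agreements / len(ratings_a)
    return (n_categories * p_o - 1) / (n_categories - 1)


# Hypothetical example: 1 = measure passed, 0 = measure failed.
nurse_1 = [1, 1, 0, 1]
nurse_2 = [1, 1, 0, 0]
print(pabak(nurse_1, nurse_2))  # agreement on 3 of 4 records
```

Because PABAK depends only on observed agreement, it is not pulled downward when nearly all records fall into a single category, which is one reason it may be preferred to the unadjusted κ for measures with very high or very low pass rates.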
For each condition, individual measure eligibility and scoring were reviewed to determine if the measure should be retained or dropped from the final quality measure set. Reasons for dropping measures fell into 1 of 4 broad categories: (1) eligibility for the measure was too rare for it to be a useful quality measure; (2) scores on the measure were extremely low (<25 on a 0–100 scale) across all hospitals indicating that, although the recommended process of care potentially represents high quality care, it had such low uptake that it should not yet be considered a standard of care; (3) interrater reliability was low, indicating a risk that performance variation might not be reliably and fairly measured; and (4) scores on the measure were extremely high (>90 on a 0–100 scale) across the hospitals, indicating there was little room for improvement. One exception to the last criterion was the decision to retain measures we hypothesized may demonstrate greater performance variability among a more representative sample of hospitals.
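The 4 criteria above can be sketched as a simple screening function. This is an illustrative sketch only: the score cutoffs (<25 and >90) come from the text, whereas the minimum number of eligible patients and the minimum PABAK are hypothetical placeholders, because no numeric thresholds for those 2 criteria are stated here. The judgment-based exception for high-scoring measures is not modeled.

```python
def reasons_to_drop(n_eligible, hospital_scores, pabak_value,
                    min_eligible=30, min_pabak=0.6):
    """Return the drop criteria that a candidate measure triggers.

    min_eligible and min_pabak are hypothetical placeholders; only the
    <25 and >90 score cutoffs are stated in the text.
    """
    reasons = []
    if n_eligible < min_eligible:
        reasons.append("eligibility too rare")
    if hospital_scores and all(score < 25 for score in hospital_scores):
        reasons.append("uniformly low performance")
    if pabak_value < min_pabak:
        reasons.append("low interrater reliability")
    if hospital_scores and all(score > 90 for score in hospital_scores):
        reasons.append("uniformly high performance")
    return reasons


# Hypothetical measure that scored 0 at every hospital.
print(reasons_to_drop(200, [0, 0, 0, 0, 0], 0.9))
```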
For each measure, we assessed associations between performance and the following patient characteristics, drawn from administrative data from participating hospitals: sex, age, race and/or ethnicity, insurance type, and chronic disease status. To determine chronic disease status, we used the Pediatric Medical Complexity Algorithm (PMCA), which is used to categorize patients into 3 categories by using ICD-9-CM codes: patients with no chronic conditions, noncomplex chronic conditions, or complex chronic conditions.28 Data to assess PMCA were not available from the community hospitals’ administrative records.
The detailed measure specifications26 were used to calculate quality measure scores. For individual-level binary measures (eg, mental health assessment in the ED), scores were 0 if absent (poor quality) and 100 if present (good quality). Subcomponents included in multicomponent measures (eg, height, blood pressure, blood glucose, etc, in the multicomponent quality measure, focused on metabolic assessment before initiating antipsychotics) were also scored by using this binary approach and then summarized to produce a mean composite score for the measure on a 0 to 100 scale. Hospital-level scores, summarizing both binary and multicomponent measures, ranged from 0 to 100, with higher scores indicating better quality.
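The scoring approach can be sketched in a few lines of Python. The subcomponent names and values below are hypothetical, and averaging patient-level scores to obtain a hospital-level score is an assumption made for illustration; the detailed measure specifications remain the authoritative source.

```python
from statistics import mean


def binary_score(documented):
    """Score a binary measure or subcomponent: 100 if present, 0 if absent."""
    return 100 if documented else 0


def composite_score(subcomponents):
    """Mean of the binary subcomponent scores, on a 0 to 100 scale."""
    return mean(binary_score(done) for done in subcomponents.values())


def hospital_score(patient_scores):
    """Assumed summary: the mean of patient-level scores at one hospital."""
    return mean(patient_scores)


# Hypothetical patient with 2 of 4 metabolic subcomponents documented.
patient = {"height": True, "blood_pressure": True,
           "blood_glucose": False, "ecg": False}
print(composite_score(patient))
print(hospital_score([composite_score(patient), 100, 75]))
```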
Because some subcomponents in the multicomponent quality measures may be more challenging to pass than others (for instance, the subcomponent baseline electrocardiogram [ECG] was infrequently passed compared with the subcomponent of baseline weight documentation), we adjusted the overall measure score for each patient to account for the level of difficulty associated with passing each subcomponent. This “observed difficulty of delivery” (ODD) adjustment is performed by subtracting the grand mean population pass rate for each subcomponent from each patient’s score for that subcomponent.29,30 This allows a hospital’s performance to reflect success in harder-to-achieve subcomponents of the measure. The overall measure score for each patient is then calculated by averaging the ODD-adjusted subcomponent scores.
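A minimal sketch of this adjustment, under stated assumptions, follows. Each patient is represented as a dictionary of 0 or 100 scores for the subcomponents for which that patient was eligible; the data, the names, and the handling of eligibility are hypothetical and may differ from the published specifications.

```python
from statistics import mean


def odd_adjusted_scores(patients):
    """ODD-adjusted measure score for each patient.

    patients: list of dicts mapping subcomponent name -> 0 or 100,
    containing only the subcomponents the patient was eligible for
    (an assumption of this sketch).
    """
    components = {name for patient in patients for name in patient}
    # Grand mean pass rate for each subcomponent across eligible patients.
    grand_mean = {
        name: mean(p[name] for p in patients if name in p)
        for name in components
    }
    # Subtract the grand mean from each subcomponent score, then average.
    return [
        mean(score - grand_mean[name] for name, score in patient.items())
        for patient in patients
    ]


# Hypothetical data: weight is always documented, the ECG only half the time.
patients = [
    {"weight": 100, "ecg": 100},
    {"weight": 100, "ecg": 0},
    {"weight": 100},
]
print(odd_adjusted_scores(patients))
```

In this hypothetical example, the first and third patients both have unadjusted scores of 100, but only the first receives a positive adjusted score, because that patient passed the harder ECG subcomponent. If every patient were eligible for every subcomponent, the adjustment would shift all scores by the same constant and would not change their ranking.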
We assessed differences in measure performance by patient characteristics and by hospital using bivariate and multivariate regression analyses, using logistic regressions for dichotomous measures and linear regressions for continuous measures. To assess the statistical significance of hospital-level variation against the null hypothesis that all hospitals have the same mean measure scores, we used Fisher’s exact test for dichotomous measures (to avoid the questionable validity of the χ2 test if there are many expected cell counts <5) and analysis of variance for continuous measures. In analyses of hospital variation, we did not adjust for covariates because the measures are process measures rather than outcome measures (which are often risk adjusted). To assess differences by patient characteristics in multivariate analyses, we included predictor variables with a priori face validity (sex, age) and any additional variables with a statistically significant bivariate association with a given measure. Patients with missing data for 1 of the variables (Table 1) were excluded from bivariate analyses of that variable and from multivariable analyses. In the multivariate analyses, to assess associations with patient characteristics, we included a fixed-effect variable for hospitals to account for hospital-level systematic differences in care.
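For continuous measures, the unadjusted test of hospital-level variation is a 1-way analysis of variance. The sketch below computes only the F statistic from patient-level scores grouped by hospital, using the Python standard library; the scores are hypothetical, and obtaining a P value would require the F distribution from a statistics package, which is not shown. It does not cover Fisher’s exact test or the regression models.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic for the null hypothesis that all group means are equal.

    groups: list of lists, one list of patient-level scores per hospital.
    """
    k = len(groups)
    n = sum(len(group) for group in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = fmean(x for group in groups for x in group)
    group_means = [fmean(group) for group in groups]
    # Between-hospital and within-hospital sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical composite scores at 3 hospitals.
scores_by_hospital = [[50, 75, 100], [25, 50, 75], [75, 100, 100]]
print(one_way_anova_f(scores_by_hospital))
```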
Developing Condition-Specific Quality Measures
We drafted 21 measures on the basis of the literature and expert consensus guideline reviews: 8 for suicidality, 6 for psychosis, and 7 for substance use. In Delphi panel discussions, major themes and challenges that emerged included sparse evidence to inform the measures and potential difficulty in operationalizing measures (eg, defining documentation elements for counseling on lethal means restriction). Delphi scores indicated that 16 had sufficient face validity and feasibility to move forward to field testing. Of these 16, we report on 8 measures that underwent further testing (Table 1; see Supplemental Table 9 for measure-specific rationales for dropping measures from this measure set).
A total of 817 visits were analyzed across the 5 hospitals (n = 446 [55%] for suicidality, n = 321 [39%] for psychosis, n = 50 [6%] for substance use; n = 298 were seen in the ED only, n = 320 were seen in the ED and admitted to the hospital, and n = 199 were seen in the inpatient setting only, having been directly admitted). Most patients were teenagers (n = 745 [91%] were 12 to 19 years old). There was some racial diversity (19% of patients were African American) but more limited ethnic diversity (4% were Hispanic). Insurance types were evenly distributed across private and public insurance. The majority of patients (86%) were seen at the children’s hospitals (Table 2).
Overall performance on the 8 measures is summarized in Table 3. Performance ranged from a low of 27% for discharge communication with the outpatient provider before discharge for inpatients with suicidality to a high of 95% for mental health assessment for patients with suicidality before discharge from the ED. Performance was relatively low for the multicomponent “baseline metabolic testing before starting antipsychotic medications” for patients with psychosis (mean: 69.6), which was driven by 4 of the 8 elements: obtaining blood glucose (61.7%), cholesterol (48.7%), triglycerides (48.7%), and an ECG (20.0%).
For the 8 measures retained in the set, interrater reliability scores ranged from almost perfect to substantial27 on 2 levels: the patient’s eligibility for the measure (PABAK = 0.99) and the child’s score for that measure (PABAK = 0.76).
In bivariate analyses, performance did not vary substantially across patient characteristics, with a few notable exceptions. In the inpatient setting, caregivers of male patients and African American patients with suicidality were less likely to have received counseling before discharge about lethal means restriction (71.0% for male patients versus 89.0% for female patients [P = .001]; 68.8% for African American patients versus 85.7% for non-Hispanic white patients [P = .03]). Male patients admitted with substance use were less likely to have been assessed for comorbid mental health diagnoses (mean: 77.8 vs 94.9 for female patients [P = .02]). Among patients with psychosis in the ED, those of other race were more likely to be screened for comorbid substance use (mean: 44.4 vs 27.5 for non-Hispanic white patients [P = .02]). Neither medical complexity nor insurance status was associated with performance on any measure (Table 3).
There was statistically significant variation across hospitals for the following: (1) whether patients hospitalized with psychosis were screened for comorbid substance use, (2) whether patients hospitalized with psychosis received timely psychiatric evaluation, and (3) whether patients hospitalized with substance use were screened for comorbid mental health conditions.
In multivariate analyses used to assess disparities by patient characteristics, differential performance persisted for male patients compared with female patients (odds ratio [OR]: 0.27 [95% confidence interval (CI): 0.12 to 0.58, P < .001]) and African American patients compared with white patients (OR: 0.31 [95% CI: 0.12 to 0.83, P = .02]) on counseling parents of those with suicidality on lethal means restriction, and differential performance also persisted for male patients on screening those admitted for substance use for other mental health conditions (coefficient: −20.0 [95% CI: −34.2 to −5.8, P = .007]). In addition, patients aged 16 to 19 years with suicidality were more likely to have documentation of communication between the inpatient and outpatient provider, compared with 12- to 15-year-olds (OR: 2.21 [95% CI: 1.05 to 4.65, P = .04]) (Table 4).
We present the results from developing and testing a new set of medical record–based pediatric mental health care quality measures in the hospital setting, focusing on suicidality, psychosis, and substance use. We developed evidence-based measures with face validity, clear performance gaps, demonstrable variation across providers, and disparities between populations. We discuss key findings in these areas below and their implications, limitations, and next steps.
Performance Gap Assessment
Most of the measures had performance lower than 90% (or 90 for nonbinary measures), implying that there is room for improvement in these evidence-based care processes. Two measures had particularly low performance: discharge communication with outpatient providers for patients with suicidality (27.1%) and substance use screening for patients with psychosis in the ED (mean: 30.3). This low performance overall, regardless of whether there is site-to-site variation, indicates the potential for substantial increases with quality improvement (QI) efforts.
There was relatively low performance (mean: 69.6) on performing baseline metabolic testing before starting a new antipsychotic medication for patients admitted for psychosis. In this multicomponent measure, there were 4 elements with particularly low performance rates: obtaining glucose, cholesterol, triglycerides, and an ECG. Youth treated with atypical antipsychotics are known to have increased risk of metabolic syndrome, arrhythmias, and severe weight gain.31,32 Higher performance on these measure subelements will potentially improve our ability to track and address downstream effects of these medications on cardiovascular health.
Mental health assessment in the ED for patients with suicidality was high (95.7%), without substantial site-to-site variation, but it is possible that the participating children’s hospitals had better access to pediatric psychiatric consultative services than community hospitals serving youth. With >70% of children’s hospitalizations occurring in community hospitals nationwide,33 it will be important to assess performance at other community-based EDs, where mental health resources may be more limited.
Finally, 3 measures were not retained because of low performance, despite support from the Delphi panel and in the literature: (1) formal screening for alcohol abuse or dependence for patients presenting with interpersonal violence in the ED (performance of 0 for all hospitals), (2) the same screening for patients presenting with suicidality in the ED (performance of 0 for all hospitals), and (3) face-to-face counseling and referral for patients who screen positive for substance abuse in the ED (performance of 12%–25% of patients across hospitals). These represent clear gaps in quality and could be areas for enhanced psychiatric consultation and improved collaborative care models.
Performance Variation Across Hospitals
For 3 measures (Substance Use Screening for Patients With Psychosis in the ED, Timely Mental Health Consultation for Inpatients With Psychosis, and Assessing for Comorbid Mental Health Diagnoses for Inpatients With Substance Use), performance varied across sites. This implies practice variations, with a potential for identifiable best practices from high performers.34,35 Future researchers can assess whether QI collaborations can facilitate improved performance and decrease variation across sites.
Disparities by Patient Characteristics
For most measures, performance did not vary in multivariate models by patient characteristics, indicating a lack of strong evidence of disparities in care across specific patient populations. However, for male patients and for African American patients with suicidality, there were lower odds of counseling caregivers on lethal means restriction compared with female patients and white patients. In contrast to our findings, Kruesi et al16 did not find differences in this counseling by sex or race, although they only compared white with non-white patients and had a smaller sample size of 100, limiting their power to detect such differences.
Data from other sites, gathered in further testing of this measure, could be used to better characterize this potential disparity and suggest QI efforts to address it. Performance on this measure is particularly important in light of recent work by Runyan et al,36 who found that an ED-based counseling intervention led to substantial improvements in families securing guns (increasing from 67% to 100%) and medications (increasing from 10% to 76%) and work by Scott et al,37 who showed that 43.5% of families with children who had a history of self-harm risk factors had household firearms, with 11.6% of those stored unlocked and loaded.
In our study, we also found that male patients admitted for substance use were less likely to be screened for comorbid mental health diagnoses, compared with female patients. This practice variation may reflect a common perception that female patients have a higher incidence of mental health diagnoses than male patients. However, this screening is a universal recommendation from the American Academy of Child and Adolescent Psychiatry.21 Given the high co-occurrence of substance use and comorbid mental health diagnoses,22,23 our findings suggest that measuring and reporting on this quality measure could meaningfully address this disparity.
Our findings should be interpreted in light of several limitations. In this first field testing of these measures, we assessed performance in a limited number of hospitals. Having established feasibility of implementation, subsequent testing in more children’s and community hospitals will better characterize generalizability and variations across a larger set of hospitals. Also, the number of patients admitted for substance use was low. This likely reflects that children’s hospitals often do not have inpatient substance use services and that patients with substance use are generally admitted to specialized rehabilitation centers. We did not test for predictive validity, which should be done in future studies of these measures to assess whether better performance on them predicts decreased subsequent hospital-based use and costs of care. Finally, although they represent a national multistakeholder consensus and have undergone external review,38 the National Quality Forum endorsement criteria have not been tested to assess whether they lead to measures that support improved health outcomes in the population.
We present the results of the development and testing of a new set of medical record–based measures to assess pediatric mental health care quality in the hospital setting. We focused on high-priority populations (those with suicidality, psychosis, and substance use) and identified measures with a demonstrated performance gap and variations in performance across hospitals or disparities across patient populations. Our findings suggest that these measures may be useful for assessing and improving hospital-based pediatric mental health care quality for a vulnerable and high-priority population.
- Accepted March 28, 2018.
- Address correspondence to Naomi S. Bardach, MD, MAS, Department of Pediatrics, University of California San Francisco, Philip R. Lee Institute for Health Policy Studies, 3333 California St, Suite 265, San Francisco, CA 94118. E-mail:
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: Funded by a cooperative agreement with the Agency for Healthcare Research and Quality and Centers for Medicare and Medicaid Services, grant U18HS020506, part of the Children’s Health Insurance Program Reauthorization Act Pediatric Quality Measures Program. Dr Bardach was funded by the National Institute of Child Health and Human Development (K23HD065836). Funded by the National Institutes of Health (NIH).
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Shaffer D, Fisher P, Dulcan MK, et al
- Costly mental disorders affect millions of US children and teens: news from the centers for disease control and prevention. JAMA. 2013;310(1):23
- Bardach NS, Coker TR, Zima BT, et al
- Health Care Cost Institute
- Zima BT, Rodean J, Hall M, Bardach NS, Coker TR, Berry JG
- Pfuntner A, Wier LM, Stocks C
- Doupnik SK, Lawlor J, Zima BT, et al
- Pallone F
- Strokoff SL
- Zima BT, Murphy JM, Scholle SH, et al
- American Academy of Child and Adolescent Psychiatry
- McCormick KA, Moore SR, Siegel RA
- Seattle Children’s Research Institute
- Sim J, Wright CC
- Simon TD, Cawthon ML, Stanford S, et al; Center of Excellence on Quality of Care Measures for Children With Complex Needs (COE4CCN) Medical Complexity Working Group
- Scott J, Azrael D, Miller M
- Gold M, Conwell L, Stewart K, Nysenbaum J, Peterson S
- Copyright © 2018 by the American Academy of Pediatrics