Abstract
OBJECTIVE: Identifying gaps in care and improving outcomes for severely ill children require the development of evidence-based performance measures. We used a systematic process involving multiple stakeholders to identify and develop evidence-based quality indicators for high acuity pediatric conditions relevant to any emergency department (ED) setting where children are seen.
METHODS: A prioritized list of clinical conditions was selected by an advisory panel. A systematic review of the literature was conducted to identify existing indicators, as well as guidelines and evidence that could be used to inform the creation of new indicators. A multiphase, RAND-modified Delphi method consisting of anonymous questionnaires and a face-to-face meeting of an expert panel was used for indicator selection. Measure specifications and evidence grading were created for each indicator, and the feasibility and reliability of measurement were assessed in a tertiary care pediatric ED.
RESULTS: The conditions selected for indicator development were diabetic ketoacidosis, status asthmaticus, anaphylaxis, status epilepticus, severe head injury, and sepsis. The majority of the 62 selected indicators reflect ED processes (84%) with few indicators reflecting structures (11%) or outcomes (5%). Thirty-seven percent (n = 23) of the selected indicators are based on moderate or high quality evidence. Data were available and interrater reliability acceptable for the majority of indicators.
CONCLUSIONS: A systematic process involving multiple stakeholders was used to develop evidence-based quality indicators for high acuity pediatric conditions. Future work will test the reliability and feasibility of data collection on these indicators across the spectrum of ED settings that provide care for children.
- AHRQ — Agency for Healthcare Research and Quality
- CNS — central nervous system
- CT — computed tomography
- DKA — diabetic ketoacidosis
- ED — emergency department
- GRADE — Grading of Recommendations Assessment, Development and Evaluation
- ICC — intraclass correlation
- ICD-10 — International Classification of Diseases, 10th Revision
- IO — intraosseous
- IV — intravenous
- NQF — National Quality Forum
- SRs — systematic reviews
Assessing the quality of health care is an international priority.1–4 Research has revealed that performance measurement improves health care outcomes.5,6 According to the commonly referenced Donabedian framework, quality indicators are explicitly defined and measurable items referring to the structures (staff, equipment, and facilities), processes (prescribing, investigations, interactions between professionals and patients), or outcomes (mortality, morbidity, or patient satisfaction) of care.4,7,8 Quality indicators have been developed for a number of health care settings, including emergency departments (EDs).9–13
However, although children are frequent users of emergency care,14 there is little research on indicators specific to the pediatric population. For example, <5% of children are affected by the 3 conditions most frequently addressed in adult outcomes research (diabetes, heart disease, and arthritis).15 Similarly, quality measures that are part of pediatric emergency practice have not been systematically developed or validated.15 Performance measures specific to pediatrics and pediatric emergency medicine have been identified as a research priority.16–18
Evidence indicates that there is substantial practice variation for pediatric patients among emergency care providers, and that many providers do not optimally manage seriously injured or ill children.19–21 Most of the recent work on practice variation and lack of adherence to practice guidelines in the pediatric ED setting has been done on common, often lower acuity conditions,21–26 despite evidence of a similar gap between knowledge and practice in severely ill and injured children.20 Identifying gaps in care for high acuity conditions, where improvement is likely to have the largest impact on quality of life and longevity,19 requires valid and reliable quality indicators. The objective of this project was to use a systematic process involving multiple stakeholders to review existing indicators and develop new indicators for high acuity pediatric conditions relevant to any ED setting where children are seen.
Methods
We used a systematic, multiphase, RAND-modified Delphi method congruent with the process for quality indicator development outlined by the Agency for Healthcare Research and Quality (AHRQ)27 and used in previous indicator work.4,7,10,11,13,28 Ethics approval for this study was obtained from the Conjoint Health Research Ethics Board of the University of Calgary.
Phase 1: Selection of Target Conditions
We convened a 32-member advisory panel to select target conditions. The panel included representatives from stakeholder organizations, emergency medicine clinicians, administrators, and decision-makers from the United States and Canada. Panel members were identified by contacting stakeholder organizations and the directors of all pediatric EDs and of a sample of rural and general EDs across Canada and asking them to provide names of individuals with expertise in pediatrics, ED care, or quality improvement. We analyzed health administrative data from 2006 to 2008 on the main diagnosis for high acuity pediatric patients, defined as patients aged 0 to 19 years triaged as resuscitation or emergent by using the Canadian Triage and Acuity Scale29 (Supplemental Information 3). We used these data on the most frequent main diagnoses seen in all EDs in Ontario and Alberta, which together represent 50% of Canada's population, to construct an initial list of potential conditions. We provided panelists with frequency data on the initial list and invited them to suggest additional conditions. In an e-mail survey, panelists were asked to rate the final list of conditions on a scale from 1 (strongly disagree) to 9 (strongly agree),27,30 based on National Quality Forum (NQF) measure evaluation criteria,31 for the following: importance (potential for morbidity or mortality associated with the condition), impact (potential to address the gap between current and best practice), and validity (adequacy of scientific evidence linking performance of care to patient outcome). The survey was tested for face validity before dissemination, and e-mail reminders were sent at weeks 2 and 3 to optimize response. It was decided a priori that conditions scored ≥7 across all 3 criteria by ≥70% of respondents would be retained.
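To make the retention rule concrete, the following minimal sketch applies it to hypothetical ratings. This is not the study's actual analysis code, and it assumes the per-respondent reading of the rule (a respondent "supports" a condition only when rating it ≥7 on every criterion):

```python
# Phase 1 retention rule (illustrative): a condition is retained when
# >=70% of respondents rate it >=7 on ALL 3 criteria. All data below
# are hypothetical.

CRITERIA = ("importance", "impact", "validity")

def retain_condition(ratings, threshold=7, quorum=0.70):
    """ratings: one dict per respondent, keyed by criterion (1-9 scale)."""
    supporters = sum(
        all(r[c] >= threshold for c in CRITERIA) for r in ratings
    )
    return supporters / len(ratings) >= quorum

# Hypothetical ratings from 5 respondents for a single condition.
example = [
    {"importance": 9, "impact": 8, "validity": 7},
    {"importance": 8, "impact": 7, "validity": 8},
    {"importance": 9, "impact": 9, "validity": 9},
    {"importance": 6, "impact": 7, "validity": 7},  # below threshold on importance
    {"importance": 7, "impact": 8, "validity": 7},
]
print(retain_condition(example))  # True: 4/5 = 80% of respondents >= 70%
```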
Phase 2: Indicator Identification and Development
We conducted a systematic review of the literature to identify existing indicators and high quality national and international guidelines, systematic reviews (SRs), and randomized controlled trials that could be used to generate new indicators for the selected conditions. The search strategy (Supplemental Table 5) was developed by a medical research librarian in consultation with the research team. We searched the following bibliographic databases: PubMed, Cumulative Index to Nursing and Allied Health Literature, Embase, the Cochrane Collaboration and Evidence-Based Emergency Medicine, Database of Abstracts of Reviews of Effects for SRs, the National Guideline Clearinghouse, the Canadian Medical Association InfoBase, the National Electronic Library for Health, Turning Research into Practice, and Best Bets from 1980 to September 2010. Targeted hand searches of relevant journals and conference proceedings (Supplemental Table 6) were conducted for a 3-year period (2007–2010). Due to resource constraints, only articles in English were included. Guidelines and existing indicators were also identified by searching specialty society Web sites, internationally recognized guidelines such as Pediatric Advanced Life Support and Advanced Trauma Life Support, and Web sites that focus on quality and performance improvement (Supplemental Table 7).
Two research team members (Dr Stang and Ms Crotts) independently screened all titles, abstracts, and guidelines. The reviewers included for full text review any articles or guidelines that either reviewer thought might provide existing indicators for the target conditions or relevant clinical recommendations that could be used to guide indicator development. The same 2 reviewers then independently reviewed all full text articles and selected for final inclusion any articles that reported on quality indicators for the identified conditions. The Appraisal of Guidelines for Research and Evaluation (AGREE) instrument was applied independently by the 2 reviewers to assess the quality of the guidelines.32 We developed new indicators from high quality national or international guidelines, defining high quality as rated recommended or strongly recommended by both reviewers by using the AGREE instrument. Criteria for indicator development included the following: (1) the strength of the recommendation, with only strong recommendations considered33; (2) the consistency of the recommendation between guidelines; and (3) the strength of the evidence linking ED structure or care process to patient outcome.8 New indicators were developed by consensus between the 2 researchers and reviewed by the remaining authors (Drs Guttmann, Straus, and Johnson). The 2 reviewers also independently assessed the quality of the evidence upon which each indicator is based by using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system,34 with 1 = very low quality (expert opinion), 2 = low quality, 3 = moderate quality, and 4 = high quality. We used SRs and randomized controlled trials identified in the literature search to grade the strength of the evidence supporting a link between the performance of care specified by the indicator and patient outcomes.
Phase 3: Indicator Selection
We convened an expert panel of 14 individuals, consisting of general and pediatric emergency medicine clinicians, a nurse manager, a pediatric intensivist, quality improvement and safety researchers, and ED administrators (Supplemental Information 2). The panelists were selected based on recommendations by members of the advisory panel and represented the full spectrum of ED settings where children are seen.
We used a modified Delphi technique consisting of 2 rounds of anonymous questionnaires and a face-to-face meeting of the expert panel to generate a final list of indicators. Before completing the first questionnaire, panelists were e-mailed a description of the goals of the research project, the full list of existing and newly developed indicators classified according to Donabedian’s framework (structure, process, and outcome),8 and the grading of the quality of evidence. The first e-mailed questionnaire, sent in December 2010, was pilot tested for face validity, and e-mail reminders were sent at weeks 3 and 4 to optimize response. Panelists were asked to rate the identified indicators on the criteria of (1) relevance to the care of high acuity pediatric patients seen in any ED setting and (2) the degree to which measurement of the indicator would impact the quality of care provided. Panelists were asked to rate each indicator on both criteria by using a Likert scale from 1 (strongly disagree) to 9 (strongly agree).27,30 We used a predetermined decision rule that any indicators rated ≤3 on both criteria by all panelists would be discarded from further consideration.
The expert panel met in person in January 2011 after completion of the first survey. At the meeting, the panelists reviewed anonymized ratings for each indicator. Panelists were also provided with their individual ratings from the first survey and given the opportunity to suggest additional indicators. At the end of the meeting, panelists were asked to independently re-rate each indicator by using the same criteria of relevance and impact and to prioritize, for each condition, the 5 indicators considered most important to measure to improve quality of care and patient outcomes. Based on previous indicator development work28 and consensus by the expert panel, we used a predetermined decision rule that indicators rated ≥7 (moderately agree) across both criteria by ≥70% of panelists would be included in the final list. The meeting was transcribed.
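The 2 Delphi decision rules can be summarized in a short sketch. The ratings and code below are illustrative only, not the study's instruments or analysis:

```python
# Illustrative Delphi decision rules for indicator selection.
# Round 1: discard only if EVERY panelist rates the indicator <=3 on
# BOTH criteria (relevance, impact).
# Round 2: retain if >=70% of panelists rate it >=7 on both criteria.

def discard_after_round1(ratings):
    return all(r["relevance"] <= 3 and r["impact"] <= 3 for r in ratings)

def retain_after_round2(ratings, threshold=7, quorum=0.70):
    supporters = sum(
        r["relevance"] >= threshold and r["impact"] >= threshold
        for r in ratings
    )
    return supporters / len(ratings) >= quorum

# Hypothetical round 2 ratings from a 14-member panel for one indicator.
ratings = [{"relevance": 8, "impact": 7}] * 11 + [{"relevance": 5, "impact": 6}] * 3
print(discard_after_round1(ratings))  # False
print(retain_after_round2(ratings))   # True: 11/14 (79%) >= 70%
```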
Phase 4: Feasibility of Indicator Measurement
Data from January 2009 to December 2010 on the selected indicators were collected retrospectively from a tertiary care pediatric ED with 65 000 annual visits. The goal of this phase was to determine the feasibility (described by the NQF as the extent to which the data are readily available, retrievable without undue burden, and implementable for performance measurement) and the reliability of data collection.31 Based on the NQF description and previous pediatric indicator development work,35 we defined feasible measures as those that could be generated by using existing data sources, including chart review, physician order entry, and ED patient tracking systems. Performance measurement in a pediatric ED also provided an initial estimate of practice variation and provider compliance with the care processes or structures specified by the indicators. We created a standard profile of measure specifications: the methods by which the target population is identified and the data are actually collected.36 This profile included the International Classification of Diseases, 10th Revision (ICD-10) diagnostic codes and inclusion criteria for each condition, and the specific data elements, such as numerator, denominator, and exclusions, for each indicator (Supplemental Information 3). Data abstractors (3 experienced chart reviewers) used a standardized database (Access 2010) that was piloted for accuracy and clarity on a sample of 10 patient visits for each condition. Interrater reliability was calculated for a random sample (10%) of charts by using intraclass correlation (ICC), Cohen's unweighted κ, or proportion agreement, depending on the data elements abstracted.37–40
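As a concrete illustration of 2 of the reliability statistics named above, the following sketch computes proportion agreement and Cohen's unweighted κ for a pair of hypothetical chart abstractions. The indicator name and data are invented, and the study's actual computations may have used statistical software:

```python
# Two of the Phase 4 reliability statistics, computed from hypothetical
# paired abstractions by 2 chart reviewers. Cohen's unweighted kappa is
# (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is
# the agreement expected by chance from each rater's marginal distribution.
from collections import Counter

def proportion_agreement(rater1, rater2):
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

def cohens_kappa(rater1, rater2):
    n = len(rater1)
    p_o = proportion_agreement(rater1, rater2)
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical yes/no abstractions ("systemic steroid ordered?") for 10 charts.
reviewer1 = ["y", "y", "n", "y", "n", "y", "y", "n", "y", "y"]
reviewer2 = ["y", "y", "n", "y", "y", "y", "y", "n", "y", "n"]
print(proportion_agreement(reviewer1, reviewer2))    # 0.8
print(round(cohens_kappa(reviewer1, reviewer2), 2))  # 0.52
```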
Results
Figure 1 summarizes the process of indicator selection.
FIGURE 1. Process of indicator selection.
Phase 1: Selection of Target Conditions
Ninety-one percent (32/35) of invited advisory panel members agreed to participate and identified 13 potential conditions. The number of high acuity pediatric ED visits for these conditions over a 2-year period ranged from 24 for meningitis to 1458 for burns (Table 1). Eighty-one percent (26/32) of the advisory panel members completed the condition prioritization survey, and 6 conditions had a mean score of ≥7 by ≥70% of respondents (Table 1).
TABLE 1. Volume of Pediatric Patients (Age 0–19 Years) With Identified Conditions Seen in Alberta and Ontario in 2006–2008 and Results of Advisory Panel Condition Selection
Phase 2: Indicator Identification and Development
Table 2 presents the results of the search of the literature and quality improvement Web sites. We identified 46 existing indicators for the 6 targeted conditions. We derived 51 new indicators from recommendations contained in national and international guidelines. The interrater reliability for determining the level of evidence upon which each indicator was based was acceptable (κ > 0.6)39 for all conditions except status epilepticus (κ = 0.15). The overall interrater reliability for the GRADE rating was κ = 0.68.
TABLE 2. Results of the Systematic Review of the Literature and Indicator Selection Process
Phase 3: Indicator Selection
We presented 97 indicators to the expert panel for initial rating, and none of the indicators were discarded after the first survey. The expert panel suggested an additional 17 indicators for discussion at the face-to-face meeting. The expert panel selected 62 quality indicators. In addition to the indicators for each condition, the panel selected 2 general measures relevant for all high acuity pediatric patients (Table 3). The majority of the indicators reflect ED processes (84%, n = 52), with few indicators reflecting structures (11%, n = 7) or outcomes (5%, n = 3).8 Thirty-seven percent (n = 23) of the indicators selected are based on moderate or high quality evidence.
TABLE 3. Final List of Indicators Selected by Expert Panel and Results of Indicator Measurement at a Pediatric ED
Phase 4: Feasibility of Indicator Measurement
Table 4 presents the age of patients and the proportion of patients who met the inclusion criteria for each of the conditions. A total of 1681 unique visits were identified based on age, acuity, and ICD-10 codes. The proportion of patients meeting the inclusion criteria for each condition ranged from 22% for severe head injury to 84% for anaphylaxis. The interrater reliability for determining which patients met the inclusion criteria was acceptable (κ > 0.6)39 for all conditions except severe sepsis (κ = 0.23).
TABLE 4. Demographic Data and Proportion of Patients Who Met Inclusion Criteria for High Acuity Conditions
Results of indicator measurement in a pediatric ED are shown in Table 3. For the indicators reflecting timeliness of care, ED arrival (first time recorded) was used as time zero based on consensus of the expert panel.

For diabetic ketoacidosis (DKA), data were accessible from chart review and a physician order entry system for all of the applicable indicators, with high interrater reliability. Compliance with the processes of care specified by the indicators was good, with minimal practice variation identified. For example, no patients received bicarbonate (n = 62), and the expert panel agreed that this number should be low (ie, <1%). The majority of patients received potassium replacement (91%, n = 65) and were treated with the appropriate insulin dose and route (88%, n = 59).

Required data were available for the status asthmaticus indicators, but interrater reliability was more variable. The majority of patients received a systemic corticosteroid during the visit (99%, n = 180), and β2-agonists and systemic steroids were provided in a median of 19 and 27 minutes, respectively. Reliability was lower for indicators that relied on information written in the chart by a clinician (compared with data from physician order entry or patient tracking systems), such as the "percentage of admitted patients with objective assessment of severity of their condition" (49%, κ = 0.08) and "patients referred to an asthma education program" (32%, κ = 0.48). From the currently available data sources, it is not possible to determine whether performance on these indicators was low because of practice variation and poor provider compliance or because of a lack of documentation.

For anaphylaxis, 68% of patients received epinephrine in the ED, and 94% of patients who received epinephrine in the ED were treated by the appropriate route. Interrater reliability was good for the anaphylaxis indicators.
For the status epilepticus indicators, the median time to second-line anticonvulsant administration was 31 minutes (ICC = 0.89), with 87% (κ = 1.0) of patients receiving a benzodiazepine as initial therapy and 86% (κ = 1.0) of patients having a rapid bedside glucose documented. Interrater reliability was not calculated for "attainment of seizure control within 30 minutes" and "receipt of an antiepileptic within 10 minutes" because of the small number of cases available for reliability comparison.38

Four of the severe head injury indicators were specific to referring (nontrauma) centers (Table 3) and were not applicable to the center where data were collected. Data for 2 of the indicators, "head computed tomography (CT) scan performed and analyzed within 1 hour of request" and "neurosurgeon response time >30 minutes," were not available by using existing data sources. Compliance was high with respect to documentation of central nervous system (CNS), blood pressure, and oxygen saturation monitoring. However, 9.5% (n = 21) of patients were not intubated before leaving the ED, and only 46% (n = 13) of intubated patients had documented end tidal CO2 monitoring. Interrater reliability was high for the head injury indicators, with the exception of "hourly CNS monitoring" (κ = 0.41) and "CT within 1 hour of arrival" (agreement = 0.44).

For severe sepsis/septic shock, the median time from ED arrival was 68 minutes to isotonic fluid bolus, 63 minutes to intravenous (IV)/intraosseous (IO) insertion, and 189 minutes to antibiotic administration. Sample size was also small, and interrater reliability not calculated, for the severe sepsis indicators that measured fluid refractory shock (n = 7), dopamine resistant shock (n = 3), and patients treated with pressors who had not received 60 cc/kg of fluid (n = 8).

The median time to first provider for all resuscitation and emergent patients was 34 minutes. In addition, 1.3% of 12 636 resuscitation and emergent patients discharged from the hospital returned within 48 hours and were admitted. Data on time to provider and return visits were drawn from an administrative database, so interrater reliability could not be assessed.
Discussion
This rigorous process provides 62 evidence- and expert consensus-based quality indicators for high acuity conditions relevant to any ED setting where children are seen. Previous work on indicators for pediatric ED patients has focused on administrative and clinical measures that are not presentation or condition specific, such as length of stay in the ED after admission41,42; common conditions seen in any ED setting28; and the creation of a balanced scorecard to reflect all facets of pediatric emergency care.43 None of the previous work targets high acuity conditions. A recently published analysis of existing pediatric measures relevant to emergency care revealed that most disease-specific measures address a few common pediatric conditions and suggested that future measures should consider illness severity.44
The 4 phases of this study followed the process for indicator development and assessment as outlined by the AHRQ.27 These phases included the following: expert engagement of an advisory panel to identify conditions for indicator development, identification of candidate indicators including literature review and summary of evidence (using GRADE), expert panel review and selection of indicators by using a modified Delphi process, and assessment of feasibility of candidate indicators including empirical analyses.
One of the strengths of this project was the comprehensive search for existing indicators and for high quality guidelines from which to develop new indicators, together with the systematic application of GRADE in assessing the level of evidence upon which each indicator is based. Although the GRADE system was a useful means of summarizing the evidence for the expert panel, we identified a number of challenges with its use, including the need for significant time, resources, and research methodology expertise. Even for raters with clinical and research backgrounds (a physician with master's-level epidemiology training and an experienced research nurse), the interrater reliability for GRADE assignment was variable. Not surprisingly, the κ was lower for conditions with a less developed evidence base, such as status epilepticus, than for asthma or DKA.
Given the large number of indicators considered and the variable quality of evidence available, the opportunity for the expert panel to discuss the indicators in person was an integral part of the indicator selection process. Previous work has also emphasized the importance of a face-to-face meeting of the expert panel.28 Another similarity with previous work on pediatric quality measurement was that the majority of indicators selected by the expert panel reflected ED processes.28,44 The only structural indicators retained by the expert panel assessed the presence of clinical guidelines for each of the conditions, despite a paucity of evidence linking guidelines to patient outcome. These findings illustrate the need for further work developing outcome indicators and establishing links between structure and process indicators and patient outcome.
A final strength of the project was the inclusion of a data collection phase to assess the feasibility and reliability of indicator measurement. Previous work on quality measures for the pediatric population has emphasized the importance of testing measures in the real-world settings where care will be assessed.45 The measurement stage of this study highlighted a number of issues relevant to the interpretation and application of the indicators. For example, a challenge identified in the data collection phase was the difficulty of assigning time zero for complex conditions such as severe sepsis, DKA, and status epilepticus. We decided a priori to use ED arrival as time zero but recognize that this may not be accurate, as a septic child may decompensate while in the ED and may not have met the criteria for severe sepsis at presentation. Similarly, a child may start seizing while in the ED, such that the time from ED arrival to first anticonvulsant treatment may not accurately describe the timeliness of seizure treatment. Difficulty in assigning a time zero may account in part for the relatively long times from ED arrival to isotonic fluid bolus, IV/IO insertion, and antibiotic administration (Table 3) for patients with severe sepsis. These results highlight the need for data collection across multiple centers to establish reasonable benchmarks for these indicators, especially for conditions such as severe sepsis, where identifying the denominator is a challenge, as illustrated by our poor interrater reliability (κ = 0.23) in applying an operational definition based on an international consensus definition.46
Another challenge we encountered was the small sample size for a number of the indicators. The combination of low event rates and small numbers of eligible patients is a recognized issue in performance measurement,47 particularly in pediatrics.48 The conventional minimum sample size is ≥30 eligible patients.48 Our experience collecting 2 years of feasibility data suggests that even tertiary care pediatric centers may not be able to accrue sufficient numbers to adequately measure performance for conditions such as severe head injury or severe sepsis (Table 4).
A number of methods have been suggested to address the issue of small sample size in indicator reporting. One approach is to report only on institutions with adequate numbers of eligible patients (≥30).48 However, applying measures only to institutions with a particular volume of high acuity cases would miss a significant portion of patients who are seen in smaller centers, and it is in these centers where practice variation and the potential for improvement may be greatest.21,24–26,49 Many of the indicators developed here would be useful even for smaller volume centers for local quality improvement initiatives, such as measuring the impact of a new clinical pathway. Another solution, already in use by the US Department of Health and Human Services, is to aggregate data over 3 years.48 In addition to changing the time frame of reporting, other methods that could be used to adapt certain indicators for public reporting and accountability for smaller volume centers include the following: changing reporting conventions to reflect uncertainty when it exists48; using a composite measure of multiple outcomes48; applying statistical methods such as indirect estimation and hierarchical modeling47; and using selected measures at a regional rather than an institutional level.
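As an illustration of the first of these adaptations, reporting uncertainty explicitly, a small-denominator compliance rate could be reported with a confidence interval rather than as a bare percentage. The sketch below uses the Wilson score interval, one common choice for small samples; the numbers are hypothetical, and the specific method is our illustration rather than one prescribed by the cited reports:

```python
# Reporting a small-denominator compliance rate with explicit uncertainty,
# using the Wilson score interval. All numbers below are hypothetical.
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion (z=1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical: 6 of 8 eligible patients received the specified care process.
low, high = wilson_interval(6, 8)
print(f"75% (95% CI {low:.0%}-{high:.0%})")  # wide interval flags the small n
```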
In addition to the challenges identified above, the results of this study are subject to a few limitations. First, despite our efforts to incorporate the best possible evidence, less than half of the final indicators selected by the expert panel are based on moderate or high quality evidence. Unfortunately, this likely reflects gaps in the overall quantity and quality of child health research50 and reinforces the need for further high quality research in pediatrics. Second, information on the volume of patients seen and the availability of data may not be generalizable to other pediatric institutions and is certainly not applicable to smaller, nonpediatric hospitals. However, the measure specifications and reliability data should be applicable to most settings. A final limitation is the dependence on written documentation for some of the indicators. In general, interrater reliability and compliance with the process or structure specified by the indicator were lower for indicators that required clinician documentation than for indicators where data were available from a patient tracking or physician order entry system. This limitation needs to be taken into account in future applications of the indicators, either at the data collection phase, through the use of alternative data sources such as physician billing, electronic health records, and pharmacy data, or at the benchmarking/reporting phase.
Conclusions
This evidence and expert consensus based process provides indicators for high acuity pediatric conditions potentially suitable for a range of applications from local quality improvement initiatives to public reporting. The results of this study contribute significantly to the existing body of quality indicators for the emergency care of pediatric patients. Future work will focus on multicenter benchmarking and data collection to test the validity and feasibility of these indicators across the spectrum of ED settings that provide care for children. This research provides clinicians, researchers, and policy makers with tools to improve the quality of pediatric care for severely ill children seen in any ED setting.
Footnotes
- Accepted July 30, 2013.
- Address correspondence to Antonia S. Stang, MDCM, MBA, MSc, Alberta Children's Hospital, 2888 Shaganappi Trail, Calgary, AB T3B 6A8. E-mail: antonia.stang@albertahealthservices.ca
Dr Stang conceptualized and designed the study, secured funding, designed and piloted surveys and data extraction forms, screened articles, extracted data, interpreted data, and drafted and revised the manuscript; Dr Straus provided methodological advice, reviewed and revised surveys, reviewed and revised tables and figures, and revised the manuscript; Ms Crotts screened articles, extracted data, coordinated the expert panel meeting, created the Access database, reviewed charts, and revised tables and figures; Dr Johnson provided methodological advice, reviewed and revised surveys, reviewed and revised tables and figures, and revised the manuscript; and Dr Guttmann provided methodological advice, reviewed and revised surveys, facilitated the expert panel meeting, interpreted data, reviewed and revised tables and figures, and revised the manuscript.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: All stages of this work were funded by the Canadian Institutes of Health Research (CIHR), grant MOP-102676.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
References
- Copyright © 2013 by the American Academy of Pediatrics