Performance of the Global Assessment of Pediatric Patient Safety (GAPPS) Tool
BACKGROUND AND OBJECTIVE: Efforts to advance patient safety have been hampered by the lack of high quality measures of adverse events (AEs). This study's objective was to develop and test the Global Assessment of Pediatric Patient Safety (GAPPS) trigger tool, which measures hospital-wide rates of AEs and preventable AEs.
METHODS: Through a literature review and expert panel process, we developed a draft trigger tool. Tool performance was tested in 16 academic and community hospitals across the United States. At each site, a primary reviewer (nurse) reviewed ∼240 randomly selected medical records; 10% of records underwent an additional primary review. Suspected AEs were subsequently evaluated by 2 secondary reviewers (physicians). Ten percent of records were also reviewed by external expert reviewers. Each trigger's incidence and positivity rates were assessed to refine GAPPS.
RESULTS: In total, 3814 medical records were reviewed. Primary reviewers agreed 92% of the time on presence or absence of a suspected AE (κ = 0.69). Secondary reviewers verifying AE presence or absence agreed 92% of the time (κ = 0.81). Using expert reviews as a standard for comparison, hospital-based primary reviewers had a sensitivity and specificity of 40% and 91%, respectively. As primary reviewers gained experience, their agreement with expert reviewers improved significantly. After removing low-yield triggers, 27 and 30 (of 54) triggers met inclusion criteria to form final manual and automated trigger lists, respectively.
CONCLUSIONS: GAPPS reliably identifies AEs and can be used to guide and monitor quality improvement efforts. Ongoing refinement may facilitate future interhospital comparisons.
- AE — adverse event
- EHR — electronic health record
- GAPPS — Global Assessment of Pediatric Patient Safety
- IHI GTT — Institute for Healthcare Improvement’s Global Trigger Tool
- PACHMT — Pediatric All Cause Harm Measurement Tool
What’s Known on This Subject:
Adverse events are a leading cause of death and injury in the United States, but robust, systematic measures of hospital safety are lacking, particularly in pediatrics.
What This Study Adds:
We developed and tested the Global Assessment of Pediatric Patient Safety trigger tool, which systematically identifies adverse events among pediatric inpatients. In a 16-center study, the trigger tool performed reliably; its widespread implementation could substantially improve patient safety surveillance.
Adverse events (AEs), events in which medical care causes harm, have been recognized as a leading cause of death and injury in the United States since at least the late 1990s.1,2 Until recently, hospitals have identified AEs and preventable AEs largely by relying on passive voluntary reporting systems, which detect only a small percentage of all AEs.3 Active surveillance tools that reliably measure and track AEs have been lacking. Over the past decade, work has been conducted to develop “trigger tools,” instruments that look for discrete signals potentially suggestive of an AE in medical records (eg, the administration of naloxone, which may be used to reverse an inadvertent overdose of a narcotic), as a means of expeditiously and reliably identifying AEs (eg, hypotension due to narcotic overdose).4 Triggers are not themselves AEs, but help to identify them; when a trigger occurs, review of the medical record is required to confirm whether an AE did or did not in fact occur. The Institute for Healthcare Improvement’s Global Trigger Tool (IHI GTT) has become the standard in the field for identifying and measuring rates of AEs in the care of hospitalized adults.5 Trigger tools have proven far more sensitive and reliable than voluntary reporting systems or administrative screening tools, identifying AEs at 10 to 100 times the rate of these other methods.3,6
Trigger tools designed for adult hospitals may not perform optimally in pediatrics because their AE types and rates likely differ. Pediatric trigger tools have been developed, but have not been as widely used or as rigorously tested as adult trigger tools. The most recent has been the Pediatric All-Cause Harm Measurement Tool (PACHMT), which found that children, like adults, experience high rates of AEs due to medical care.7
Building on previous efforts, including PACHMT, our objective was to further develop and rigorously test the performance of a refined pediatric global trigger tool: Global Assessment of Pediatric Patient Safety (GAPPS).
To develop GAPPS, we conducted a review of the pediatric and adult literature to identify candidate triggers for possible inclusion in GAPPS.8–14 In addition, we spoke with trigger developers, including those who developed PACHMT and the IHI GTT, to identify any additional triggers and inform GAPPS development.7 From our review, the GAPPS team, with input from the developers of PACHMT, compiled a list of 78 candidate triggers for possible inclusion in GAPPS.
Expert Stakeholder Panel
Using the Rand/UCLA Appropriateness Method,15,16 we convened an expert stakeholder panel to review candidate triggers and assess their validity and feasibility.17 Our panel consisted of representatives selected by the Academic Pediatric Association, the American Academy of Family Physicians, the American Academy of Pediatrics, the American Nurses Association, Consumers Advancing Patient Safety, the Institute for Healthcare Improvement, the Joint Commission, the National Patient Safety Foundation, and the Society for Adolescent Health and Medicine to represent their organizations. The method features rounds of anonymous scoring and interactive discussion. The first round of scoring took place before the panel convened to assess panelists’ preliminary opinions of the candidate triggers. Panelists rated each trigger separately for validity and feasibility of detection on a scale of 1 to 9. A summary of preliminary ratings was sent to each panelist, with that panelist’s own scores identified on a distribution of anonymized scores from the other panelists. We then convened the panel (by phone) to discuss the triggers, with a focus on those for which the panel’s scores showed disagreement or uncertainty. At the end of the discussion, panelists confidentially rescored all candidate triggers using the same 1-to-9 scale. A working list of triggers for multicenter testing was generated from these postdiscussion scores. To be included in the draft GAPPS tool, a trigger had to receive a score of ≥7 for both validity and feasibility to ensure that each was strongly endorsed by the panel and that the GAPPS tool as a whole had content validity.
National Field Test: Hospital Selection
After institutional review board approval, to rigorously evaluate the performance of GAPPS across a range of inpatient settings, we identified hospitals to participate in the GAPPS National Field Test through the Pediatric Research in Inpatient Settings network. Our objective was to select academic and community hospitals (using teaching status reported by the American Hospital Association)18 in each of the 4 major geographic regions of the United States.
For each hospital, we reviewed 10 randomly selected admissions of ≥24 hours (including patients with surgical and medical disease processes from all acute inpatient areas) from each quarter between 2007 and 2012 (240 records per hospital). Patients >18 years of age and those admitted primarily for psychiatric (without a concurrent acute medical issue) or rehabilitation care were excluded, consistent with exclusions in adult trigger tool studies.11,13
Record Review Process
Hospital review teams received a 3-part webinar training in the use of GAPPS from experts in the field. Nurses (front-line providers or nurses with quality or research experience identified by each site’s leader) served as primary reviewers, conducting initial reviews of medical records. The hospitals studied included those with electronic health records (EHRs), paper records, or both. When primary reviewers found a positive trigger (eg, rising creatinine), they investigated the record further to determine whether 1 or more AEs had occurred. Harms due to medical care that occurred during the studied hospitalizations, as well as those present at admission (eg, a pressure ulcer that originated at a referring facility), were captured. The primary review of each record was performed in a standardized fashion and was either completed within, or truncated at, 30 minutes (Fig 1).
Primary reviewers presented all suspected AEs to 2 secondary reviewers (physicians working in the study hospitals), who independently made final determinations about the presence, severity (using the National Coordinating Council for Medication Error Reporting and Prevention Index19; Supplemental Table 3), and preventability (using a 4-point Likert scale; Supplemental Table 4) of any suspected AEs. Secondary reviewers discussed and resolved all cases for which they initially disagreed. Prediscussion interrater reliability was calculated.
The reliability of the record review and rating process was assessed through checks of interrater reliability at each review stage. A 10% random sample of records (∼24 per site) was reviewed by a local, second primary reviewer to determine the reliability of the hospital-based primary reviews (internal reliability testing). At the secondary physician review stage, dual independent review allowed for calculation of interrater reliability. In addition, to further assess tool reliability, a 10% sample of records from each hospital was reviewed by a team of expert primary and secondary external reviewers (external reliability testing) with extensive experience using trigger tools (see Acknowledgments).
Criterion Validity Testing
To establish the criterion validity of the tool, expert reviews served as the criterion standard against which hospital-based review teams were evaluated, allowing calculation of the sensitivity and specificity of hospital-based reviews relative to expert review.
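As a concrete sketch of this comparison (illustrative only, not the study's analysis code; the record data shown are hypothetical), sensitivity and specificity can be computed from per-record AE determinations, treating the expert review as the criterion standard:

```python
def sensitivity_specificity(hospital, expert):
    """Compare per-record AE calls (True = AE present) from hospital-based
    reviewers against expert reviews treated as the criterion standard."""
    tp = sum(h and e for h, e in zip(hospital, expert))          # AEs both found
    fn = sum(not h and e for h, e in zip(hospital, expert))      # AEs hospital missed
    tn = sum(not h and not e for h, e in zip(hospital, expert))  # agreed no AE
    fp = sum(h and not e for h, e in zip(hospital, expert))      # hospital-only calls
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical example: 10 dually reviewed records
hospital = [True, False, False, True, False, False, False, False, False, False]
expert   = [True, True,  False, True, False, False, False, False, True,  False]
sens, spec = sensitivity_specificity(hospital, expert)  # sensitivity 0.5, specificity 1.0
```

In this toy example the hospital reviewers miss 2 of the 4 expert-identified AEs (low sensitivity) while raising no false alarms (high specificity), mirroring the pattern observed in the field test.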
Data were securely transferred (using REDCap; http://www.project-redcap.org/) from each site to Boston Children’s Hospital for analysis. To measure interrater reliability, we used the κ statistic, which quantifies the level of agreement between 2 reviewers’ responses while accounting for chance. κ values range from –1 to 1; a κ of 1 indicates complete agreement, whereas a negative κ indicates less than a chance level of agreement. We used a simple κ statistic for dichotomous outcomes (presence of AEs, presence of a trigger).20 For ordinal outcomes, including number of AEs, number of triggers, and severity and preventability ratings, we calculated a weighted Fleiss–Cohen κ.21 Weighted κ gives partial credit for near agreement, penalizing disagreements more heavily the farther apart the response categories are. κ values are presented with 95% confidence intervals.
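For readers unfamiliar with these statistics, a minimal sketch of the two κ variants follows (the study's analyses were performed in SAS; this illustrative Python is not the study code, and the example ratings are hypothetical):

```python
def cohen_kappa(r1, r2):
    """Simple (unweighted) Cohen's kappa for two raters' dichotomous ratings."""
    n = len(r1)
    cats = sorted(set(r1) | set(r2))
    po = sum(a == b for a, b in zip(r1, r2)) / n                    # observed agreement
    pe = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)   # chance agreement
    return (po - pe) / (1 - pe)

def fleiss_cohen_kappa(r1, r2, cats):
    """Quadratic-weighted (Fleiss-Cohen) kappa for ordinal ratings:
    disagreements are penalized more the farther apart the categories are."""
    n, k = len(r1), len(cats)
    idx = {c: i for i, c in enumerate(cats)}
    # Quadratic agreement weights: 1 on the diagonal, falling off with distance
    w = [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]
    po = sum(w[idx[a]][idx[b]] for a, b in zip(r1, r2)) / n
    p1 = [r1.count(c) / n for c in cats]
    p2 = [r2.count(c) / n for c in cats]
    pe = sum(w[i][j] * p1[i] * p2[j] for i in range(k) for j in range(k))
    return (po - pe) / (1 - pe)

# Hypothetical AE-present calls (1 = AE) from 2 reviewers on 4 records
k_simple = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])   # 0.5
```

The quadratic weighting is why, for example, severity ratings 1 apart on the ordinal scale count as partial agreement, whereas ratings at opposite ends count as full disagreement.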
SAS 9.3 (SAS Institute, Inc, Cary, NC) was used to perform all analyses. The study was approved by the Boston Children’s Hospital institutional review board.
After national testing, each trigger was evaluated to determine its incidence and the frequency with which it led to AE identification confirmed by secondary physician reviewers (positivity rate). Based primarily on incidence (≥10 occurrences) and positivity rate (≥10% of trigger occurrences confirmed as AEs), a final list of manual triggers was created. We excluded triggers with very low incidence (<10 occurrences) because manually searching for them increased the burden of conducting reviews with little yield. We included 3 triggers as exceptions to this rule because they were associated with preventable and severe AEs.
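The retention rule described above can be sketched as follows (a simplified illustration; the trigger names and counts are hypothetical, not study data):

```python
def retain_manual_triggers(trigger_counts, exceptions=()):
    """Keep triggers with incidence >= 10 and positivity rate >= 10%;
    named exceptions (triggers tied to severe, preventable AEs) are kept
    regardless. trigger_counts maps name -> (occurrences, confirmed_AEs)."""
    kept = []
    for name, (occurrences, confirmed) in trigger_counts.items():
        positivity = confirmed / occurrences if occurrences else 0.0
        if name in exceptions or (occurrences >= 10 and positivity >= 0.10):
            kept.append(name)
    return kept

# Hypothetical counts: (times the trigger fired, AEs confirmed on secondary review)
counts = {
    "naloxone administration": (42, 9),   # positivity ~21% -> retained
    "rising creatinine": (120, 10),       # positivity ~8%  -> dropped
    "rare severe trigger": (4, 3),        # low incidence, retained as exception
}
kept = retain_manual_triggers(counts, exceptions={"rare severe trigger"})
```

Applied to the hypothetical counts, the rule retains the high-positivity trigger and the named exception while dropping the frequent but low-yield one, which is the trade-off between review burden and sensitivity discussed later.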
In addition, we created a final list of automated triggers for use through EHR systems. All triggers with positivity rates ≥10% that we were able to automate (process described below) were included. We included automatable triggers with very low incidence as long as their positivity rates were high. However, when incidence was very low, reliable positivity rates could not be determined from our National Field Test. Therefore, to determine the positivity rates of infrequent triggers, we conducted automated searches of all patient records in the EHR at an academic tertiary care children’s hospital from 2007 to 2012 for these triggers (n = 91 458 records) after completion of our National Field Test. Selecting 10 random records with each of these triggers (or <10 records if the trigger was found <10 times in the 6-year time frame), we subsequently conducted 2-stage reviews of flagged records to determine positivity rates.
Expert Review Process
Fifty-one of 78 candidate triggers (65%) received a score from the GAPPS Expert Stakeholder Panel of ≥7 for both validity and detection feasibility and thus met prespecified criteria for inclusion in GAPPS. With an additional round of panel voting, we added 3 triggers that had initially not met inclusion criteria, but were reconsidered based on subsequently released PACHMT study findings that suggested these triggers were of value.7 The final draft of the GAPPS tool tested in the National Field Test included 54 triggers (Table 1).
National Field Test
Sixteen hospitals participated in the GAPPS National Field Test (see Acknowledgments). Hospital-based review teams completed 3814 of 3840 (99.3%) planned record reviews across study hospitals. Patient demographics are reported in Table 2. For reliability and validity calculations, we conducted additional internal reviews on 379 records and external reviews on 375 records across all sites.
Primary reviewers agreed that a record did or did not contain a suspected AE 92% of the time (κ = 0.69). Hospital-based secondary reviewers agreed on final judgments about the presence or absence of an AE 92% of the time (κ = 0.81). Detailed agreement data on the reliability of within-hospital reviewers and hospital-based versus external expert reviewers are presented in Figs 2 and 3. Using the external expert reviewers’ findings as the standard of comparison, hospital-based primary reviewers had a sensitivity and specificity of 40% and 91%, respectively.
As reviewers gained experience during the study, primary reviewers’ agreement with expert reviewers increased. In particular, agreement on AE identification improved from 78% (κ = 0.27) for the first one-third of records reviewed to 84% (κ = 0.51) for the last one-third, and primary reviewers’ reported number of AEs became significantly more reliable (κ = 0.21 vs 0.51, P = .05). Similarly, the level of agreement between experienced hospital-based primary reviewers (ie, those who had participated in a previous trigger tool study) and expert reviewers was higher (83% [κ = 0.53]) than agreement between inexperienced hospital-based primary reviewers and expert reviewers (69% [κ = 0.24]). Experienced hospital-based primary reviewers’ conclusions on the number of AEs identified were significantly more reliable than inexperienced hospital-based primary reviewers’ (κ = 0.50 vs 0.15, P = .03) (Fig 4).
Twenty-seven of 54 candidate triggers (50%) were retained in the final manual trigger list based on meeting positivity and frequency criteria (n = 24/54 candidate triggers [44%]), or identifying AEs that were preventable and severe (n = 3/54 candidate triggers [6%]).
Thirty of 54 candidate triggers (56%) were retained in the final automated list. Thirty-seven of 54 (69%) had sufficient positivity rates (≥10% of triggers indicated a true AE) for inclusion in the automated trigger list. However, because it was not possible to automate certain triggers (eg, unplanned endotracheal extubation) in our academic center’s EHR (7/54 [13%]), we removed these triggers from the automated list.
In a national, multicenter study, we found that GAPPS, a rigorously developed pediatric global (ie, hospital-wide) trigger tool, reliably detected AEs among hospitalized children; at the primary review stage, hospital-based reviewers conducting independent reviews agreed that a suspected AE was present 92% of the time (κ = 0.69), and secondary physician reviewers agreed on the presence or absence of an AE 92% of the time (κ = 0.81). Because tool usage within hospitals was reliable, GAPPS can be used as a means of tracking hospital AE rates over time and assessing the effects of quality improvement efforts. Although previous pediatric trigger tools have been developed, our study builds substantially on these by evaluating interrater reliability at both primary and secondary review stages, by determining the sensitivity and specificity of the tool using external expert reviewers as a criterion standard, and by developing and testing a trigger tool that can be used in both academic and community pediatric settings, across general pediatric, surgical, and subspecialty units and ICUs.
Previous adult studies have demonstrated that trigger tools, especially the IHI GTT, perform well compared with other means of detecting harm.3,11 The IHI GTT has been used in 1 pediatric hospital and performed fairly well, but refinements of that tool for pediatrics were suggested to optimize performance.23 Our work builds on the IHI GTT as well as multiple pediatric trigger tools.7–10 To our knowledge, no previous adult or pediatric study has conducted the detailed trigger-by-trigger analysis we conducted, leading to a robustly scrutinized set of manual and automatable triggers. This process and the resulting components of GAPPS may be informative in future efforts to refine adult, as well as pediatric, tools.
Reliable measurement of AEs is critical for institutions and providers to determine the effectiveness of efforts to improve pediatric patient safety.24,25 Voluntary reporting remains the mainstay of patient safety tracking in most institutions, despite a wealth of data demonstrating its inadequacy.26,27 Administrative screening tools, although an improvement over voluntary reporting, are likewise relatively insensitive and imprecise.6 Augmenting voluntary reporting with systematic surveillance using GAPPS may improve understanding about safety vulnerabilities. We found that GAPPS had similar reliability in pediatric populations as the IHI GTT has in adult populations.6,13
Although far more sensitive and reliable than other methods, manual review of records is time and labor intensive. Anticipating that additional refinements of GAPPS would be desirable, we refined our tool by removing those triggers that indicated the presence of an AE infrequently (<10% positivity rate). In addition, for our manual trigger list, almost all triggers that occurred in <10 events in 3814 records were removed unless they identified severe and preventable AEs. These refinements make GAPPS easier to use, with little loss of sensitivity, and will likely streamline the review process. Of note, although we removed very low-positivity triggers, we recognize that 10% is still a relatively low positivity rate for retention in our final tool. We intentionally set this threshold low to minimize the number of missed AEs. Although increasing this threshold would further decrease the time burden of review, it would have the undesired impact of more missed AEs.
The majority of our triggers were amenable to automation, and we consequently developed an automated version of GAPPS. This will likely greatly decrease the amount of labor required to use trigger tools. We anticipate that the GAPPS automated trigger list will be expanded over time as the ability to automate complex triggers in EHRs improves. Future research should further explore automated versus manual trigger detection.
Our study has several limitations. First, we studied patient safety in a nonrandom sample of the Pediatric Research in Inpatient Settings network hospitals. Although many had no previous experience with trigger tool methodology, they may have been better equipped or more motivated to carry out a patient safety surveillance study than institutions that did not volunteer. Second, although trigger tools detect AEs at higher rates than administrative tools and incident reports, any record review methodology is limited to the information provided in the record; record keeping may be incomplete or inaccessible in both paper and electronic records. Direct observation has been shown to detect incidents at a higher rate.28,29 In addition, although previous work indicates that most AEs can be captured in 20 minutes,5,13 some AEs will not be captured even with our 30-minute cap, particularly for patients with complex illnesses and long hospital stays; designing a functional trigger tool requires striking a balance between sensitivity and usability. As the ability to use electronic triggers increases within EHR systems, these limitations should be diminished. Lastly, although within-hospital interrater reliability was high, newly trained reviewers and inexperienced institutions detected fewer AEs than those with more experience; our training regimen was likely insufficient to promote rapid acquisition of expertise among reviewers, and inexperienced reviewers undoubtedly missed some AEs. To address this, we have developed additional video-based training modules for reviewers to provide more extensive case-based training, tips, and opportunities to practice skills before beginning formal reviews. Further efforts will be needed to ensure the consistent application of GAPPS across sites before its use as a benchmarking measure.
The GAPPS trigger tool reliably measures AE rates within institutions. Trigger tools represent a substantial advance over passive voluntary reporting systems because they are far more sensitive and consistent. With tool automation and further training, we anticipate that the feasibility, reliability, and validity of GAPPS will further improve, opening the door to its use as a means of tracking and comparing safety and safety initiatives across institutions. Better understanding and measurement of patient safety will be essential in efforts to effectively address the worldwide epidemic of harm due to medical care.
We thank the members of the GAPPS Study Group:
Members of the Expert Panel: David Bundy, MD, MPH, S. Todd Callahan, MD, MPH, Emi Datuin-Pal, RN, BSN, MSHSA, MBA, Carol Haraden, PhD, Laura Knobel, MD, Rita Pickler, PhD, RN, PNP-BC, FAAN, Xavier Sevilla, MD, MBA, Jennifer Slayton, MSN, RN, and Glenn Takata, MD, MS.
External expert reviewers who participated in the GAPPS National Field Test: Lee M. Adler, DO, Kathleen M. Haig, RN, Diedre A. Rahn, RN, Roger K. Resar, MD, and Katherine R. Zigmont, RN.
Members of the GAPPS Advisory Committee: Hema Bisarya, MHSA, RD, David C. Classen, MD, MS, and Rajendu Srivastava, MD, MPH.
Reviewers and leads who participated in the study: Anjum Ahmed, Francisco Alvarez, Nicole Anania, Jennifer Bates, Judy Black, Deb Bracken, Lindsey Burghardt, Andrew Chu, Shelley Collins, Shannon Cottreau, Kristen Critelli, Carrie Cuomo, Lynn D’Angelo, Juanita Fox, Deborah Franzon, Julie Harkness, Krisa Hoyle, Rasa Izadnegahdar, Rebecca Jennings, Sheri Keahey, Jeremy Kern, Alisa Khan, Marie King, Eric Kirkendall, Laurie Kohring, Jillian Konarski, Jeffrey Lancaster, Chris Landrigan, Fran Laube, Valere Lemon, Sarah Lenhardt, Kristen Lewis, Michele Lossius, Colleen Madden, Allison Markowsky, Tara Matthews, Beth Matucci, Rachna May, Kim Medlin, Teresa Miller, Jennie Ono, Russ Osguthorpe, Nena Osorio, Tua Palangyo, Kavita Parikh, Kamakshya Patra, Kathy Peca, Serena Phillips, Denise Pickard, Cassandra Pruitt, Jennifer Rackley, Rob Riley, Lauren Rohrs, Theresa Sawyer, Kathy Shafer, Paul Sharek, Karen Singson, David Stockwell, Dawn Spell, Eric Tham, Amy Tyler, Kate Walsh, Mark Waltzman, Marin Waynar, Faye Weir, Martha Williams, and Nancy Young.
The GAPPS National Field Test sites: Boston Children’s Hospital, Children’s Hospital Colorado, Children’s National Medical Center, Cincinnati Children’s Hospital Medical Center, Grand View Hospital, Mary Washington Hospital, Lucile Packard Children’s Hospital Stanford, Providence St. Peter Hospital, Progress West Hospital, University of Florida Health Shands Children’s Hospital, Silver Cross Hospital, New York Presbyterian/Weill Cornell Medical Center, Utah Valley Regional Medical Center, West Virginia University Hospitals, Hillcrest Hospital, and South Shore Hospital.
- Accepted March 24, 2016.
- Address correspondence to Christopher P. Landrigan, MD, MPH, Division of General Pediatrics, Boston Children’s Hospital, 300 Longwood Ave, Boston, MA 02115. E-mail:
FINANCIAL DISCLOSURES: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: This project was supported by grant U18HS020513 from the Agency for Healthcare Research and Quality. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Agency for Healthcare Research and Quality.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Kohn LT, Corrigan JM, Donaldson MS
- Classen DC, Resar R, Griffin F, et al
- Resar RK, Rozich JD, Classen D
- Griffin FA, Resar RK
- Levinson DR
- Stockwell DC, Bisarya H, Classen DC, et al
- Matlow AG, Cronin CMG, Flintoft V, et al
- Sharek PJ, Horbar JD, Mason W, et al
- Takata GS, Mason W, Taketomo C, Logsdon T, Sharek PJ
- Griffin FA, Classen DC
- Schuster MA, Asch SM, McGlynn EA, Kerr EA, Hardy AM, Gifford DS, Brook R
- Hospital Quick Reports | Aggregate Reports for U.S. Hospitals & Health Care Systems | AHA Data Online. Available at: www.ahadataviewer.com/quickreport/. Accessed July 24, 2014
- NCC MERP Taxonomy of Medication Errors. Available at: http://www.nccmerp.org/sites/default/files/taxonomy2001-07-31.pdf. Accessed September 28, 2015
- Healthcare Cost and Utilization Project (HCUP). Chronic Condition Indicator (CCI) for ICD-9-CM. Available at: www.hcup-us.ahrq.gov/toolssoftware/chronic/chronic.jsp. Accessed August 17, 2015
- Vincent C, Aylin P, Franklin BD, et al
- Sari AB-A, Sheldon TA, Cracknell A, Turnbull A
- Copyright © 2016 by the American Academy of Pediatrics