BACKGROUND AND OBJECTIVES: Guideline recommendations for the same clinical condition may vary. The purpose of this study was to determine the degree of agreement among comparable asthma and bronchiolitis treatment recommendations from guidelines.
METHODS: National and international guidelines were searched by using guideline databases (eg, National Guidelines Clearinghouse: December 16–17, 2014, and January 9, 2015). Guideline recommendations were categorized as (1) recommend, (2) optionally recommend, (3) abstain from recommending, (4) recommend against a treatment, and (5) not addressed by the guideline. The degree of agreement between recommendations was evaluated by using an unweighted and weighted κ score. Pairwise comparisons of the guidelines were evaluated similarly.
RESULTS: There were 7 guidelines for asthma and 4 guidelines for bronchiolitis. For asthma, there were 166 recommendation topics, with 69 recommendation topics given in ≥2 guidelines. For bronchiolitis, there were 46 recommendation topics, with 21 recommendation topics provided in ≥2 guidelines. The overall κ for asthma was 0.03, both unweighted (95% confidence interval [CI]: −0.01 to 0.07) and weighted (95% CI: −0.01 to 0.10); for bronchiolitis, it was 0.32 unweighted (95% CI: 0.16 to 0.52) and 0.15 weighted (95% CI: −0.01 to 0.5).
CONCLUSIONS: Less agreement was found in national and international guidelines for asthma than for bronchiolitis. Additional studies are needed to determine if differences are based on patient preferences and values and economic considerations or if other recommendation-level, guideline-level, and condition-level factors are driving these differences.
- AAH: Australian Asthma Handbook
- AAP: American Academy of Pediatrics
- ACCP: American College of Chest Physicians
- AGREE II: Appraisal of Guidelines Research and Evaluation II
- CI: confidence interval
- CPS: Canadian Paediatric Society
- CTS: Canadian Thoracic Society
- GINA: Global Initiative for Asthma
- NHLBI: National Heart, Lung, and Blood Institute
- SIGN: Scottish Intercollegiate Guidelines Network
- SNHS: Spanish National Health System
What’s Known on This Subject:
Clinical practice guidelines are used to influence the provider’s care of patients. Implementation of high-quality guidelines can improve care. There have been anecdotal reports of differences between guidelines written on the same condition, but this has never been quantified.
What This Study Adds:
This is the first attempt to quantify the differences between guideline recommendations for the same condition. Overall, there was less agreement between guideline recommendations for asthma than for bronchiolitis.
The creators of guidelines attempt to refine clinical questions and balance the trade-offs of the benefits versus risks of an intervention and its alternatives to influence a clinician’s care of a patient.1 The implementation of clinical practice guidelines can promote high-value care by improving outcomes and reducing costs.2,3 For example, an appropriate decline in the unnecessary use of chest radiographs, steroids, and bronchodilators was observed after the 2006 American Academy of Pediatrics (AAP) bronchiolitis guideline publication.4 However, the authors of a number of studies have demonstrated that differences occur across clinical practice guidelines developed for the same condition.5–8 These discrepancies result from differences in guideline development, reporting, methodological quality, and content.9–14 They can cause confusion about the best treatment for the patient, and a lack of awareness of the underlying reasons for such differences could lead clinicians to apply these recommendations inaccurately in practice.9 A common means of comparing guidelines is the use of quality ratings such as the Appraisal of Guidelines Research and Evaluation II (AGREE II).15 However, little is known about the agreement among guideline treatment recommendations for prevalent pediatric conditions.
Asthma and bronchiolitis are among the most prevalent and costly pediatric medical conditions requiring hospitalization and have accordingly been identified as high priorities for research.16 The objective of this study was to assess the concordance of recommendations for these conditions. Specifically, we aimed to assess the degree of agreement among similar treatment recommendations across different national and international guidelines for asthma and bronchiolitis. We hypothesized that there would be a high level of agreement among similar treatment recommendations across these guidelines.
Information Sources and Search Strategy
We performed a literature search to find guidelines for asthma and bronchiolitis by using 4 large guideline databases: the Guidelines International Network, the National Guidelines Clearinghouse, the Canadian Agency for Drugs and Technologies in Health Grey Matters, and the Trip database.17–20 This gray literature search was conducted on December 16 and 17, 2014 (asthma), and on January 9, 2015 (bronchiolitis). Duplicates were removed, and the primary author (L.A.B.) screened titles for relevant guidelines.
Guidelines for the treatment of asthma and bronchiolitis published within the last 12 years (January 2003–January 2015) from the 34 countries currently participating in the Organization for Economic Cooperation and Development were included.21 Guideline eligibility criteria are shown in Table 1.
Data Collection, Extraction, and Organization
Extracted data included the recommendation, guideline, disease, primary outcome (treatment recommendation), AGREE II instrument rating (used to assess guideline quality and reporting), country of origin, and year of guideline publication.
Three authors (L.A.B., J.E., K.L.) independently extracted the data by using structured data collection forms. First, a single guideline was reviewed and scored by all 3 authors, and all discrepancies among the 3 authors were resolved through discussion. Second, all subsequent guidelines were reviewed and data were extracted from them by 2 authors independently. Differences in data extractions were discussed and, if necessary, a third author was used for arbitration. All fields were discussed for unanimous agreement, with the exception of guideline scoring using the AGREE II instrument to assess guideline quality and reporting. This tool has 2 overall guideline assessments and 23 individual questions that fall within 6 different domains: scope and purpose, stakeholder involvement, rigor of development, clarity of presentation, applicability, and editorial independence.15 Discrepancies of >2 points for items on AGREE II were re-reviewed collectively by 2 authors.
In each guideline, recommendations focused on treatment were identified. Excluded recommendations were on assessment, emergency referral criteria, presentation, diagnostic testing, follow-up, prophylaxis, prevention, and education. Only the key recommendations were included, as described by the AGREE II instrument.15
The primary outcome was treatment recommendation. For each guideline, treatment recommendations were categorized as (1) recommend for: recommendation in favor of an intervention; (2) optional: the intervention was an option; (3) abstain: no recommendation either for or against an intervention; (4) recommend against: a particular treatment was not recommended; or (5) not addressed: the guideline did not specifically address whether to recommend an intervention. The recommendation designation systems that informed this work were the AAP’s policy statement on classifying recommendations and the Grading of Recommendations Assessment, Development and Evaluation system.22,23 Although the instrument was not validated, 2 team members assigned each treatment recommendation designation, and the designations were checked for consistency.
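The five-category outcome above can be represented as an ordered code, which is what a weighted analysis needs. The sketch below is illustrative only: the names `Rec` and `ordinal_distance` are hypothetical, and placing "not addressed" at position 5 on the ordinal scale is an assumption taken from the numbering in the text, not a confirmed detail of the authors' analysis.

```python
from enum import IntEnum


class Rec(IntEnum):
    """Treatment recommendation categories, numbered as in the text."""
    RECOMMEND_FOR = 1
    OPTIONAL = 2
    ABSTAIN = 3
    RECOMMEND_AGAINST = 4
    NOT_ADDRESSED = 5  # assumption: treated as the last ordinal level


def ordinal_distance(a: Rec, b: Rec) -> int:
    """Number of ordinal steps separating two recommendation categories."""
    return abs(int(a) - int(b))


# "Recommend for" versus "recommend against" is a 3-step disagreement,
# whereas "recommend for" versus "optional" is a 1-step disagreement.
far = ordinal_distance(Rec.RECOMMEND_FOR, Rec.RECOMMEND_AGAINST)
near = ordinal_distance(Rec.RECOMMEND_FOR, Rec.OPTIONAL)
```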
After collection of the data items, the key recommendations for each guideline were organized by topic to allow comparison of the recommendations among guidelines.
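Organizing recommendations by topic amounts to building a topic-by-guideline table in which any guideline that is silent on a topic receives the "not addressed" category. The following is a minimal sketch of that idea; the function names, guideline labels, and records are hypothetical and are not taken from the study data.

```python
from collections import defaultdict

NOT_ADDRESSED = "not addressed"


def build_topic_matrix(records, guidelines):
    """Map each topic to {guideline: category}, defaulting to 'not addressed'.

    records: iterable of (guideline, topic, category) tuples.
    """
    matrix = defaultdict(lambda: {g: NOT_ADDRESSED for g in guidelines})
    for guideline, topic, category in records:
        matrix[topic][guideline] = category
    return dict(matrix)


def topics_in_two_or_more(matrix):
    """Topics addressed by at least 2 guidelines, i.e. the comparable ones."""
    return [
        topic
        for topic, row in matrix.items()
        if sum(cat != NOT_ADDRESSED for cat in row.values()) >= 2
    ]


# Made-up example with three hypothetical guidelines.
records = [
    ("G1", "topic A", "recommend for"),
    ("G2", "topic A", "optional"),
    ("G1", "topic B", "recommend against"),
]
matrix = build_topic_matrix(records, ["G1", "G2", "G3"])
shared = topics_in_two_or_more(matrix)
```

In this toy input only "topic A" is comparable across guidelines; "topic B" appears in a single guideline and so contributes only "not addressed" comparisons.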
We summarized the overall number of recommendations made for each condition, the frequency of each of the categories of the primary outcome for asthma and bronchiolitis, and the number of recommendations that were not addressed for asthma and bronchiolitis. We compared the guidelines on their quality by using the AGREE II tool. For reference, the evidence strength was reported when available for the key recommendations for the US guidelines for asthma24 and bronchiolitis.23
We used Cohen’s κ statistic to assess agreement among similar recommendations.25 We used both unweighted and weighted κ in cases in which the primary outcome was treated as categorical and ordinal, respectively. A weighted κ score is different from a standard unweighted κ score in that it allows weighting of differing categories with varying gravity to take into account the magnitude of disagreement present. Analysis was performed by using the R statistical software (www.r-project.org) (R Development Core Team, R Foundation for Statistical Computing, Vienna, Austria).26 We calculated a pairwise κ between guidelines as well as an overall κ score for all recommendations among all the available guidelines. κ scores were categorized as indicating poor agreement (<0), slight agreement (0–0.2), fair agreement (0.21–0.4), moderate agreement (0.41–0.6), substantial agreement (0.61–0.8), or almost perfect agreement (0.81–1.0).27 Confidence intervals (CIs) were determined by bootstrapping (n = 1000).
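To make the statistics concrete, the following sketch computes an unweighted and a weighted Cohen's κ for one pair of guidelines, with a percentile bootstrap interval. It is written in Python rather than the R software the authors used, and several details are assumptions: the text does not state the weighting scheme (linear weights are used here), the bootstrap variant (a simple percentile interval over resampled topics is used here), or how the overall multi-guideline κ was computed (not covered here). The ratings are made up.

```python
import random


def cohen_kappa(r1, r2, categories, weighted=False):
    """Cohen's kappa for two raters over the same items.

    With weighted=True, disagreements are penalized in proportion to the
    ordinal distance between categories (linear weights; an assumption).
    """
    n, k = len(r1), len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    obs = [[0.0] * k for _ in range(k)]  # observed joint proportions
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    d_obs = sum(penalty(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(penalty(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        return float("nan")  # undefined when no disagreement is expected
    return 1.0 - d_obs / d_exp


def bootstrap_ci(r1, r2, categories, weighted=False, n_boot=1000, seed=0):
    """Percentile 95% CI from resampling items (topics) with replacement."""
    rng = random.Random(seed)
    n = len(r1)
    stats = []
    for _ in range(n_boot):
        picks = [rng.randrange(n) for _ in range(n)]
        value = cohen_kappa([r1[i] for i in picks], [r2[i] for i in picks],
                            categories, weighted)
        if value == value:  # skip undefined (NaN) resamples
            stats.append(value)
    stats.sort()
    return stats[int(0.025 * len(stats))], stats[int(0.975 * len(stats)) - 1]


# Made-up ratings from two hypothetical guidelines on 4 topics (codes 1-3).
cats = [1, 2, 3]
g1 = [1, 2, 3, 1]
g2 = [1, 2, 3, 3]
k_unweighted = cohen_kappa(g1, g2, cats)               # 7/11, about 0.64
k_weighted = cohen_kappa(g1, g2, cats, weighted=True)  # 0.5
ci_low, ci_high = bootstrap_ci(g1 * 5, g2 * 5, cats)
```

In this toy example the single disagreement spans the full width of the scale, so the weighted score (0.5) is lower than the unweighted score (about 0.64): weighting penalizes distant disagreements more heavily than near ones.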
Sensitivity analyses were conducted with alternate interpretation of the absence of a reported recommendation. First, recommendations originally categorized as not addressed were recoded and analyzed as “missing data.” Second, recommendations originally categorized as not addressed were recategorized as abstain.
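The two recoding rules can be sketched as simple transformations applied to the paired ratings before κ is computed. This is an illustration only: the function names are hypothetical, and treating "missing data" as pairwise deletion (dropping a topic when either guideline is silent) is an assumption about how missing values were handled.

```python
NOT_ADDRESSED = "not addressed"
ABSTAIN = "abstain"


def recode_as_missing(pairs):
    """Sensitivity analysis 1 (assumed pairwise deletion): drop any topic
    on which either guideline in the pair is 'not addressed'."""
    return [(a, b) for a, b in pairs if NOT_ADDRESSED not in (a, b)]


def recode_as_abstain(pairs):
    """Sensitivity analysis 2: treat 'not addressed' as 'abstain'."""
    def swap(category):
        return ABSTAIN if category == NOT_ADDRESSED else category
    return [(swap(a), swap(b)) for a, b in pairs]


# Made-up ratings for one hypothetical guideline pair across 3 topics.
pairs = [
    ("recommend for", "recommend for"),
    ("optional", NOT_ADDRESSED),
    (NOT_ADDRESSED, NOT_ADDRESSED),
]
as_missing = recode_as_missing(pairs)
as_abstain = recode_as_abstain(pairs)
```

Under the first rule only 1 of the 3 topics remains comparable, which shows how quickly the number of usable comparisons can shrink when many topics are not addressed.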
This study was considered exempt by the research ethics boards of the Hospital for Sick Children and the University of Toronto, Toronto, Ontario, Canada.
Of 1381 citations, 473 were duplicates. After initial screening of titles and abstracts, 125 documents were identified for full-text review, and 118 were excluded (Fig 1). Seven asthma guidelines were identified.
There were 166 recommendation topics, with 69 recommendation topics provided in ≥2 guidelines (Table 2). The mean (SD) number of recommendations per guideline was 28 (16.3). The National Heart, Lung, and Blood Institute (NHLBI) asthma guideline contained the most recommendation topics in common with other guidelines, totaling 44 recommendation topics. The American College of Chest Physicians (ACCP) guideline contained the fewest, with only 5 recommendation topics. There was a mean (SD) of 40.6 (16.5) not addressed recommendation topics per guideline.
The AGREE II overall quality score (total score of 7) ranged from 3 to 6 points. The Scottish Intercollegiate Guidelines Network (SIGN) and Canadian Thoracic Society (CTS) guidelines had the best overall AGREE II score of 6, and the Canadian Paediatric Society (CPS) guideline had the lowest score of 3 (Table 2).
The overall unweighted and weighted κ scores were both 0.03 (95% CIs: −0.01 to 0.07 and −0.01 to 0.10, respectively). Both scores signify only slight agreement (Table 3). The agreement between guideline pairs was poor (Australian Asthma Handbook (AAH) and CTS, unweighted κ score: −0.15 [95% CI: −0.28 to −0.02]; weighted κ score: −0.2 [95% CI: −0.04 to −0.01]) to fair (AAH and CPS, unweighted κ score: 0.18 [95% CI: 0.07 to 0.29]; weighted κ score: 0.24 [95% CI: 0.1 to 0.39]).
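The agreement labels used in these results follow the bands stated in the Methods. A small helper makes the mapping explicit; the function name is hypothetical, and the handling of values that fall between the published bands (for example, 0.205) is an assumption.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa score to the agreement bands listed in the Methods."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.2:
        return "slight"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "substantial"
    return "almost perfect"


# Overall scores reported in the abstract.
asthma_label = interpret_kappa(0.03)
bronchiolitis_unweighted_label = interpret_kappa(0.32)
bronchiolitis_weighted_label = interpret_kappa(0.15)
```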
Sensitivity Analysis 1
When recommendations originally categorized as not addressed were recoded and analyzed as missing data, the κ analysis (both overall and paired) could not be completed because of the large number of missing values. The key recommendation topics from the 7 guidelines on asthma were discrepant. In 31 instances, only 2 guidelines contained recommendations that could be compared.
Sensitivity Analysis 2
When recommendations originally categorized as not addressed were recategorized as abstain, overall, the weighted and unweighted κ scores showed slight agreement (overall unweighted κ: 0.04 [95% CI: 0 to 0.08]; overall weighted κ: 0.12 [95% CI: 0.06 to 0.19]) (Supplemental Table 6). The pairwise agreement ranged from poor to fair, similarly to the primary analysis, although some differences in guideline pairs were noted.
Of 322 citations, 65 were duplicates. After initial screening of titles and abstracts, 13 documents were identified and the full texts were obtained. Nine guidelines were excluded after obtaining the full texts (Fig 2). Four bronchiolitis guidelines were identified.
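The screening counts reported for both conditions can be checked with simple arithmetic. The sketch below derives the intermediate counts (unique citations and those screened out before full-text review), which are not stated in the text; it assumes every non-duplicate citation was screened. The function name is hypothetical.

```python
def screening_flow(citations, duplicates, full_text, excluded):
    """Derive intermediate and final counts for a guideline search."""
    unique = citations - duplicates
    return {
        "unique": unique,                     # citations left after deduplication
        "screened_out": unique - full_text,   # dropped at title/abstract screening
        "included": full_text - excluded,     # guidelines retained
    }


asthma = screening_flow(citations=1381, duplicates=473, full_text=125, excluded=118)
bronchiolitis = screening_flow(citations=322, duplicates=65, full_text=13, excluded=9)
```

The derived totals match the reported 7 asthma guidelines and 4 bronchiolitis guidelines.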
There were 46 recommendation topics, with 21 recommendation topics provided in ≥2 guidelines (Table 4). The mean (SD) number of recommendations per guideline was 15 (2.7). The SIGN bronchiolitis guideline contained the fewest recommendation topics in common with other guidelines, totaling 13 recommendation topics. The recommendations included in the Spanish National Health System (SNHS) guideline were all addressed in other guidelines as well. There was a mean (SD) of 6 (2.7) not addressed recommendation topics per guideline.
The overall AGREE II quality score (total score of 7) ranged from 2 to 6. The best AGREE II score was the SNHS guideline score of 6, and the CPS guideline received the lowest score of 2 (Table 4).
The overall unweighted κ score for the bronchiolitis treatment recommendations demonstrated fair agreement (0.32 [95% CI: 0.16 to 0.52]), and the overall weighted κ score signified slight agreement (0.15 [95% CI: −0.01 to 0.5]) (Table 5). There was slight agreement (SIGN and the CPS, unweighted κ score: 0.1 [95% CI: −0.17 to 0.36]; weighted κ score: −0.35 [95% CI: −0.79 to 0.09]) to moderate agreement (AAP and CPS, unweighted κ score: 0.61 [95% CI: 0.35 to 0.87]; weighted κ score: 0.39 [95% CI: −0.09 to 0.87]; SNHS and SIGN, weighted κ score: 0.39 [95% CI: 0.02 to 0.75]) between guideline pairs.
Sensitivity Analysis 1
When recommendations originally categorized as not addressed were recoded and analyzed as missing data, this substantially changed the κ scores (Supplemental Table 7). The overall unweighted κ score indicated substantial agreement (0.75 [95% CI: 0.53 to 0.94]) and the weighted κ score was almost perfect (0.92 [95% CI: 0.82 to 0.99]). For the pairwise comparison, both unweighted and weighted κ scores were between moderate and almost perfect.
Sensitivity Analysis 2
When recommendations originally categorized as not addressed were recategorized as abstain, overall unweighted and weighted κ scores indicated fair (0.34 [95% CI: 0.17 to 0.51]) to substantial (0.61 [95% CI: 0.44 to 0.78]) agreement and the pairs’ agreement demonstrated slight to substantial agreement for unweighted and weighted κ scores (Supplemental Table 8). These results mirrored the results obtained from the primary analysis and may better estimate the true overall κ scores and pairwise agreement, specifically for the weighted scores that are dependent on the ordinal nature of the scale.
This is the first report in the literature in which quantitative methods are used to compare clinical practice guideline treatment recommendations among different national and international guidelines. Focusing on highly prevalent pediatric conditions cared for by pediatricians, we found less agreement than anticipated overall, and less agreement among national and international guidelines for asthma (Table 2) than for bronchiolitis (Table 4). This is likely because of the large number of not addressed recommendations among the asthma guidelines. In addition, for both asthma and bronchiolitis, the κ results differed substantially depending on whether recommendations were categorized as not addressed or considered missing data. Recommendations recoded as missing data were excluded from the analysis, leading to fewer overall comparisons. When the comparisons with the not addressed category were removed from the analysis for bronchiolitis, agreement became nearly perfect; however, this may falsely overstate the agreement between guidelines.
Additionally, the difference could also be attributed to the type of treatment recommendations that are being put forth. It may be easier to agree on nonintervention recommendations that are common for bronchiolitis than on recommendations for an appropriate intervention, as often occurs for asthma.
The authors of numerous studies have compared guidelines for the same condition by using qualitative and descriptive analyses.7,10,11,14,28–31 When guidelines have been quantitatively compared in the literature, the main comparison is focused on guideline quality as assessed by using the AGREE II instrument.6,8,32,33 We too found variation in guideline quality by using the AGREE II instrument. We have not found any previous reports in which the differences in agreement across guidelines for the same condition are quantified.
The differences reported in this study are clinically important. In Table 2, it is apparent that there are occasions when the asthma guidelines do agree. However, there are also many instances in which they do not. There are 4 examples in which 1 guideline recommended a treatment and another recommended against the same treatment of asthma. One such example is the use of a leukotriene receptor antagonist (montelukast) for children over 2 years as the first choice of controller therapy for mild persistent asthma. The Global Initiative for Asthma (GINA) recommended against the practice, whereas it was optional in the AAH and NHLBI guidelines and recommended for use in the SIGN guideline. Though there is less variation in the bronchiolitis guidelines (Table 4), there are still discrepancies, although none were as stark as 1 guideline recommending a treatment and another recommending against it. For example, the SIGN guideline abstained from making a recommendation on montelukast use, whereas the SNHS guideline did not recommend its use. Overall, these differences can make the treatment of patients confusing when a provider is trying to follow evidence-based clinical practice guidelines.
The AAP develops clinical practice guidelines independently as well as through collaborations with other societies. It also endorses guidelines from other organizations.22 Although the AAP developed its own system for evaluating the evidence and providing recommendations,22 evidence is often insufficient, leaving the AAP guideline panel to make recommendations on the basis of little evidence.34 It will be important for the AAP to be cognizant of the differences between their endorsed guidelines and those of other national and international pediatric societies. This also reveals the continued variability in quality of guidelines in the United States and in other countries that aim to bring the best clinical care through clinical practice guidelines to pediatric patients.34
There are limitations to this study. First, there was a lack of pairwise comparisons, particularly for asthma. This issue has been termed the “Kappa Paradox” in the statistical literature.35 There were a large number of recommendations that were categorized as not addressed. For example, the CPS guideline had few key recommendations outlined, leading to many recommendation topics being categorized as not addressed. The ACCP guideline had few recommendation topics in common with the other guidelines; this guideline was focused largely on inhalation devices and delivery, leading to many recommendation topics being categorized as not addressed. This lack of data is the most plausible explanation for the differences that we identified in the analysis when the category not addressed was recoded to missing data, leading to fewer available comparisons. Second, although this study was an attempt to quantify the differences between guideline treatment recommendations that previously have been compared qualitatively, this method may not be sensitive enough to the subtle semantics of recommendations as they are written. As a team, we had many discussions about the differences in language for recommend for, optional, abstain, and recommend against. This categorization may not address the subtleties of language that explain in explicit detail the differences between guidelines. For instance, there is a 22-page chapter dedicated to explaining the differences in the current US and European asthma guidelines.30 Third, there were several limitations to the search and retrieval of guidelines for this study. First, we limited our search to a structured gray literature search that was not peer reviewed by a librarian. However, in comparison with a Medline, Embase, and similar but more limited gray literature search peer reviewed by a librarian for another unpublished study, there were few differences in the number of guidelines retrieved for bronchiolitis and asthma. 
For bronchiolitis, there were 2 additional guidelines retrieved by this structured gray literature search, a more recently published guideline and an update since the last search was completed. For asthma, the same guidelines were found. Fourth, we chose to limit our search to those in English. This may have unnecessarily narrowed our search, and we may have had a more comprehensive list of international guidelines if we had translated those guidelines in other languages. However, there were only 4 guidelines for asthma that were excluded because of non-English language status. None of the bronchiolitis guidelines found were non-English. This may limit our generalizability of this process for guideline appraisal and comparison with those countries that are non-English speaking, though standardization of the guideline content and quality has been part of a worldwide discussion with the AGREE II tool, which is available and translated into 32 languages.36 Finally, we limited our study to treatment recommendations. Findings may have differed if we had considered other recommendations focused on assessment, emergency referral criteria, presentation, diagnostic testing, follow-up, prophylaxis, prevention, and/or education.
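The kappa paradox mentioned among the limitations can be illustrated with made-up data: when one category (here, "not addressed") dominates both guidelines, raw agreement can be high while κ stays near or below zero, because almost all of the agreement is expected by chance. The guideline labels and counts below are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two raters over the same items."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_exp = sum(c1[c] * c2[c] for c in c1) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


# Two hypothetical guidelines across 100 topics: both are silent on 90,
# and each addresses 5 topics that the other does not.
g1 = ["not addressed"] * 90 + ["recommend for"] * 5 + ["not addressed"] * 5
g2 = ["not addressed"] * 90 + ["not addressed"] * 5 + ["recommend for"] * 5

raw_agreement = sum(a == b for a, b in zip(g1, g2)) / len(g1)
kappa = cohen_kappa(g1, g2)
```

Here the two guidelines agree on 90% of topics, yet κ is slightly negative, because the expected chance agreement (90.5%) is even higher than the observed agreement.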
The discrepancies in agreement between guideline recommendations found among national and international guidelines for common pediatric conditions cared for by a pediatrician or pediatric hospitalist are concerning. There is substantial variability in treatment recommendations among national and international guidelines for asthma and some variation for bronchiolitis. There is variation in guideline development methods across the world. There were over 60 different evidence evaluation and recommendation grading systems in use when last evaluated in 2012, making the interpretation of guidelines more difficult.23,37 Clinical practice guideline panels may benefit from adapting existing evidence synthesis and clinical practice guidelines to their local context rather than from de novo development of evidence synthesis to create a new guideline.8 Clarity and transparency in clinical practice guideline work would improve if there were more collaborative international work in clinical practice guideline development, or, at the least, more fidelity to a standard reporting structure.
Overall κ analysis revealed slight agreement for asthma and fair agreement for bronchiolitis guidelines. This suggests that there is variability in treatment recommendations among national and international guidelines for asthma and bronchiolitis.
- Accepted August 16, 2017.
- Address correspondence to Leigh Anne Bakel, MD, Section of Hospital Medicine, Department of Pediatrics, Children’s Hospital Colorado, 13123 E 16th Ave B302, Aurora, CO 80045. E-mail:
FINANCIAL DISCLOSURE: Dr Straus is funded by a Tier 1 Canada Research Chair in Knowledge Translation; the other authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: No external funding.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Lenzer J, Hoffman JR, Furberg CD, Ioannidis JP; Guideline Panel Review Working Group
- Parikh K, Hall M, Mittal V, et al
- Adhyaru BB, Jacobson TA
- Burgers JS
- Burgers JS, Bailey JV, Klazinga NS, Van Der Bij AK, Grol R, Feder G; AGREE Collaboration
- National Guideline Clearinghouse
- Guidelines International Network
- Canadian Agency for Drugs and Technologies in Health
- Brassey J
- Organization for Economic Cooperation and Development
- American Academy of Pediatrics Steering Committee on Quality Improvement and Management
- Jadad AR, Moher M, Browman GP, et al
- R Core Team
- Matthys J, De Meyere M, van Driel ML, De Sutter A
- Guillén Ú, Weiss EM, Munson D, et al
- Hester G, Nelson K, Mahant S, Eresuma E, Keren R, Srivastava R
- Woods CR
- Bai ASV, Bak G, Wells G
- Copyright © 2017 by the American Academy of Pediatrics