Abstract
CONTEXT: Gastroesophageal reflux (GER) is defined as GER disease (GERD) when it leads to troublesome symptoms and/or complications. We hypothesized that definitions and outcome measures in randomized controlled trials (RCTs) on pediatric GERD would be heterogeneous.
OBJECTIVES: Systematically assess definitions and outcome measures in RCTs in this population.
DATA SOURCES: Data were obtained from the Cochrane, Embase, Medline, and PubMed databases.
STUDY SELECTION: We selected English-written therapeutic RCTs concerning GERD in children 0 to 18 years old.
DATA EXTRACTION: Data were tabulated and presented descriptively. Each individual parameter or set of parameters with unique criteria for interpretation was considered a single definition for GER(D). Quality was assessed by using the Delphi score.
RESULTS: A total of 2410 articles were found; 46 articles were included. Twenty-six (57%) studies defined GER by using 25 different definitions and investigated 25 different interventions. GERD was defined in 21 (46%) studies, all using a unique definition and investigating a total of 23 interventions. Studies on GER and GERD reported 87 and 61 different primary outcome measures, respectively. Eight (17%) studies did not report on side effects. Of the remaining 38 (83%) studies that did report on side effects, 18 (47%) included side effects as a predefined outcome measure, of which 4 (22%) included them as a primary outcome measure. Sixteen studies (35%) were of good methodological quality.
LIMITATIONS: Only English-written studies were included.
CONCLUSIONS: Inconsistency and heterogeneity exist in definitions and outcome measures used in RCTs on pediatric GER and GERD; therefore, we recommend the development of a core outcome set.
- GER: gastroesophageal reflux
- GERD: gastroesophageal reflux disease
- pH-MII: pH monitoring combined with multichannel intraluminal impedance monitoring
Gastroesophageal reflux (GER) is a normal physiologic process occurring several times per day in healthy infants and children and is only referred to as GER disease (GERD) when it causes troublesome symptoms and/or complications.1–4 Claim database analysis revealed that GERD accounts for 4% of all pediatric hospital admissions in the United States and costs ∼$750 million per year.5,6 The diagnosis of GERD is currently solely based on history taking and physical examination and may be relatively easy to establish when classic symptoms, such as regurgitation, vomiting, and irritability during or after feeds are accompanied by alarm symptoms such as hematemesis or failure to thrive. However, in most cases, no alarm symptoms are present (yet) and discerning GERD from physiologic GER can be difficult, especially in infants and young children.
A well-validated diagnostic tool for GERD would thus be extremely helpful. Despite the wide availability of diagnostic tests, including pH-(impedance) monitoring, endoscopy, and empirical acid-suppressive therapy, the diagnostic accuracy of these tests in children remains unclear.7 Although pH-impedance monitoring is considered the gold standard to diagnose GERD in adults and is recommended for the evaluation of GERD and its relation to symptoms in infants and children not responding to therapy, the lack of normative pediatric ranges hampers its application as a gold standard diagnostic in children.4,8 Factors contributing to the reflux burden in children have not yet been fully elucidated. This has led to a wide variety of hypotheses and potential treatment strategies and creates a risk of overtreatment, with the potential occurrence of side effects, despite a lack of symptom reduction.9–12
Clinical trials that aim to determine the benefits and risks of interventions should measure outcomes that are important to patients and parents, and useful to health care professionals and policymakers alike. It can be difficult, however, to determine which outcomes are most important for a given condition and a given setting. Therefore, standardization of outcome measures for randomized controlled trials (RCTs) has been proposed.13,14 Such standardized outcome measures are not available for children with GERD. A first step in this procedure is to review the definitions and outcome measures of GERD that are currently used in therapeutic RCTs. We therefore aimed to systematically review definitions and outcome measures used in therapeutic RCTs performed in infants and children with GERD. We hypothesize that these definitions and outcome measures are heterogeneous.
Methods
Search Strategy
The databases Cochrane (Central), Embase, Medline, and PubMed were searched from inception to November 2015 (full search strategy and keywords shown in Supplemental Table 7). To identify additional studies, reference lists of relevant studies identified in the literature search were searched by hand. Throughout the process, the reporting guidelines described in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement were followed.
Study Selection
Two investigators (M.M.J.S. and A.J.B.) independently reviewed the titles and abstracts of all citations retrieved by the literature search. Potentially relevant studies were retrieved for full-text review. Therapeutic RCTs, and systematic reviews of such RCTs, in infants and children with GERD (age 0–18 years) were included if they were written in English and a definition of GERD was provided by the authors. Studies were excluded if a study arm comprised <10 patients. This approach is justified by Turner et al,15 stating that if several large, high-quality studies have been found in the initial searches, searching can be truncated because the inclusion of more obscure, smaller studies would be unlikely to change conclusions of the review. Studies comparing 2 variants of a single intervention, such as dose-comparison studies, were also excluded, as we assumed that these studies would not evaluate the therapeutic effect of the intervention as their primary objective. Disagreements between reviewers were adjudicated by discussion and consensus (M.M.T.).
Data Extraction and Analysis
For each included trial, the definitions used to describe GERD and the primary outcomes regarding GERD were extracted. Data derived from included articles contained author and year of enrollment, study setting, methods, type of participants, method of GERD assessment, type of intervention, follow-up, predefined outcome measures, and results. Each individual parameter or set of parameters with unique criteria for interpretation was considered to be a single definition for GERD. Data extraction from studies in infants (age 0–12 months) was separated from the studies assessing both infants and children (ages 0–18 years).
Methodological quality of the included RCTs was assessed by using the Delphi list.16 The Delphi list was developed as a standardized list to assess the quality of RCTs. This scale ranges from 0 (minimum) to 10 (maximum). High quality was defined as a score of >6 points, average quality as a score of 4 to 6, and low quality as a score of ≤3.
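The quality thresholds described above can be written as a small classification rule. The following Python sketch is only an illustration: the function name `delphi_quality` and the category labels are hypothetical, and it assumes whole-number scores on the 0 to 10 scale.

```python
def delphi_quality(score: int) -> str:
    """Map a Delphi list score (0-10) to the quality category used in this review.

    Assumption: scores are whole numbers; the function name and the
    returned labels are illustrative, not part of the Delphi list itself.
    """
    if not 0 <= score <= 10:
        raise ValueError("Delphi list score must be between 0 and 10")
    if score > 6:        # high quality: score of >6 points
        return "high"
    if score >= 4:       # average quality: score of 4 to 6
        return "average"
    return "low"         # low quality: score of <=3


# Classify a few example scores (hypothetical values, not study data).
for example_score in (2, 5, 8):
    print(example_score, delphi_quality(example_score))
```

Writing the rule this way makes the boundaries explicit: a score of 6 is still average quality, and a score of 3 is low quality.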
Results
Search Results
The search yielded 2410 potentially relevant articles. After deducting duplicates, 1533 unique titles and abstracts were screened for eligibility and 1418 studies were excluded as they were not relevant to our search question (ie, no [systematic review of] RCT, no definition of GERD provided, or the study was of an adult population). After the evaluation of the full text, an additional 72 articles were excluded for not meeting our inclusion criteria (ie, inappropriate study design [n = 55], <10 patients per study arm [n = 16], and lack of a clear definition of GER/GERD [n = 1]). Checking the bibliographies of the systematic reviews of RCTs yielded 3 additional RCTs (Fig 1), for a total of 46 included studies. These additional studies were not identified by the original search as they did not include any of the search terms for an RCT in their title or abstract.
Fig 1. PRISMA 2009 flow diagram. For more information, visit www.prisma-statement.org. (Reprinted with permission from Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group [2009]. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med. 6[7]:e1000097.)
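The counts reported in the search results can be checked with simple arithmetic. The Python sketch below uses only the numbers stated in the text; the variable names are illustrative and are not taken from the original flow diagram.

```python
# Counts as reported in the Search Results section.
records_identified = 2410
unique_screened = 1533
excluded_on_title_abstract = 1418
excluded_on_full_text = {
    "inappropriate study design": 55,
    "<10 patients per study arm": 16,
    "no clear definition of GER/GERD": 1,
}
added_from_reference_lists = 3

# Derived quantities.
duplicates_removed = records_identified - unique_screened
full_text_reviewed = unique_screened - excluded_on_title_abstract
included_from_search = full_text_reviewed - sum(excluded_on_full_text.values())
total_included = included_from_search + added_from_reference_lists

print("duplicates removed:", duplicates_removed)
print("full texts reviewed:", full_text_reviewed)
print("included from database search:", included_from_search)
print("total included:", total_included)
```

The full-text exclusions sum to the reported 72, and the final total agrees with the 46 included studies.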
Study and Patient Characteristics
In total, 2630 patients were included in 46 studies (26 studies in infants <12 months only and 20 studies in infants and children 0–18 years; there were no studies that only included children >12 months). The included studies that concerned both infants and children did not provide a breakdown based on age to describe the definitions of GERD, interventions, or outcome measures that were studied. For this reason, data regarding studies conducted in infants only and studies conducted in both infants and children are presented separately in the current study. The study characteristics are described in Supplemental Tables 8–12.
Definitions
In 26 studies, patients were included based on a definition of GER (n = 16 studies in infants only), and in 21 studies based on a definition of GERD (n = 11 studies in infants only). In one of these studies, both GER and GERD patients were included.17
GER (n = 26 Studies)
Between studies, 42 different criteria for interpretation of (combinations of) clinical and investigational parameters were used to define GER, resulting in 25 different definitions among studies (Table 1). Definitions of GER were based on clinical parameters only in 12 studies by using 12 different criteria for interpretation. GER was defined based on parameters from diagnostic investigations only in 7 studies using 6 different criteria for interpretation. The definition of GER was developed through a combination of both clinical and diagnostic parameters in 7 studies using 7 different criteria for interpretation. The majority of studies conducted in infants only (13/16, 81%) based their definition of GER at least partly on clinical parameters, of which measures of regurgitation or vomiting frequency were most commonly used. The studies that concerned both infants and children based their definition of GER predominantly on pH criteria (8/10, 80%), whereas the frequency of regurgitation or vomiting was used in only 2 (20%) of these studies. Twenty-five different interventions were assessed (Table 2; n = 11 different nonpharmacological interventions, n = 11 different pharmacological interventions, n = 3 different combinations of nonpharmacological and pharmacological interventions). Studies conducted in infants only investigated both pharmacological and nonpharmacological interventions, whereas studies that included both infants and children investigated pharmacological interventions only.
Table 1. Definitions Used for GER
Table 2. Interventions Used for GER
GERD (n = 21 Studies)
The 21 studies that addressed GERD all used a unique definition, with a total of 41 different criteria for interpretation of (combinations of) clinical and investigational parameters (Table 3). Definitions were based on clinical parameters only in 11 studies, on parameters resulting from diagnostic investigations only in 4 studies, and on a combination of both in 6 studies. All studies that were conducted in infants only used at least 1 clinical parameter to define GERD, mostly by reporting measures of regurgitation or vomiting (5/11, 45%). Three studies (27%) additionally used at least 1 diagnostic investigation, of which pH monitoring was most commonly used (2/11, 18%). The majority of studies (7/10, 70%) conducted in both infants and children used at least 1 diagnostic investigation to define GERD, predominantly by using pH monitoring combined with multichannel intraluminal impedance monitoring (pH-MII) characteristics (4/10, 40%). Twenty-three different interventions were assessed (Table 4; n = 4 different nonpharmacological interventions, n = 15 different pharmacological interventions, n = 4 different combinations of nonpharmacological and pharmacological interventions). Studies predominantly investigated a combination of pharmacological interventions (8/11 [82%] studies in infants only; 10/10 [100%] studies in both infants and children).
Table 3. Definitions Used for GERD
Table 4. Interventions Used for GERD
Primary Outcome Measures
GER (n = 26 Studies)
Eighty-seven different primary outcome measures were used in the 26 studies regarding interventions for GER (Table 5). Symptoms were used as a primary outcome measure in 16 studies (n = 23 different outcome measures) and parameters from diagnostic investigations were used in 21 studies (n = 62 different outcome measures). Two studies used 2 different definitions of side effects as a primary outcome measure. Eighteen studies evaluated the therapeutic effect on the basis of pH-MII parameters, mostly by using the reflux index (the percentage of time that the esophageal pH is <4) and reporting the total number of reflux episodes. Clinical parameters and investigational parameters were used about equally often as outcome measures in studies that concerned infants only (10/16 [63%] and 11/16 [69%] studies, respectively, reported at least 1 parameter). Of the clinical parameters, measures of regurgitation or vomiting were predominantly used among studies that concerned infants only (6/16, 38%). In studies regarding both infants and children, investigational parameters were predominantly reported as outcome measures (8/10, 80%); all studies used pH characteristics.
Table 5. Primary Outcome Measures Used for GER
GERD (n = 21 Studies)
Sixty-one different primary outcome measures were used in the 21 studies regarding interventions for GERD (Table 6). Individual clinical symptoms or composite scores of clinical symptoms were used as a primary outcome measure in 17 studies, resulting in 25 different outcome measures. Parameters resulting from at least 1 diagnostic investigation were used in 9 studies, resulting in 34 different outcome measures. Two studies used 2 different definitions of side effects as a primary outcome measure.
Table 6. Primary Outcome Measures Used for GERD
Side Effects
Side effects were reported in 38 (83%) studies. Of these studies, 18 (47%) reported side effects as a predefined outcome measure, of which 4 (22%) included them as a primary outcome measure. In 8 (17%) studies, no data on side effects were reported. Four of these studies (9% of all included studies) regarded pharmacological interventions.
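The percentages in this section use different denominators, which is easy to overlook. The following Python sketch recomputes them from the reported counts; the helper `pct` and the variable names are illustrative, and the choice of denominator for each figure is an inference from the reported values.

```python
def pct(numerator: int, denominator: int) -> int:
    """Return a percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


total_studies = 46
reported = 38                      # studies reporting side effects
predefined = 18                    # of those, as a predefined outcome measure
primary = 4                        # of those, as a primary outcome measure
not_reported = 8                   # studies with no data on side effects
not_reported_pharmacological = 4   # of those, pharmacological interventions

side_effect_summary = {
    "reported / all studies": pct(reported, total_studies),
    "predefined / reported": pct(predefined, reported),
    "primary / predefined": pct(primary, predefined),
    "not reported / all studies": pct(not_reported, total_studies),
    "not reported, pharmacological / all studies": pct(
        not_reported_pharmacological, total_studies
    ),
}

for label, value in side_effect_summary.items():
    print(f"{label}: {value}%")
```

Under these assumptions, the figure for the pharmacological studies without side-effect data matches the reported value only when all 46 included studies are used as the denominator.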
Methodological Quality
The Delphi list was used to assess the methodological quality of the included RCTs (Supplemental Tables 7–16).16 Two studies (4%) had a score ≤3, indicating a low methodological quality and 16 studies (35%) were of good methodological quality (score >6). The remaining 28 studies (61%) scored between 4 and 6, indicating average methodological quality. Lack of treatment allocation, unclear, and high or unclear drop-out rates were the most common reasons for reduced methodological quality.
Discussion
This study is the first to systematically review definitions and outcome measures used in intervention trials on pediatric GERD and shows a lack of agreement on definitions, predefined outcome measures, and instruments used to evaluate GERD within these trials. We identified 46 RCTs that used 25 unique definitions of GER and 21 unique definitions of GERD. Studies on GER and GERD reported 87 and 61 different primary outcome measures, respectively, the majority regarding individual clinical symptoms or composite scores of clinical symptoms. The rationale for selecting outcome measures, and the measurement properties of the outcome measure tools (when used), were most often not reported.
The use of a uniform definition to describe a study population is important to obtain homogeneous patient populations, allowing comparison between studies. Definitions for GERD applied in the included RCTs in this review varied widely, and none of the included trials used the exact definitions of the most recent European Society for Paediatric Gastroenterology, Hepatology, and Nutrition, North American Society for Pediatric Gastroenterology, Hepatology, and Nutrition, and National Institute for Health and Care Excellence clinical guidelines for pediatric GERD.4,64 Surprisingly, between studies, the same parameters and cutoff values for interpretation were used by authors in their definitions of both GER and GERD. This finding indicates that, between studies, terminology may be used interchangeably and that patients with similar clinical characteristics may be classified as having either physiologic GER or pathologic GERD.
Of the 46 included trials, 26 studies were performed in infants (of which only 1 study exclusively assessed newborns age <28 days) and 20 studies were performed in both infants and children. The latter studies provided no breakdown by age for the definitions of GERD, the interventions, or the outcome measures studied. As GERD symptoms are known to vary largely by age, different age groups may involve different treatment goals and measures to evaluate treatment efficacy. In the present review, we were limited to providing an overview of those studies conducted in infants only and studies conducted in a mixed population of both infants and children. We found that studies of only infants predominantly used a symptom-based definition of GERD and were also more likely to report symptom-based outcome measures. In contrast, studies conducted in a mixed population predominantly used measures obtained from diagnostic investigations to define GERD and to report the effect of the studied interventions. Remarkably, in all these latter trials, the same definitions for GER(D) were applied to the whole study population, despite the difference in symptom presentation and clinical course of GER(D) symptoms between infants and children.65,66 Therefore, studies may inadvertently be examining a more heterogeneous population than expected. Not consistently including or standardizing presenting symptoms and complications of GERD as part of the disease definition consequently challenges the assessment of clinical symptoms and/or complications as an outcome measure. This limits generalizability and comparability of results across studies because the studies included patients with varying degrees of disease severity. Not adequately defining disease severity at the start of the study might also have a negative impact on the ability to detect change over time and the success of a certain intervention.
Although we did not assess the efficacy of treatment, for future studies it is important to realize that age at inclusion, as well as other factors such as prematurity and presence of comorbidity, could bias the treatment effect owing to the spontaneous improvement over time.
Most included studies did not use validated instruments to report on outcome measures, although, for example, in infants the Infant Gastroesophageal Reflux Questionnaire Revised has proved a reliable measure to assess symptoms over time and report on treatment outcome.67 The lack of a validated instrument to evaluate GERD and the heterogeneity of the present outcome measures make it complicated to interpret and compare study results. None of the studies included measures of parental or patient satisfaction or quality of life as one of their primary outcome measures. Previous research has shown that the perceptions of parents and health care professionals regarding the treatment of an infant can differ significantly.67,68 It is important to be aware of patient-related outcomes as they provide the cornerstone for family-centered care and parental satisfaction.
In a recent study, labeling an otherwise healthy infant with a GERD diagnosis increased parents’ interest in medicating their infant, even when they were told that the medications were ineffective.69 These findings suggest that a GERD label may influence parents’ judgments by changing their assumptions about what kinds of interventions are considered most appropriate. This indicates that attitudes and perceptions of parents are an important consideration for clinicians when developing patient-tailored treatment strategies.
Side effects were used as a predefined outcome measure in fewer than half of the studies, and 8 studies did not report on side effects at all, 4 of which assessed pharmacological interventions. Additionally, study duration ranged from 6 hours to 12 weeks, so relevant long-term side effects of treatment may have been missed. This finding is of great importance, as the safety of long-term use of anti-reflux medication is currently under debate. However, at the same time, previous systematic reviews of GERD treatment suggest a paucity of high-quality evidence supporting acid-suppressive treatments for this condition.12,67,70 Forty-three percent of the trials included were from before the year 2000. This may explain the limited number of studies that used the 2009 European Society for Paediatric Gastroenterology, Hepatology, and Nutrition and North American Society for Pediatric Gastroenterology, Hepatology, and Nutrition definition of GERD, as well as the number of studies using instruments that were developed only recently, such as pH-MII measurement and the Infant Gastroesophageal Reflux Questionnaire Revised.4,71 Although studies did not necessarily use the same diagnostic techniques both to define GERD and to evaluate treatment efficacy, the large variety of diagnostic techniques may partly explain the lack of homogeneity in the definitions and outcome measures identified in the current study.
A limitation of our review is that we included only studies written in English, so we may have missed RCTs published in other languages. To minimize the risk of other missed studies, we performed an extensive and sensitive literature search in collaboration with a clinical librarian. Another limitation may be that the Delphi list does not include all the items associated with the risk of bias, as it assigns weights to different items in the scale by providing an overall score per study. Additional assessment of items associated with bias would have been necessary had it been our goal to use scores for eligibility criteria or to conduct subgroup analyses. The aim of the current study, however, was to provide an overall assessment of the quality of the included trials, rather than to perform a meta-analysis evaluating the efficacy of the different interventions.16
Conclusions
Many different definitions and outcome measures are used in intervention trials in pediatric GERD. Disagreement on the choice of outcome measures impedes a direct comparison of results on the efficacy of different interventions and has resulted in inconsistent reporting and the potential for reporting bias.72,73 Changing this situation will require a better understanding of what is normal and abnormal, which currently is hampered by the lack of a gold standard diagnostic tool.4,7
There has been an increased awareness of the factors that influence the quality of clinical trials in general and those in reflux disease in particular.74–76 Standardization of both definitions and outcomes in RCTs has been proposed as a solution to the problems of inappropriate and nonuniform outcome selection and reporting bias.13,14 As GERD is a symptom-driven disease, the primary outcome measures may well include the improvement of the cardinal symptom(s): either disappearance of a single symptom or its persistence at no more than a mild severity. As GERD symptoms are known to vary widely by age and, especially in infants, are often nonspecific and tend to disappear spontaneously with increasing age, a consensus on the definition of GERD in different age groups needs to be established first.4,77,78 The term “troublesome” as used by the current clinical guidelines recognizes the variability in how symptoms impact individual patients and may well be used for this purpose.76 GERD treatment includes both pharmacological and nonpharmacological interventions, which may be targeted to treat different signs and symptoms accordingly. In addition to establishing a minimum core outcome set, establishing sets of proposed secondary outcome measures, depending on the objective of the study as well as on the study population, may well be appropriate. Therefore, to allow comparison between future studies, as a first step we recommend the development of both an infant- and a child-tailored minimum core outcome set for clinical research in GERD by using the Delphi technique and early involvement of stakeholders.14 Embedding these core outcome sets within future clinical trials, systematic reviews, and clinical practice guidelines on pediatric GER(D) could make a profound contribution by advancing the usefulness of research to inform clinical practice, enhance patient care, and improve clinical outcomes.
Footnotes
- Accepted April 24, 2017.
- Address correspondence to M.M.J. Singendonk, MD, Department of Pediatric Gastroenterology and Nutrition, Emma Children’s Hospital AMC, C2-312, PO Box 22700, 1100 DD Amsterdam, Netherlands. E-mail: m.m.j.singendonk{at}amc.uva.nl
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: No external funding.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Copyright © 2017 by the American Academy of Pediatrics