Minimally Important Differences in Patient or Proxy-Reported Outcome Studies Relevant to Children: A Systematic Review
CONTEXT: No study has characterized and appraised all anchor-based minimally important differences (MIDs) associated with patient-reported outcome (PRO) instruments in pediatric studies.
OBJECTIVE: To complete a comprehensive systematic survey and appraisal of published anchor-based MIDs associated with PRO instruments used in children.
DATA SOURCES: Medline, Embase, and PsycINFO (1989 to February 11, 2015).
STUDY SELECTION: Studies reporting empirical ascertainment of anchor-based MIDs among PROs used in pediatric care.
DATA EXTRACTION: All pertinent data items related to the characteristics of PRO instruments, anchors, and MIDs.
RESULTS: Of 4179 unique citations, 30 studies (including 32 cohorts) proved eligible and reported on 28 unique PROs (8 generic, 13 disease-specific, 5 symptom-specific, 2 function-specific), with 9 (32%) classified as patient-reported, 11 (39%) proxy-reported, and 8 (29%) both patient- and proxy-reported. Of the 30 studies, we rated 14 (44%) as providing highly credible estimates of the MID. Most cohorts (n = 20, 62%) recorded patients’ direct response to the target PRO and the use of an independent standard of comparison (n = 25, 78%). Most, however, failed to effectively report measurement properties of the anchor (n = 24, 75%).
LIMITATIONS: We have not yet addressed the measurement properties of our credibility instrument; our search was restricted to 3 electronic sources, and we used a single data abstractor.
CONCLUSIONS: Our study found 28 PROs that have been developed for children, with fewer than half providing credible estimates. Clinicians, clinical trialists, systematic reviewers, and guideline developers seeking to effectively summarize and interpret results of studies addressing PROs in child health are likely to find our comprehensive compendium of MIDs of use, both in providing best estimates of MIDs and identifying credible estimates.
- MID: minimally important difference
- PC-QoL: Parent Cough-Specific Quality of Life Questionnaire
- PRO: patient-reported outcome
- VAS: visual analog scale
Patient-reported outcomes (PROs) provide patients’ perspectives regarding treatment benefits and harms, and are often the outcomes of greatest importance to patients. The PRO literature has grown considerably over the past 3 decades1; clinical trialists increasingly use PRO instruments as primary outcomes. PROs are also used in monitoring quality and performance in health systems, and have become a priority for research funding agencies worldwide.2,3 Although evidence supporting reliability, validity, and responsiveness exists for many PROs, interpretation of their results remains a challenge; for a given instrument, what change in a score constitutes a trivial, small but important, moderate, or large treatment effect?
The minimally important difference (MID) provides a measure of the smallest change in the PRO of interest that patients perceive as important, either beneficial or harmful.4,5 The MID can be helpful for patients, clinicians, and clinical practice guideline developers when considering the trade-off between beneficial and harmful outcomes, and for clinical trialists planning sample sizes for their studies. There are multiple methods to assess MIDs, including anchor-based methods, distribution-based methods, Delphi methods, and scale-judgment methods, with anchor- and distribution-based methods being the 2 primary approaches used to estimate MIDs.6,7 The anchor-based approach is generally considered the optimal way to determine the MID because it directly captures the patients’ preferences and values.6,7
In pediatric populations, clinicians and researchers use PRO instruments to measure symptoms, disease severity, mental health, development, functional ability, and other constructs. For children too young to answer for themselves, proxy-respondents must substitute for the patient; such measures represent the best way of assessing the patients’ subjective health state in these circumstances. No study or database has thus far systematically documented all available anchor-based MIDs associated with patient- or proxy-reported instruments for children. Given that clinical trialists, systematic review authors, and guideline panels are likely to find a compendium of trustworthy MID estimates of considerable use, we conducted a systematic survey to summarize all published anchor-based MIDs associated with PRO instruments used to evaluate the effects of interventions on chronic medical and psychiatric conditions in pediatric populations.
A previously published protocol, summarized briefly in this article, provides additional details of our methods.1
We included original reports of studies that document the development of anchor-based MIDs for PRO instruments designed for chronic medical and psychiatric conditions in pediatric populations (<18 years of age). We defined an anchor-based approach as any independent assessment to which the PRO instrument is compared, irrespective of the interpretability or the quality of the anchor. PROs of interest included self-reported patient-important outcomes of health-related quality of life, functional ability, symptom severity, and measures of psychological distress and well-being.
Although self-reported measures are likely to provide more valid anchors for generating MID estimates, proxy-respondents (parents, caregivers, or clinicians) are often called on to respond on behalf of pediatric patients, particularly when patients are too young or are incapable due to disability. We therefore included studies in which a proxy completed the PRO instrument and/or the anchor.
We excluded studies in which only the clinician completed the PRO instrument, and excluded studies reporting only distribution-based MIDs without an accompanying anchor-based MID.
Information Sources and Search
We searched Medline, Embase, and PsycINFO for studies published from 1989 to February 11, 2015, by using relevant medical subject headings. Our published protocol describes the Medline search strategy.1
Two reviewers independently screened titles and abstracts to identify potentially eligible citations. Subsequently, to determine eligibility, teams of 2 reviewers reviewed the full texts of citations identified as potentially eligible.
Data Collection, Items, and Extraction
Pairs of investigators independently extracted data by using a pilot-tested data collection form consisting of the following items: study design, description of population, interventions, outcomes, and characteristics of PRO instruments, anchors, and MID assessment (data collection forms provided in protocol manuscript1). We classified PROs as “generic” if they measured health profiles not specific to a disease state/population or symptom or function (eg, Child Health Questionnaire, Short Form-36), or “specific” if they were specific to a disease state or population (eg, Hydrocephalus Outcome Questionnaire), a particular symptom (eg, Visual Analog Scale [VAS] for Pain), or a particular function (eg, Oxford Ankle Foot Questionnaire). One independent extractor verified all data.
Methodologists familiar with MID methods (S.E., B.C.J.) reviewed a pool of 10 eligible studies, and used standard thematic analysis techniques8 to abstract concepts related to the methodological quality of MID determinations. They reviewed coding and revised the taxonomy of methodological factors iteratively until informational redundancy and consensus were achieved. Based on this initial survey of the literature and our group’s experience with methods of ascertaining MIDs,9–16 we developed criteria for evaluating the credibility of anchor-based MID determinations. Our group has previously used such methods successfully for developing methodological quality appraisal standards across a wide range of topics.17–21
The MID credibility instrument includes 6 items, with greater credibility if (1) patients directly responded to the PRO; (2) investigators used an independent standard of comparison (instead of the same instrument to assess hypothetical scenarios); (3) the anchor was interpretable for patients; (4) the anchor was interpretable to clinicians; (5) the anchor was sufficiently closely empirically related to the target PRO; and (6) the anchor measurement properties (validity and reliability) were reported and satisfactory (validity and reliability coefficients >0.5). Each of the 6 criteria was judged by using 4 response options: definitely no, not so much, to a great extent, and definitely yes. Assessors resolved disagreements by discussion, and if needed, with the study team. We present results for each criterion as high risk, consisting of definitely no and not so much, and low risk, consisting of to a great extent and definitely yes.
To summarize the credibility of the MID, we combined the 6 criterion ratings for each study into an overall credibility rating of high, moderate, or low. Studies received a high-credibility rating if all criteria were met, including report of satisfactory relation to target instrument or measurement properties of anchor specified (ie, all criteria rated low risk). Studies received a moderate-credibility rating if all criteria were met with the exception of the requirements for a demonstrated satisfactory relation to target and satisfactory demonstrated measurement properties of the anchor. Studies received a low-credibility rating if they failed to report satisfactory measurement properties of the anchor specified and also failed to meet >1 additional criterion.
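The rating rules above can be sketched in code. This is a minimal illustration of one reading of the text, not the authors' instrument: the criterion keys and the function name `overall_credibility` are hypothetical, "moderate" is taken to mean criteria 1 to 4 met with criterion 5 and/or 6 unmet, and any combination that the stated rules do not cover is returned as "unspecified" rather than guessed.

```python
from typing import Dict

# The four response options named in the text, grouped as the text groups them.
LOW_RISK = {"to a great extent", "definitely yes"}
HIGH_RISK = {"definitely no", "not so much"}

# Hypothetical short keys for the 6 criteria, in the order the text lists them.
CRITERIA = [
    "direct_response",          # (1) patients responded directly to the PRO
    "independent_standard",     # (2) independent standard of comparison
    "patient_interpretable",    # (3) anchor interpretable for patients
    "clinician_interpretable",  # (4) anchor interpretable to clinicians
    "related_to_target",        # (5) anchor empirically related to target PRO
    "anchor_properties",        # (6) anchor measurement properties satisfactory
]


def overall_credibility(ratings: Dict[str, str]) -> str:
    """Combine 6 criterion ratings into an overall credibility label."""
    for name in CRITERIA:
        if ratings[name] not in LOW_RISK | HIGH_RISK:
            raise ValueError(f"unknown response option for {name}: {ratings[name]!r}")

    # A criterion counts as met when it is rated low risk.
    met = {name: ratings[name] in LOW_RISK for name in CRITERIA}

    if all(met.values()):
        return "high"
    # Assumption: criteria 1-4 met, with criterion 5 and/or 6 unmet.
    if all(met[name] for name in CRITERIA[:4]):
        return "moderate"
    # Criterion 6 unmet plus more than 1 other unmet criterion.
    other_failures = sum(not met[name] for name in CRITERIA[:5])
    if not met["anchor_properties"] and other_failures > 1:
        return "low"
    # The text does not say how to rate the remaining combinations.
    return "unspecified"


example = {name: "definitely yes" for name in CRITERIA}
example["related_to_target"] = "not so much"
example["anchor_properties"] = "definitely no"
print(overall_credibility(example))
```

Returning "unspecified" keeps the sketch from asserting a rule the text does not state, for example a study that fails criterion 6 and exactly 1 other criterion.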
This article summarizes MID estimates, along with study design, intervention, population characteristics, characteristics of the PRO, characteristics of the anchor, and credibility ratings.
Of 4179 candidate citations, 752 were flagged as potentially eligible, of which 30 proved eligible. From the title and abstract screening, we excluded 3427 articles. Studies were excluded for not being a primary study, addressing an adult population, failing to evaluate an MID, not evaluating the MID by using an anchor-based approach, or having health status assessed by a clinician. Our final sample of 30 articles included 32 cohorts and reported on 28 unique PRO instruments; 2 articles reported the MID by using 2 separate datasets (Fig 1).22,23 The Supplemental References provide a list of included studies.
Table 1 provides a summary of study characteristics and Supplemental Table 6 presents the characteristics per study. Among the 32 cohorts, most (n = 27) were prospective observational studies, half involved autoimmune or respiratory conditions, and half were conducted in North America. All studies were published after 2000, with 9 (28%) published within the past 2 years. The sample size used to calculate the MID varied across studies, with a median (interquartile range) of 126 (41–281), and the mean age ranged from 2.8 to 14.6 years.
PRO Instrument Characteristics
Table 2 presents detailed characteristics of the PRO instruments. Of the 28 unique PRO instruments reported across the 32 cohorts, 8 (29%) were generic, 13 (46%) were disease-specific, 5 (18%) were symptom-specific, and 2 (7%) were function-specific. Most (n = 26, 93%) had been previously cited in existing literature; only 2 (7%), the Family Functioning Questionnaire and the Parent Cough-Specific Quality of Life Questionnaire (PC-QoL-8), were being used for the first time.23,24 The number of items in each PRO varied widely, with half having >10 items. Eighteen (64%) of the PROs were measured on the nominal/ordinal scale, with 7 (25%) being measured on the ratio/interval scale and 3 (11%) including elements of both. Ten (36%) of the instruments assessed a single domain, whereas 18 (64%) assessed ≥2 domains, with the most frequently reported domain being physical function (n = 12, 31%). Nine (32%) PROs were administered solely to the pediatric patient, 11 (39%) to the proxy only, and 8 (29%) to both the patient and a proxy.
Table 3 presents detailed characteristics of the anchors. Of the 51 anchors, 32 (63%) used a single-item instrument with the remaining 13 (25%) including >1 item; 6 (12%) did not present this information. Most of the anchor response options were nominal/ordinal (n = 36, 71%), whereas 7 (14%) were ratio/interval, 2 (4%) contained components of both, and 6 (11%) did not report this information. Forty-two (82%) anchors were limited to a single domain and 9 (18%) addressed ≥2 domains. Twenty (38%) anchors were administered to the pediatric patient only, 22 (42%) to the proxy only, and 9 (17%) to both the patient and the proxy.
Table 4 provides a compendium of MIDs and the credibility of their estimates. Across the 32 cohorts, 3 studies reported >1 anchor-based method of determining the MID. In total, our review identified 35 distinct MID estimates for the 28 instruments.24–26 Methods of determining the MID included specifying a change in score or an absolute threshold on the anchor that constituted a minimum improvement or deterioration (n = 33) and hypothetical scenarios (n = 2) in which response options were based on a hypothetical response to the PRO instrument (which was also used as the anchor). Of the eligible MID estimates, all but 1 reported the MID as an absolute difference or threshold score rather than as a relative change (percentage of total instrument score).27 A single article reported boy- and girl-specific estimates.28 Twenty-three (66%) of the MID estimates included a measure of precision, and 12 (34%) did not.
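For readers unfamiliar with how an anchor category translates into an MID, the following sketch shows one common anchor-based calculation: the mean change in PRO score among patients whose anchor rating falls in the category taken to represent a minimal improvement. It is an illustration only and is not drawn from any included study; the function name `anchor_based_mid`, the category label "a little better", and all data values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def anchor_based_mid(pro_changes: Sequence[float],
                     anchor_ratings: Sequence[str],
                     minimal_category: str = "a little better") -> float:
    """Mean PRO change among patients in the minimal-change anchor category."""
    if len(pro_changes) != len(anchor_ratings):
        raise ValueError("each PRO change needs exactly one anchor rating")
    # Keep only patients whose anchor response marks a minimal improvement.
    selected = [change for change, rating in zip(pro_changes, anchor_ratings)
                if rating == minimal_category]
    if not selected:
        raise ValueError("no patients in the minimal-change category")
    return mean(selected)


# Made-up follow-up minus baseline PRO scores and matching anchor responses.
changes = [0.0, 1.0, 2.0, 5.0, 1.5, -0.5]
anchors = ["no change", "a little better", "a little better",
           "much better", "a little better", "no change"]

mid = anchor_based_mid(changes, anchors)
print(mid)  # mean of 1.0, 2.0, and 1.5
```

Other anchor-based analyses exist; this sketch makes no claim about which the included studies used.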
Our study represents the first systematic survey and appraisal of anchor-based MID estimates, and the first comprehensive compendium of such estimates for chronic medical and psychiatric conditions among children. We found 30 studies consisting of 32 cohorts of children reporting 28 unique PRO instruments; >50% were published in the past 5 years. Among the instruments identified, more than half used a PRO that was administered directly to patients; however, it was most common for proxies to respond to both the PRO instrument and the anchor. We found that more than two-thirds of questionnaires were specific instruments as opposed to generic, with nearly half being disease-specific PROs.
Our systematic survey draws attention to the methodological issues and challenges involved in MID determinations. We classified more than half of the studies as low credibility with only 1 regarded as high quality. In particular, for all but 4 instruments, investigators either did not report a satisfactory relation between the PRO instrument and the anchor or did not report satisfactory measurement properties of the anchor. Although it is promising to see the increasing frequency of MIDs ascertained in child and adolescent populations, our findings highlight the limitations in establishing highly credible estimates. Our credibility assessment tool helps make this issue transparent to readers.
The interpretation or applicability of MIDs from the same instrument across 2 different populations may differ. For example, a 2001 study in our sample assessed the MID of the VAS in patients with acute pain of either traumatic or nontraumatic cause,29 whereas a 2013 study assessed the MID of the VAS in patients with sickle cell disease.30 The former ascertained an MID of 11 mm for improvement, whereas the latter ascertained 9.7 mm for improvement. Readers who see 2 (or more) estimates may be unsure which is more accurate or whether the difference can be explained by chance. Thus, it is crucial to provide readers with MID estimates for the population to which one intends to apply the estimates. One also needs to be cautious when applying these findings to similar clinical populations in which other factors, such as the region in which the study was conducted, could result in different manifestations or interpretations of the disease.
Estimates of MIDs are needed for interpreting the magnitude of improvement or deterioration of PROs relevant to children; however, measurement and interpretation in this population pose additional challenges beyond those in an adult population. First, what is considered to be minimally important can vary depending on whether it is the patient (child) or a proxy (eg, parent) answering the question; sometimes a proxy report may be misleading,31 whereas in other instances, a proxy report may be a reasonable estimate of patient status.32 In our study, we thus restricted proxies to include only parents/guardians. Second, previous authors have speculated that patients presenting with different baseline scores may have different ratings or interpretations of a minimal change. For example, a child with severe functional impairment or pain may require a smaller change in rating to be considered meaningful as opposed to one who has a low degree of impairment or pain. However, a recent study demonstrated that the MID, on average, does not change relative to the baseline scores.33 As well, the reader should note that Food and Drug Administration guidance does not use the term MID but rather emphasizes establishing meaningful change in PRO measures at the individual level (ie, defined as a responder) versus at the treatment group level. The Food and Drug Administration defines a responder threshold as “a score change in a measure, experienced by an individual patient over a predetermined time period that has been demonstrated in the target population to have a significant treatment benefit.”34 This guidance may have implications for how the magnitude of treatment effects is interpreted for PROs developed for child health.
Strengths and Limitations
The strengths of our study include a comprehensive and transparent search strategy and independent eligibility assessment and data extraction. We used a methodological quality appraisal method to assess MID determination across 6 criteria, allowing us to classify MIDs as high, moderate, and low credibility.
Our study has limitations. Although our 6-criterion credibility assessment instrument was developed to make the credibility of the MID transparent, we have not yet addressed the measurement properties of the instrument itself. The development of the instrument was, however, informed by a review of the relevant literature and our own extensive experience with the generation of MIDs.17–21 We are currently addressing the validity of the instrument as part of an ongoing larger study.
Another limitation is the restriction of our search to 3 electronic sources: Medline, Embase, and PsycINFO. There is a possibility that other databases, such as CINAHL, may have included original studies of MID estimates in children that were not indexed in the databases we searched. However, given that our focus was chronic medical and psychiatric conditions, it is likely that our selected databases effectively capture most, if not all, studies. Last, we did not conduct duplicate data abstraction and thus there is the possibility of errors in data extracted. However, to mitigate this risk, 1 reviewer checked all data and made corrections where necessary.
Our systematic presentation of all anchor-based MID estimates relevant to children will help promote informed decision-making by allowing clinical trialists, systematic review authors, guideline developers, and clinicians to better interpret the magnitude (size) of treatment effects for PROs. If, for example, we have identified a credible MID of 1.0 on an 11-point pain instrument, a mean difference of 2.0 is twice the MID and likely considered a moderate to large treatment effect. Further, our work allows knowledge users to identify anchor-based MIDs that are more or less credible. Our credibility instrument, currently in the process of being validated, also provides guidance for subsequent work in developing, further establishing, or confirming MID estimates for PROs among the pediatric population.
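The arithmetic in the pain-instrument example above can be written as a small helper that expresses a treatment effect in MID units. This is a minimal sketch: the function name `effect_in_mid_units` is hypothetical, and no cutoffs for "moderate" or "large" are encoded because the text does not define them.

```python
def effect_in_mid_units(mean_difference: float, mid: float) -> float:
    """Express a between-group mean difference as a multiple of the MID."""
    if mid <= 0:
        raise ValueError("the MID must be a positive number")
    return mean_difference / mid


# Values from the example in the text: MID of 1.0, mean difference of 2.0.
ratio = effect_in_mid_units(mean_difference=2.0, mid=1.0)
print(ratio)  # 2.0, ie, twice the MID
```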
We thank Ms Tamsin Adams-Webber at the Hospital for Sick Children and Dr Paul Alexander for their assistance with developing the initial literature search.
- Accepted November 28, 2016.
- Address correspondence to Bradley Johnston, PhD, Systematic Overviews through advancing Research Technology (SORT), Department of Anaesthesia and Pain Medicine, The Hospital for Sick Children, University of Toronto, 686 Bay St, Toronto, Ontario, Canada M5G 0A4. E-mail:
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: This project is funded by the Canadian Institutes of Health Research, Knowledge Synthesis grant DC0190SR. Dr Ebrahim was supported by a MITACS Elevate and SickKids Restracomp Postdoctoral Fellowship Award.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Johnston BC, Ebrahim S, Carrasco-Labra A, et al
- Morse J
- Levine M, Ioannidis J, Haines T, Guyatt G
- Randolph A, Cook DJ, Guyatt G
- Furukawa T, Jaeschke J, Cook DJ, Jaeschke R, Guyatt G
- Sun X, Briel M, Walter SD, Guyatt GH
- Bulatović Calasan M, de Vries LD, Vastert SJ, Heijstek MW, Wulffraat NM
- US Department of Health and Human Services. Guidance for industry. Patient-reported outcome measures: use in medical product development to support labeling claims. 2009 December. Available at: www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM193282.pdf. Accessed August 9, 2016
- Copyright © 2017 by the American Academy of Pediatrics