BACKGROUND AND OBJECTIVES: The Faces Pain Scale–Revised (FPS-R) and Color Analog Scale (CAS) are self-report pain scales commonly used in children but insufficiently validated in the emergency department setting. Our objectives were to determine the psychometric properties (convergent validity, discriminative validity, responsivity, and reliability) of the FPS-R and CAS, and to determine whether degree of validity varied based on age, sex, and ethnicity.
METHODS: We conducted a prospective, observational study of English- and Spanish-speaking children ages 4 to 17 years. Children with painful conditions indicated their pain severity on the FPS-R and CAS before and 30 minutes after analgesia. We assessed convergent validity (Pearson correlations, Bland-Altman method), discriminative validity (comparing pain scores in children with pain against those without pain), responsivity (comparing pain scores pre- and postanalgesia), and reliability (Pearson correlations, repeatability coefficient).
RESULTS: Of 620 patients analyzed, mean age was 9.2 ± 3.8 years, 291 (46.8%) children were girls, 341 (55%) were Hispanic, and 313 (50.5%) were in the younger age group (<8 years). The Pearson correlation between the FPS-R and CAS was 0.85, with higher correlation in older children and girls. Lower convergent validity was noted in children <7 years of age. All subgroups based on age, sex, and ethnicity demonstrated discriminative validity and responsivity for both scales. Reliability was acceptable for both the FPS-R and CAS.
CONCLUSIONS: The FPS-R and CAS overall demonstrate strong psychometric properties in children ages 4 to 17 years, and between subgroups based on age, sex, and ethnicity. Convergent validity was questionable in children <7 years old.
- pain scale
- Faces Pain Scale–Revised (FPS-R)
- Color Analog Scale (CAS)
- emergency department
- CAS: Color Analog Scale
- ED: emergency department
- FPS-R: Faces Pain Scale–Revised
- IQR: interquartile range
- MCSD: minimal clinically significant difference
What’s Known On This Subject:
The Faces Pain Scale–Revised and Color Analog Scale are self-report pain scales that are commonly used for children in the clinical and research settings.
What This Study Adds:
The Faces Pain Scale–Revised and Color Analog Scale overall demonstrate strong psychometric properties in children 4 to 17 years of age, including within subgroups of age, sex, and ethnicity. Convergent validity, however, is questionable in children <7 years old.
Pain is among the most common reasons a child presents to the emergency department (ED).1–3 The appropriate management of pain relies on the ability to accurately assess the extent of pain using a valid tool. Self-report is regarded as the primary source for assessment because pain is primarily an internal experience, and many children 4 years and older are able to provide meaningful reports of pain intensity when using the appropriate tools.4,5 Two commonly used self-report measures of pain in children are the Faces Pain Scale–Revised (FPS-R) and the Color Analog Scale (CAS).6,7 Both scales have been used in both clinical and research settings, including the pediatric ED.8–12
However, the degree of validity of the FPS-R has not been comprehensively evaluated in the ED setting. In addition, no studies have evaluated whether the degree of validity of the FPS-R and CAS varies based on clinically pertinent patient characteristics. Previous studies have shown that a child’s age, sex, and ethnicity are related to the child’s ability to describe pain, as well as the child’s perception and sensitivity to pain.13–19 It is important, therefore, to determine whether the validity of the FPS-R and CAS varies in children based on their characteristics, and how suitable these scales are in different demographics.
We aimed to evaluate the degree of validity of the FPS-R and CAS in children presenting to the ED, and to identify any differences in validity between subgroups based on age, sex, and ethnicity. Specifically, we aimed to determine the convergent validity, discriminative validity, responsivity, and reliability of the FPS-R and CAS in the untrained child.
We conducted a prospective, observational study in 2 urban pediatric EDs with a combined annual census of ∼110 000 visits. English- and Spanish-speaking children between the ages of 4 and 17 years, inclusive, were eligible. We excluded children for clinical instability or illness necessitating admission to the ICU; developmental delay or neurologic impairment; altered mental status; an underlying chronic pain condition (eg, sickle cell disease); or a medical history of multiple painful experiences, such as malignancies. The institutional review boards at both sites approved this study with written informed consent.
We enrolled a convenience sample of children with painful and nonpainful conditions as identified by the nurse at triage, with the painful or nonpainful nature of the child’s condition confirmed by asking the child themselves if they had “any pain” or “any hurt.” Children with nonpainful conditions who matched children with painful conditions by age (±6 months), sex, and ethnicity were identified and used as controls for the purpose of determining discriminative validity.
The FPS-R consists of 6 faces, with each face representing an increasing degree of pain moving from left to right (Fig 1), scored 0-2-4-6-8-10. Each child was shown the faces and read standard instructions in English or Spanish (both versions from www.iasp-pain.org/FPSR).
The CAS is a plastic instrument with a wedge-shaped color-gradated figure on one side, a numerical scale on the other, and a moveable slider (Fig 1). The child was shown the side of the instrument with the wedge-shaped figure with the slider positioned in the middle, and read a standard script: “Move the slider to the place that shows how much pain you have. This end means you have no pain [slider moved to the bottom], this end means you have the worst pain [slider moved to top].” The slider was moved back to the middle of the scale before the child used the scale. Once the child finished moving the slider, we recorded the corresponding numerical score from the reverse side of the instrument (scored from 0 to 10 in 0.25 units).
We performed all assessments in the child’s primary language. Children with painful conditions were asked to indicate their severity of pain first on the FPS-R and then on the CAS. For both scripts, the word “hurt” or “pain” was used interchangeably, depending on what seemed most understandable for each child.
For children with painful conditions, an analgesic was administered at the attending physician’s discretion, and the child was reassessed at least 30 minutes later. When reassessed, we asked the child, “Is your pain much less, a little less, about the same, a little worse, or much worse compared to before you got your medicine?” Their pain was then assessed using the FPS-R and CAS in the same manner as the initial assessment. For our controls, we obtained baseline and 30-minute FPS-R and CAS pain scores in the same manner as our patients with painful conditions. The 30-minute assessment was performed to control for the nonpharmacologic experiences a patient may encounter that might affect his or her report of pain.
We determined the validity of the FPS-R and CAS in our study population and in each subgroup by evaluating convergent validity, discriminative validity, and responsivity. Additionally, we determined the reliability of both pain scales. The primary subgroups for which the study was powered included younger (4–7 years old) and older (8–17 years old) (age group), girls and boys (sex), and Hispanic and non-Hispanic (ethnicity). Subgroups analyzed in post hoc analyses for all types of validity included 4-, 5-, 6-, and 7-year-old children (each year of age within the younger age group), African American and white (race), and English and Spanish (primary language). We removed patients for whom missing data precluded the determination of the validity measures.
Convergent validity refers to the degree to which 2 different scales that are supposed to measure the same thing (ie, pain) produce similar results. We assessed convergent validity by determining both Pearson correlations and agreement (Bland-Altman method) between the FPS-R and CAS in all patients.20 We compared Pearson correlations between subgroups using the Fisher r-to-Z transformation. We considered the scales to have acceptable agreement if the difference between the FPS-R and CAS scores was ≤2 points for more than 80% of children.21 This was done between subgroups, including between each category of pain severity (ie, no more than mild pain = 0–3.9; moderate = 4.0–6.9; severe = 7.0–10.0). Correlation and agreement were also evaluated in the younger age group after adjusting for anchor bias, a well-documented tendency for young children (especially those <5 years old) to select the extremes on scales.15,22 To do this, we removed children who scored 10 on both the FPS-R and CAS, and repeated the aforementioned analyses for convergent validity.
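The convergent validity calculations described above can be illustrated with a minimal Python sketch. It is not the SPSS analysis used in the study: the function names and the score lists are hypothetical, and only the 2-point limit, the 80% threshold, the Fisher r-to-Z comparison, and the removal of children who scored 10 on both scales are taken from the text.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-Z test for correlations from two independent subgroups.

    Returns the Z statistic and a two-sided P value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def proportion_in_agreement(fps_r, cas, limit=2.0):
    """Share of children whose FPS-R and CAS scores differ by <= limit."""
    within = sum(abs(a - b) <= limit for a, b in zip(fps_r, cas))
    return within / len(fps_r)


def drop_anchored(fps_r, cas, top=10):
    """Anchor-bias adjustment: drop children scoring `top` on both scales."""
    kept = [(a, b) for a, b in zip(fps_r, cas) if not (a == top and b == top)]
    return [a for a, _ in kept], [b for _, b in kept]


# Hypothetical scores for illustration only (not study data).
fps_r = [0, 2, 4, 6, 8, 10, 4, 6, 10]
cas = [0.5, 2.25, 3.0, 6.5, 7.75, 9.5, 7.0, 5.0, 10.0]

r = pearson_r(fps_r, cas)
agreement = proportion_in_agreement(fps_r, cas)
acceptable = agreement > 0.80  # criterion stated in the text
fps_adj, cas_adj = drop_anchored(fps_r, cas)
r_adjusted = pearson_r(fps_adj, cas_adj)
```

In this sketch, a subgroup comparison would pass each subgroup's correlation and sample size to `compare_correlations`, and the same functions would be rerun on the output of `drop_anchored` to repeat the analysis after the anchor bias adjustment.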
Discriminative validity is a form of content validity, which is the degree to which a test (eg, FPS-R) is actually measuring only the construct it is meant to measure (eg, pain), and not something else (eg, anxiety). We determined discriminative validity by using the independent samples t-test to compare the initial mean FPS-R and CAS scores in children with painful conditions versus those with nonpainful conditions. We used analysis of variance (ANOVA) to compare discriminative validity for each year of age.
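As a rough illustration of the independent samples comparison, the following Python sketch computes a pooled-variance (Student) t statistic. The text does not state whether equal variances were assumed, so the pooled form is an assumption, and the function name and scores are hypothetical. The P value would be read from a t distribution with the returned degrees of freedom, which is left to a statistics package.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


# Hypothetical initial FPS-R scores (not study data).
painful = [6.0, 8.0, 4.0, 10.0, 6.0]
nonpainful = [0.0, 2.0, 0.0, 0.0, 2.0]

t_stat, df = independent_t(painful, nonpainful)
```

A large positive statistic here would indicate that children with painful conditions scored higher than controls, which is the pattern discriminative validity requires.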
Responsivity to pain-producing or pain-relieving events is one way to demonstrate construct validity. This is the degree to which a test behaves in a manner that is consistent with what the test is purported to measure. For example, we expected that the FPS-R score should decrease after analgesic administration. We determined responsivity in children with painful conditions by comparing the initial mean FPS-R and CAS pain scores with their respective postanalgesia pain score using the paired-sample t-test. We assessed controls in a similar fashion, but with the expectation that there should be no difference between the initial and 30-minute scores. We used ANOVA to compare responsivity for each year.
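The paired comparison used for responsivity can be sketched in the same way. The function name and the scores below are hypothetical, and only the t statistic and degrees of freedom are computed; the P value would come from a t distribution in a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-sample t statistic and degrees of freedom.

    A positive statistic means scores fell from `before` to `after`.
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical pre- and postanalgesia FPS-R scores (not study data).
pre = [8.0, 6.0, 10.0, 4.0, 6.0]
post = [4.0, 2.0, 6.0, 4.0, 2.0]

t_stat, df = paired_t(pre, post)
```

For controls, the same function would be applied to the initial and 30-minute scores, with the expectation of a statistic near 0.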
Reliability is not a measure of validity, but is a prerequisite for a scale to be considered valid. Reliability describes the overall consistency of a measure under similar conditions and across time. Test-retest reliability evaluates the degree to which pain scores are consistent from one assessment to the next, and was determined in children with painful conditions who reported that their pain was “about the same” pre- to postanalgesia. To do so, we first compared the Pearson correlations pre- and postanalgesia. Second, we determined the repeatability coefficient for the FPS-R and CAS using the Bland-Altman method. This is a measure representing the maximum difference expected to occur with a probability of 95%, due to the inherent imprecision of a scale, between repeated measurements in a patient whose pain is expected to remain the same.23 The lower the coefficient, the more reliable the test.
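The repeatability coefficient can be sketched as follows. The text does not give the exact computation, so this example assumes one common Bland-Altman formulation for 2 measurements per child, 1.96 times the square root of the mean squared difference, which treats the true mean difference as 0. The function name and scores are hypothetical.

```python
import math


def repeatability_coefficient(first, second):
    """Assumed form: 1.96 * sqrt(mean of squared paired differences)."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_sq = sum(d ** 2 for d in diffs) / len(diffs)
    return 1.96 * math.sqrt(mean_sq)


# Hypothetical scores from children reporting pain "about the same"
# at both assessments (not study data).
pre = [4, 6, 2, 8, 6]
post = [4, 4, 2, 8, 8]

cr = repeatability_coefficient(pre, post)
```

Under this assumption, about 95% of repeated measurements in a child whose pain is unchanged would be expected to differ by no more than `cr` points.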
We performed all statistical analyses with SPSS (version 20; IBM SPSS Statistics, IBM Corporation, Armonk, NY).
From June 2011 to October 2012, we enrolled 660 children; 40 children were removed from analysis due to missing data (n = 20) or having no analgesia administered (n = 20) (for whom responsivity could not be assessed). Of the 306 controls (those with nonpainful conditions), 237 were matched to children with painful conditions. The characteristics of the children are presented in Table 1. The mean age ± SD and range for the total sample was 9.2 ± 3.8, 4–17; for children with painful conditions, was 9.8 ± 3.8, 4–17; and for children with nonpainful conditions, was 8.6 ± 3.7, 4–17. The mean ± SD and median time to pain score reassessment for children with painful and nonpainful conditions were 38.2 ± 17.7 and 32.2 minutes, and 32.1 ± 6.5 and 30.4 minutes, respectively.
Table 2 shows the Pearson correlations between the initial FPS-R and CAS scores. The Pearson correlations for the overall group were large, and were larger in the older compared with the younger age group (P < .0001); in girls compared with boys (P < .01); and in white compared with African American children (P < .0001). We noted a trend toward decreasing Pearson correlations with decreasing age within the younger age group.
We present the Bland-Altman results for agreement in Figs 2, 3, and 4. Overall, approximately 12.4% of the total population fell outside of the predetermined limits of agreement (± 2/10 points) (Fig 2). There was a greater proportion of children who fell outside the limits of agreement in the younger compared with older age groups (P = .0186), in the 4-year-olds compared with 6- and 7-year-olds (P = .016 and P < .0002, respectively), and in African American compared with white children (P = .036) (Fig 3).
Figure 4 illustrates the proportion of children who exceeded the predetermined limits of agreement based on pain severity. There was a larger proportion of scores exceeding the limits of agreement when the pain scores were moderate across almost all subgroups except the older age group and 7-year-old children.
Table 3 depicts the Pearson correlations and agreement in the different age groups after adjusting for anchor bias. After this adjustment, Pearson correlations in the younger age group and in children younger than 7 years old were lower, decreasing to below 0.70. However, there were no clinically significant changes in agreement.
Scores on both the FPS-R and CAS were higher (P < .0001) in the group presenting with painful conditions than in the nonpainful group. There was a similar difference between all subgroups, including those in post hoc analyses of those based on race, primary language, and each individual year of age in the younger age group (P < .0001 for all comparisons).
We found a difference in both FPS-R and CAS pain scores before and after an analgesic was given in the total sample of children with painful conditions (Fig 5). There was a similar difference between all subgroups, including those in post hoc analyses (P < .0001 for all comparisons).
In children with nonpainful conditions, there was no difference between the initial and 30-minute FPS-R or CAS pain scores, either in the total sample or in subgroups based on age group, sex, and ethnicity. Post hoc analyses revealed that postanalgesic scores were greater than preanalgesic scores in children whose primary language was Spanish (P = .005).
Test-retest reliability was assessed in 40 children on both scales. Twelve (30%) patients were from the younger age group, 18 (45%) were girls, and 19 (47.5%) were Hispanic. The Pearson correlations were r = 0.77 and 0.89 for the FPS-R and CAS, respectively. The repeatability coefficients were ±0.53 and ±0.35 for the FPS-R and CAS, respectively. Figure 6 illustrates the absolute maximum differences in pre- and postanalgesic scores in this population for the FPS-R and CAS. The median absolute maximum difference was 0 (interquartile range [IQR] 0–2) for the FPS-R, and 0 (IQR 0–1) for the CAS.
In the current study, we found that the FPS-R and CAS have strong convergent validity, discriminative validity, and responsivity across the population of children 4 to 17 as a whole, and between almost all subgroups based on age group, sex, and ethnicity. We observed weaker convergent validity in the younger ages, and less agreement when pain severity was moderate. Reliability of both scales was acceptable across the entire sample.
The observed trend toward lesser validity with younger age is consistent with that described in previous literature, with both decreasing accuracy in using the FPS-R and decreasing agreement with each younger year of age.13,24 Indices of validity for 7-year-olds were similar to those of children in the older age group. The weaker Pearson correlations in the younger ages were accentuated when anchor bias was taken into account. Our analysis draws particular attention to the poor convergent validity in 4-year-old children, which is similar to that described by de Tovar et al, but very different from the original derivation study of the FPS-R in which 4-year-old children had a high Pearson correlation of r = 0.86 between the FPS-R and analog scales (ie, VAS and CAS).6,24 However, the latter study included only 9 children aged 4 years.
The lower degree of agreement we observed when pain severity was moderate also appeared to be most related to age. Our results suggest the FPS-R and CAS may be most useful in differentiating pain of high and low severity in children <7 years old, but less useful for making more nuanced distinctions within the category of moderate severity. This would be consistent with previous literature that identifies preschool-aged children as having less developed skills required for providing accurate self-reports of pain (eg, classification, seriation, or matching), showing known response biases to choose the highest and lowest scores on a pain scale, and being able to distinguish only 2 response categories with high reliability.22,25,26 One may question the utility of including a moderate-severity category, or even using the FPS-R, in children 4 years old. This would prompt an important change in clinical practice, as current international recommendations state the FPS-R is appropriate for an age range that includes 4-year-olds.27,28
The degree of agreement between the FPS-R and CAS has been shown to be acceptable in children ages 5 to 15.24 Our study demonstrated similar results using the criterion of the 80% confidence interval, although every subgroup would have summarily failed if the 95% confidence interval criterion were applied instead. This observation would support the proposition that the 95% criterion is too strict for self-report pain scores in children, and that the more lenient 80% criterion is more consistent with the other evidence of strong validity found throughout our study.21
We demonstrated an acceptable degree of test-retest reliability for the FPS-R and CAS, with Pearson correlations similar to those previously reported.28,29,30 The repeatability coefficients for both scales have not been previously described, and the ones we determined demonstrated strong reliability. The latter coefficient is consistent with the minimal clinically significant difference (MCSD) in pain previously determined for CAS, meaning that a change of 2 units (IQR 1–3) can be appropriately attributed to a pain intervention, and not just the inherent imprecision of the pain scales.10 However, as the MCSD for the FPS-R has not been previously determined, we cannot interpret the repeatability coefficient that we determined in relation to the MCSD previously determined for the FPS.10
We did not enroll consecutive patients, but our population included a relatively equal distribution of different pain severities. Most of our children with painful conditions received nonopioid oral analgesics, which could reflect a population with lower pain severity; however, this could also be attributed to health care providers typically using less-potent medications for children compared with adults.31–33
Our assessment of reliability was limited by the large proportion of patients for the FPS-R and CAS (35% and 62.5%, respectively) who did not have a difference of exactly 0 between their pre- and postanalgesic scores (Fig 6). Another limitation was analyzing scores from children who described their pain as “about the same,” rather than “the same,” which could have introduced some inherent imprecision to our methodology. Finally, there were not enough patients to conduct analyses between each subgroup to see whether reliability varied based on clinically pertinent patient characteristics.
Our decision to not counterbalance the order in which the scales were presented could have subjected our findings to order effects: the comprehension of a scale may be influenced by another scale that is presented first, artificially increasing validity coefficients. However, none of our measures of validity were markedly higher than those previously described, and a previous study demonstrated that the order of administration of the FPS-R and CAS had no effect on mean self-rating.24
Finally, the potential inability of younger-aged children, particularly 4-year-olds, to use the CAS may have confounded our interpretations of convergent validity. These children may be unable to “map” the pain they are experiencing onto a second dimension, such as a visual analog scale, and the poor convergent validity may reflect inadequacies in the CAS rather than the FPS-R.34 However, the CAS has been shown in other studies to correlate well with the FPS-R in children <7 years old, including 4-year-olds.6,24
The FPS-R and CAS overall demonstrate strong psychometric properties in children ages 4 to 17 who present with acute pain to the pediatric emergency department, including between subgroups based on age, sex, ethnicity, race, primary language, and each year of age in the younger age group. However, convergent validity of the FPS-R and CAS appears to be questionable in children <7 years old, especially in 4-year-old children, and when pain scores were reported in the category of moderate severity in these younger years of age. Further research is warranted in children younger than 7 years of age to better evaluate convergent validity, using pain scales that may be more consistently developmentally suited for these younger ages.
- Accepted July 2, 2013.
- Address correspondence to Daniel S. Tsze, MD, MPH, Division of Pediatric Emergency Medicine, 622 W 168th St, PH 137, New York, NY, 10032. E-mail:
Dr Tsze led the conduct of the study, conceptualized and designed the study, coordinated and supervised data collection at 1 of the 2 sites, enrolled patients, conducted the statistical analyses for the study, drafted the initial manuscript, and critically edited the manuscript; Dr von Baeyer provided substantial intellectual input into all aspects of the study, assisted with the statistical analyses for the study, and critically edited the manuscript; Dr Bulloch conceptualized and designed the study, coordinated and supervised data collection at 1 of the 2 sites, enrolled patients, and critically edited the manuscript; Dr Dayan co-led the conduct of the study, conceptualized and designed the study, and critically edited the manuscript; and all authors approved the final manuscript as submitted.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: Supported in part by Columbia University’s CTSA grant UL1TR000040 from the National Center for Advancing Translational Sciences/National Institutes of Health. Funded by the National Institutes of Health (NIH).
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Stinson JN, Kavanagh T, Yamada J, Gill N, Stevens B
- Chambers CT, Johnston C
- Fowler-Kerry S, Lander J
- Fortier MA, Anderson CT, Kain ZN
- Bland JM, Altman DG
- Besenski LJ, Forsyth SJ, von Baeyer CL
- World Health Organization (WHO). WHO guidelines on the pharmacological treatment of persisting pain in children with medical illnesses. Geneva, Switzerland: World Health Organization [www.who.int]. Available at: http://whqlibdoc.who.int/publications/2012/9789241548120_Guidelines.pdf. Accessed April 12, 2013
- International Association for the Study of Pain (IASP). Faces Pain Scale–Revised. Washington, DC: International Association for the Study of Pain (IASP) [www.iasp-pain.org]. Available at: www.iasp-pain.org/Content/NavigationMenu/GeneralResourceLinks/FacesPainScaleRevised/default.htm. Accessed April 12, 2013
- Schechter NL, Allen DA, Hanson K
- Eland JM, Anderson JE
- Bieri D, Reeve RA, Champion GD, Addicoat L, Ziegler JB
- Copyright © 2013 by the American Academy of Pediatrics