BACKGROUND: The Strengths and Difficulties Questionnaire (SDQ) is widely used to screen for child mental health problems and measure common forms of psychopathology in 4- to 16-year-olds. Using longitudinal data, we examined the validity of a version adapted for 3- to 4-year-olds.
METHODS: We used SDQ data from 16 659 families collected by the Millennium Cohort Study, which charts the development of children born throughout the United Kingdom during 2000–2001. Parents completed the preschool SDQ when children were aged 3 and the standard SDQ at ages 5 and 7. The SDQ’s internal factor structure was assessed by using confirmatory factor analysis, with a series of competing models and extensions used to determine construct, convergent, and discriminant validity and measurement invariance over time. Predictive validity was evaluated by examining the relationships of age 3 SDQ scores with age 5 diagnostic measures of attention-deficit/hyperactivity disorder, autism spectrum disorder/Asperger syndrome, and teacher-reported measures of personal, social, and emotional development.
RESULTS: Confirmatory factor analysis supported a 5-factor measurement model. Internal reliability of subscales ranged from ω = 0.66 (peer problems) to ω = 0.83 (hyperactivity). Item-factor structures revealed measurement invariance over time. Strong positive correlations between ages 3 and 5 SDQ scores were not significantly different from correlations between age 5 and 7 scores. Conduct problems and hyperactivity subscales independently predicted developmental and clinical outcomes 2 years later.
CONCLUSIONS: Satisfactory psychometric properties of the adapted preschool version affirm its utility as a screening tool to identify 3- to 4-year-olds with emotional and behavioral difficulties.
- parent-reported SDQ
- factor structure
- factorial invariance
- average variance explained
What’s Known on This Subject:
Although the psychometric properties of the school-age Strengths and Difficulties Questionnaire (SDQ) have been extensively examined by using longitudinal data, the preschool version of the SDQ has only been explored in a limited number of cross-sectional studies.
What This Study Adds:
This is the first psychometric study of the preschool SDQ using longitudinal data. We report measurement invariance over time, satisfactory reliability, construct and criterion validity, and predictive utility for subsequent behavioral problems (4 years) and clinical disorders (2 years).
The Strengths and Difficulties Questionnaire (SDQ)1 is widely used in research, clinical, and community settings to screen for externalizing and internalizing problems.2–4 Five subtypes of children’s behavior (conduct problems, hyperactivity, emotional problems, peer problems, and prosocial behaviors) are each assessed with 5 questions. Three versions are available for school-aged children: parent- and teacher-reported versions (4–16 years) and a self-report version (11–17 years).
Several studies have addressed the validity of the parent-reported SDQ in school-aged samples, predominantly confirming the intended 5-factor structure.5,6 A 3-factor configuration of externalizing (conduct problems and hyperactivity), internalizing (emotional and peer problems), and prosocial factors has also been proposed and suggested for use in epidemiologic studies and in low-risk populations.7,8 The internal reliability of SDQ subscales has been predominantly examined by using Cronbach’s α, a measure of the interrelatedness of items; however, α estimates provide only a lower bound for reliability and often underestimate it.9 A meta-analytic review reported weighted mean α coefficients extracted from 26 studies that showed generally modest reliabilities for parent reports (0.53 < α < 0.76).10 McDonald’s ω, which estimates the proportion of scale variance attributable to the underlying construct, typically yields higher reliability estimates but has rarely been used to assess reliability of the SDQ. A comparative study reported higher ω coefficients (0.74 < ω < 0.91) than α coefficients (0.54 < α < 0.82) for the school-age SDQ.9
Previous research offers strong evidence that the school-age SDQ relates to corresponding constructs measured by other instruments (convergent validity). Weighted-average correlation coefficients between equivalent pairs of SDQ and Child Behavior Checklist subscales11 from 9 parent-reported studies were uniformly strong and positive (range: 0.52 < r < 0.71).10 Several studies showed strong correlations between SDQ subscales and “real world” outcomes such as clinical diagnoses (criterion validity); SDQ scores identified school-aged children with concurrent behavioral and emotional disorders, including attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder/Asperger syndrome (ASD/AS), and predicted their occurrence 3 years later.4,12,13 However, multitrait-multimethod analyses have not provided consistently strong evidence of discriminant validity of the school-age SDQ subscales. For example, cross-informant, within-subscale correlations have sometimes been no stronger than cross-subscale correlations,4 suggesting that the intended behaviors are measured with some overlap between constructs.4,14
A slightly modified version of the SDQ has been developed for preschool-aged (3–4 years old) populations (http://www.sdqinfo.org). Preschool is a valuable time to identify and treat childhood psychopathology, and parent report is likely to provide a particularly informative perspective.
Assessing the psychometric properties of the parent-reported preschool SDQ is imperative before widespread adoption can be recommended. However, only 4 studies15–18 have done so, and none performed a single comprehensive assessment of convergent, discriminant, and criterion validity, measurement invariance across time, and internal reliability.
The previous studies were based in The Netherlands,15 Spain,16 Germany,17 and Japan.18 Each supported a 5-factor configuration.15–18 Table 1 presents preschool internal reliabilities compared with the school-age SDQ.10 Only 1 preschool validation study used McDonald’s ω coefficient to estimate internal reliability. Significant correlations between equivalent pairs of SDQ and Child Behavior Checklist11 internalizing and externalizing subscales indicated external convergent validity. SDQ total difficulties scores (summed hyperactivity, conduct, emotional, and peer problem scores) were significantly associated with “treatment status” and “presence of any disorder” criteria, supporting concurrent criterion validity of the measure.15,16 However, each preschool SDQ study was limited to a cross-sectional design, prohibiting examination of factor structure stability over time and validity in predicting future psychopathology.
This study is the first, to our knowledge, to assess the psychometric properties of the preschool SDQ by using longitudinal data. We used parent-reported preschool SDQ scores at age 3 in conjunction with school-age SDQ responses collected at ages 5 and 7 to determine the optimal factor structure and the extent of measurement invariance across time. We examined internal reliability with α and ω coefficients and convergent and discriminant validity by using average variance explained (AVE) scores. Finally, we used criterion outcome measures at age 5, which included parent-reported diagnoses of ADHD and ASD/AS and teacher-reported measures of personal, social, and emotional (PSE) development to assess the utility of the preschool SDQ to predict clinical outcomes 2 years later.
The Millennium Cohort Study (MCS) is a UK longitudinal study of children born between September 2000 and August 2001.19 This article uses 3 waves of data collected when children were ≈3, 5, and 7 years old. At age 3, 19 942 families were sampled; 15 590 responded to at least 1 part of the MCS (response rate: 78%) and 14 444 completed the SDQ (mean child age at data collection = 3.15 years; age range = 2.65–4.57 years). At age 5, 19 184 families were sampled; 15 246 responded (79%) and 14 615 had SDQ data (mean child age = 5.22 years; range = 4.40–6.13 years). At age 7, 17 031 families were sampled; 13 857 responded (81%) and 13 358 had SDQ data (mean child age = 7.24 years; range = 6.34–8.15 years). Only 1 child from each of 246 families containing multiple births was included. Observations collected when children were >1 year older or younger than the intended study age were excluded. Our final analysis sample consisted of 42 417 observations from 16 659 distinct children (48% boys) for whom we had SDQ scores on at least 1 occasion. MCS sampling was stratified to oversample children living in socioeconomic deprivation and poverty and in ethnically diverse areas. Sampling weights were provided to adjust for oversampling relative to UK demographic characteristics, attrition, and nonresponse.19
The National Health Service Research Ethics Committee provided ethical approval to the MCS. Informed consent procedures included obtaining written parental consent.
The parent-report SDQ1 contains 25 items forming 4 difficulties subscales (conduct problems, hyperactivity, emotional problems, and peer problems) and a prosocial subscale. The preschool version (administered at age 3) and standard version (ages 5 and 7) were used. In the preschool version (www.sdqinfo.org), 3 items are adjusted to reflect age-appropriate behaviors and contexts. Specifically, “argumentative with adults” and “can be spiteful” replace “often lies or cheats” and “steals from home, school or elsewhere” (conduct problems subscale), and “can stop and think before acting” replaces “thinks things out before acting” (hyperactivity subscale). Parents rated statements as either 0 (not true), 1 (somewhat true), or 2 (certainly true).
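The scoring described above (five 5-item subscales rated 0/1/2, with the 4 problem subscales summed into a total difficulties score) can be sketched as follows. This is an illustrative sketch only, not the official SDQ scoring software; the item-to-subscale assignment is represented abstractly rather than with the published item key.

```python
# Hypothetical sketch of SDQ subscale scoring: each subscale is the sum of
# its 5 items (each rated 0, 1, or 2); total difficulties sums the 4
# problem subscales and excludes the prosocial subscale (range 0-40).

SUBSCALES = ["conduct", "hyperactivity", "emotional", "peer", "prosocial"]

def score_sdq(responses):
    """responses: dict mapping subscale name -> list of five 0/1/2 ratings."""
    scores = {}
    for name in SUBSCALES:
        items = responses[name]
        assert len(items) == 5 and all(r in (0, 1, 2) for r in items)
        scores[name] = sum(items)
    # Total difficulties excludes the prosocial subscale.
    scores["total_difficulties"] = sum(
        scores[s] for s in SUBSCALES if s != "prosocial"
    )
    return scores
```

Each subscale score therefore ranges from 0 to 10, and the total difficulties score from 0 to 40.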
When the children were age 5, the parents were asked whether a health professional had ever diagnosed the child with ADHD and ASD/AS. Medical records were not consulted. Prevalence rates were 0.9% for ADHD and 0.9% for ASD/AS (0.2% for comorbid disorders).
PSE development (a subscale of the Foundation Stage Profile) was rated by teachers for children aged 4 to 5 years (www.education.gov.uk/eyfs). The scale contains 27 dichotomous items that measure dispositions and attitudes, eg, “maintains attention and concentrates”; social development, eg, “plays alongside others”; and emotional development, eg, “separates from main carer with support.” The internal reliability of this scale in the MCS was α = 0.91.
Analysis of the preschool SDQ comprised 5 stages, in turn assessing internal factor structure, internal reliability, measurement stability over time (measurement invariance), construct validity, and predictive criterion validity.
First, the preschool and school-age SDQ’s factor structure was examined by using confirmatory factor analysis. The established 5-factor model was compared against a 3-factor model (externalizing: conduct problems and hyperactivity; internalizing: emotional and peer problems; and prosocial factors)7 and a 1-factor model (ie, Harman’s single-factor test). In each case, factor loadings and thresholds varied freely across the 3 ages, with only the item-factor arrangement fixed to be equal (configural invariance). Second, 2 internal reliability measures, Cronbach’s α (interrelatedness of subscale items) and McDonald’s ω (proportion of subscale measuring construct), were calculated for each subscale within a structural equation model framework that accounts for the ordinal nature of item response distributions. Equality of coefficients across time was assessed by using bootstrapped confidence intervals (1000 replications).20,21
Third, we examined factorial invariance, ie, stability of the 5-factor measurement model across time. The configural invariance model from the first stage of analyses provided a baseline. Factor loadings (metric invariance), then thresholds (scalar invariance), and finally factor loadings and thresholds (strong invariance) were sequentially fixed equal across time.* Increasing degrees of factorial invariance were demonstrated if model fit was not diminished by additional constraints. Initially, the invariance of each subscale was tested independently of other subscales. All subscales were then tested in the same model, implementing constraints to establish the best-fitting measurement model.
Fourth, construct validity was evaluated by using the average variance explained in subscale items by their associated factor (AVE score). Factors with AVE scores >0.50 demonstrate satisfactory internal convergent validity. Factors with AVE scores exceeding their highest squared correlation with another factor achieve adequate external discriminant validity.22 The effectiveness of the adapted preschool items (see Measures) was examined by using R2 values, ie, the amount of variance in each item explained by its associated factor. A confirmatory factor analysis model in which factor loadings of the 3 items were free to vary across time was compared with one in which they were constrained.
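The AVE criteria just described reduce to two small computations on standardized loadings and interfactor correlations; a minimal sketch, assuming standardized single-factor loadings are already in hand:

```python
import numpy as np

def ave(std_loadings):
    """Average variance explained: mean of squared standardized loadings,
    ie, the average proportion of item variance the factor accounts for."""
    lam = np.asarray(std_loadings, dtype=float)
    return float(np.mean(lam ** 2))

def discriminant_ok(ave_score, interfactor_corrs):
    """Fornell-Larcker check: the factor's AVE must exceed its largest
    squared correlation with any other factor."""
    max_shared = max(r ** 2 for r in interfactor_corrs)
    return ave_score > max_shared
```

So a subscale whose 5 items all load at 0.60 has an AVE of 0.36, falling short of the 0.50 convergent-validity benchmark, yet still passes the discriminant check against a factor correlated with it at r = 0.50 (shared variance 0.25).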
Finally, predictive validity was examined. We tested predictive criterion validity by regressing the age 5 outcomes of ADHD and ASD/AS (binary measures) on preschool SDQ scores via probit regression, and PSE development (continuous) via linear regression. Probit regression coefficients are interpreted on the latent z-score scale: a 1-point increase in a predictor shifts the outcome z score (SDs above the mean) by the magnitude of the regression coefficient. Predictive validity of the preschool SDQ was also assessed through correlations with school-age SDQ scores.
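The probit interpretation above can be made concrete: the model maps a linear predictor to a probability through the standard normal cumulative distribution function. The intercept below is a hypothetical value chosen to represent a rare outcome; only the hyperactivity coefficient (0.41) comes from the Results.

```python
from scipy.stats import norm

def probit_probability(intercept, coefs, predictors):
    """P(outcome = 1) under a probit model: Phi(intercept + sum(b_i * x_i)),
    where Phi is the standard normal CDF."""
    z = intercept + sum(b * x for b, x in zip(coefs, predictors))
    return norm.cdf(z)

# Hypothetical illustration: with an assumed intercept of -2.5 (a rare
# diagnosis) and the reported hyperactivity coefficient of 0.41, a 1-point
# rise in hyperactivity raises the latent z score by 0.41, increasing the
# predicted probability of diagnosis.
p_low = probit_probability(-2.5, [0.41], [0])
p_high = probit_probability(-2.5, [0.41], [1])
```

Because the link is nonlinear, the same coefficient produces a larger change in probability near the middle of the distribution than in the tails.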
Mplus v7.11 was used for all analyses.23 SDQ items were treated as ordinal, with weighted least-squares means and variance–adjusted estimation used.23 Given the χ2 statistic’s propensity to reject good models when samples are large and/or complex, the comparative fit index (CFI) and root mean square error of approximation (RMSEA) were used to assess model fit. Model fit was considered adequate where CFI values exceeded 0.95 and RMSEA values fell below 0.06.24 For testing competing models in very large samples, we followed Cheung and Rensvold’s (2002) suggestion that parsimonious models are superior when increases in the CFI offered by a more complex model are ≤0.01.25 Acknowledging the high power achieved with our large sample, the statistical significance level for testing parameters was set at P < .0005. Effect sizes and 99.95% confidence intervals were reported as appropriate.
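The fit criteria above follow standard formulas; a sketch of the single-group RMSEA point estimate and the Cheung–Rensvold ΔCFI rule is given below. Note that this uses the conventional χ2-based RMSEA formula, whereas the mean- and variance-adjusted (WLSMV) χ2 used in the study would not reproduce the reported values exactly.

```python
import math

def rmsea(chi2, df, n):
    """Single-group RMSEA point estimate:
    sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def prefer_parsimonious(cfi_simple, cfi_complex, threshold=0.01):
    """Cheung & Rensvold rule: retain the simpler model unless the more
    complex model improves CFI by more than the threshold."""
    return (cfi_complex - cfi_simple) <= threshold
```

A model whose χ2 equals its degrees of freedom (or less) yields an RMSEA of 0, and a complex model improving CFI by only 0.005 would be rejected in favor of the simpler one.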
The 5-factor model (χ2 = 28 332, degrees of freedom [df] = 2520, P < .0005, RMSEA = 0.025, CFI = 0.905) fitted the data better than 3-factor (χ2 = 36 769, df = 2589, P < .0005, RMSEA = 0.028, CFI = 0.874) and 1-factor (χ2 = 62 172, df = 2622, P < .0005, RMSEA = 0.037, CFI = 0.780) models and so was used in subsequent analyses. Standardized factor loadings for the 5-factor configuration (Fig 1) were 0.46 < β < 0.74 for conduct problems, 0.39 < β < 0.80 for hyperactivity, 0.51 < β < 0.86 for emotional problems, 0.44 < β < 0.61 for peer problems, and 0.55 < β < 0.72 for prosocial behaviors. Several items had factor loadings <0.6 (Table 2). However, the underlying factor explained >20% of item variance for all but 2 of the 25 preschool items.
Reliability analyses yielded comparable α and ω estimates. Internal reliability of the standard SDQ was acceptable at ages 5 (α: 0.71 < α < 0.85; ω: 0.72 < ω < 0.86) and 7 (0.76 < α < 0.86; 0.77 < ω < 0.88). By using the preschool version at age 3, only the peer problems subscale failed to achieve the 0.70 benchmark for satisfactory internal reliability (0.63 < α < 0.80; 0.66 < ω < 0.83). Examination of 99.95% confidence intervals indicates that, although mostly adequate at age 3, internal reliability was significantly higher at ages 5 and 7 (Table 3).
Factorial invariance analyses tested whether item-factor loadings and threshold values differed significantly across time. For each subscale, when factor loadings were constrained to be equal across time, fit indices were not reduced compared with the configural model (model A, Table 4), demonstrating metric invariance. Furthermore, constraining item-factor thresholds equally across time did not reduce fit indices for the conduct problems and prosocial subscales, demonstrating scalar invariance. Three additional models were tested to establish the best-fitting model (Table 4). In model B, all factor loadings and conduct problems and prosocial thresholds were constrained. Model C additionally constrained hyperactivity thresholds, which showed an insubstantial loss of fit from the configural model (ΔCFI = 0.002) when tested for scalar invariance (constraining thresholds only). In model D, all factor loadings and thresholds were fixed equal across time. Fit indices for model D were poor but were acceptable for models B and C. Model C was preferred, due to its parsimony. Model C demonstrated strong (ie, factor loadings and thresholds) invariance for conduct problems, hyperactivity, and prosocial subscales and metric (ie, factor loadings) invariance for emotional and peer problems.
We assessed convergent and discriminant validity using AVE scores. Although item R2 values (ie, proportion of item variance explained by the underlying factor) increased slightly with age, factor loadings were not significantly different across ages in the unconstrained model A (Table 5). AVE scores for the preferred model C ranged from 0.34 (peer problems) to 0.60 (hyperactivity), with only hyperactivity achieving the 0.50 benchmark for satisfactory internal convergent validity.22 However, every subscale demonstrated adequate external discriminant validity, with AVE scores exceeding squared interfactor correlations. Likewise, correlations between SDQ factors at age 3 ranged from 0.15 < r < 0.68, fulfilling Kline’s “r < 0.85” benchmark for distinct factors.26
Three adapted items distinguish the preschool and school-age SDQs (see Methods). We found significantly higher R2 values for preschool conduct problem items (compared with standard SDQ age 5 equivalent) and, conversely, lower R2 values for the standard age 5 hyperactivity item (compared with the adapted preschool equivalent) (Table 6).
The predictive validity of SDQ subscales was supported by strong positive correlations between age 3, 5, and 7 SDQ factors (Fig 1, Table 7). No significant differences between correlations were found. By using probit and linear regression analyses, only the preschool conduct problems and hyperactivity subscales independently predicted age 5 outcomes (Table 8). Hyperactivity positively predicted ADHD (β = 0.41) and ASD/AS (β = 0.58) and negatively predicted PSE development (β = −0.16), whereas conduct problems positively predicted ADHD (β = 0.40). In a simple model without covariates, conduct problems also predicted ASD/AS, but this relationship became negative (β = −0.55) when covariates were added (Supplemental Table 9).
This is the first longitudinal examination of the psychometric properties of the parent-reported preschool SDQ from preschool to school-age developmental stages. The 5-factor model established for the school-age SDQ provided an adequate fit to preschool SDQ data. Subscales exhibited good internal reliability and adequate discriminant validity, albeit alongside weaker internal convergent validity. All subscales demonstrated metric factorial invariance over time, with conduct problems, hyperactivity, and prosocial subscales presenting strong factorial invariance over time. Conduct problems and hyperactivity subscales also predicted clinical disorders 2 years later.
Our findings diverge from previous research in 2 areas. First, we reported poor model fit for the alternative 3-factor configuration; previous validation studies observed adequate fit for both configurations using school-aged4 and preschool-aged16 populations. Second, we reported higher Cronbach’s α reliability scores than most preschool and school-aged validation studies,10,15 with only the preschool peer problems subscale failing to meet the α > 0.70 criterion for satisfactory internal reliability. Because of skewness and the ordered categorical nature of our variables, we estimated α within a structural equation model framework, which resulted in higher α coefficients.20 Our ω reliability analyses yielded results consistent with previous studies reporting ω reliabilities for preschool and school-age SDQs.9,16
This was the first examination of discriminant validity using the preschool SDQ. Satisfactory discriminant validity was observed for all preschool subscales. However, weak internal convergent validity suggested that some items are not strongly related to their associated factors. Item variance explained by respective factors increased with age, consistent with previous research that observed parent-reported SDQ factors typically accounted for 50% of item variance for 10- to 12-year-olds and <50% for 5- to 7-year-olds.27 The 2 preschool-specific conduct problem items had adequate communalities compared with the age 5 equivalents; conversely, the preschool “reflective” item was a poor indicator, with the hyperactivity subscale explaining only 15% of item variance.
Substantial positive correlations between corresponding factors measured at ages 3, 5, and 7 years support the predictive validity of SDQ subscales across 2- and 4-year periods. Moreover, correlations between preschool and age 5 scores were comparable to those between age 5 and 7 scores, supporting the predictive validity of the preschool SDQ as similar to the school-age SDQ administered at age 5.
Preschool conduct problems and hyperactivity subscales demonstrated predictive criterion validity over 2 years. Hyperactivity positively predicted ADHD and ASD/AS and negatively predicted PSE development. Conduct problems positively predicted ADHD. We also report a weak positive simple relationship between conduct problems and ASD/AS, which became negative when other SDQ subscales were covaried (Supplemental Table 9). A similar negative relationship between conduct problems and ASD/AS while controlling for other SDQ subscales was reported with older children.4 This negative relationship may reflect overlap with other SDQ subscales, particularly hyperactivity, a robust independent predictor of later ASD/AS.
We found substantial continuities in peer and emotional problems, as measured by the SDQ, from preschool- to school-aged children; however, these subscales did not independently predict external measures of psychopathology. Rather than suggesting that these scales lack clinical value, this finding is likely to reflect the range of outcomes available in the MCS data set. Specifically, it is plausible that these subscales would independently predict future internalizing problems such as depressed mood and anxiety. Multiple informants of child behaviors would enhance the validity of findings, with teacher report likely to be most valuable at this age, although difficult to collect in UK samples because preschool education is not compulsory. The SDQ impact supplement, which investigates chronicity, distress, social impairment, and burden, was excluded from analyses; although this supplement provides clinically useful information,5 the brevity and accessibility of the 25-item questionnaire increase its suitability for widespread use. Future research focused on application in clinical settings might usefully address the impact supplement and evaluate clinical cutoffs for psychiatric caseness.
The current study validates the SDQ as a brief measure of emotional and behavioral problems in preschool children, with psychometric properties largely comparable to the extensively used school-age SDQ. The current findings encourage its application within research contexts and as a screening tool in clinical and community settings. Screening raises several issues beyond the psychometric properties of the instrument, which have been discussed elsewhere.28
The school-age SDQ has been extensively validated for its intended use as a screening tool to detect 4- to 16-year-olds at risk of clinical or developmental disorders.1,10,29 The current study confirms satisfactory psychometric properties for the adapted preschool version, affirming its utility as a brief measure to identify 3- to 4-year-olds with emotional and behavioral difficulties.
We thank Professor Robert Goodman for reviewing previous drafts of the manuscript.
- Accepted February 18, 2015.
- Address correspondence to Simone Croft, MSc, Department of Psychology, University of Sheffield, Western Bank, Sheffield S10 2TP, UK. E-mail:
Ms Croft contributed to the choice of analysis, carried out the analyses, and drafted the manuscript; Dr Stride conceptualized and designed the analyses and reviewed and revised the manuscript; Dr Maughan conceptualized the study and critically reviewed the manuscript; Dr Rowe coordinated the research team, contributed to the choice of analysis, and reviewed and revised the manuscript; and all authors approved the final manuscript as submitted.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: This work was supported by an Economic and Social Research Council PhD studentship (ES/J500215/1), awarded to Ms Croft and supervised by Drs Rowe and Stride.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
↵* The 3 amended preschool items were free to vary during factorial testing.
- Goodman A, Lamping DL, Ploubidis GB
- Achenbach TM
- Doi Y, Ishihara K, Uchiyama M
- ↵Jones EM, Ketende, SC, Sosthenes C. Millennium Cohort Study: User Guide to Analysing MCS Data Using SPSS. 1st ed. London, UK: Centre for Longitudinal Studies; 2010
- ↵Templin J. Lecture 8: comparing classical test theory with CFA and how to use test scores in secondary analyses. 2013. Available at: http://jonathantemplin.com/files/sem/sem13psyc948/sem13psyc948_lecture08.pdf. Accessed May 3, 2015
- ↵Muthén LK, Muthén BO. Mplus User’s Guide. 7th ed. Los Angeles, CA: Muthén & Muthén; 1998–2012
- Kline RB. Principles and Practice of Structural Equation Modeling. 2nd ed. New York, NY: Guilford; 2005
- Alexander KE, Brijnath B, Mazza D
- Copyright © 2015 by the American Academy of Pediatrics