OBJECTIVE: To inform primary care screening and preventive intervention efforts, the authors examined the screening efficiency of the parent version of the Strengths and Difficulties Questionnaire (SDQP4-16) for persistent disorders relative to transient disorders and its capacity to distinguish between the two.
METHODS: Persistence and transience in preschool-onset psychiatric disorders were identified by using data from a large population-based cohort study in Norwegian children initially assessed at age 4 and followed up at age 6 (n = 1038). Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, diagnoses at both time points were assigned by using the Preschool Age Psychiatric Assessment Interview, against which the SDQP4-16 was compared through receiver operating characteristics analysis.
RESULTS: The screening efficiency for persistent disorders exceeded that for transient disorders with a specificity of 86.1%, a sensitivity of 79.3%, and an area under the curve value of 0.85. The SDQP4-16 was able to discriminate persistent disorders from transient disorders at an area under the curve value of 0.71. At the selected cutoff of 10, the negative predictive value was 99.6%, whereas the positive predictive value was 9.5%, partly due to the low prevalence (1.8%) of persistent disorders.
CONCLUSIONS: The SDQP4-16 is a sensitive tool for detecting persistent psychiatric disorders in young children. However, a large proportion of positive screens are nonpersistent cases, as indicated by the high false-positive rate. Thus, the clinical utility of the SDQP4-16 in primary care screening for persistent disorders is uncertain, particularly in samples in which the rate of psychiatric disorders is low.
- AUC: area under the curve
- CC: complete case
- CI: confidence interval
- MI: multiple imputation
- NPV: negative predictive value
- OR: odds ratio
- PAPA: Preschool Age Psychiatric Assessment
- PPV: positive predictive value
- ROC: receiver operating characteristic
- SDQ: Strengths and Difficulties Questionnaire
- SDQP4-16: parent version of the Strengths and Difficulties Questionnaire
- SDQtds: parent version of the Strengths and Difficulties Questionnaire total difficulties score
What’s Known on This Subject:
Preschool-onset psychiatric disorders may continue into school age if undetected. Whether primary care screening can prospectively identify psychopathology that will persist is unknown.
What This Study Adds:
Preschool screening identifies psychopathology that will persist with high sensitivity. However, the false-positive rate indicates that brief checklists may not reliably distinguish persistent cases from children presenting symptoms at the time of screening, particularly at low rates of disorder.
Psychiatric disorders are prevalent among preschoolers (7%–26%),1–6 yet few are identified and referred.7–9 Community screening may increase the rate of children being reliably identified and treated.10 A valid screen could assist clinicians faced with overlapping clinically concerning and normative behavior11,12 and inform the decision of further assessment and referral. However, studies indicate that early-onset disorders may follow different developmental pathways relevant to screening. Whereas some preschool diagnoses continue into school age,13 approximately half of preschool diagnoses are no longer present at follow-up,13,14 and half of those with a diagnosis at follow-up did not meet criteria at the initial assessment.13 Thus, screening may identify preschool-aged children whose trajectories are diverse, those whose disorder would continue into school age if undetected, those whose disorder would remit, and those whose disorder will emerge at a later time. Previous examinations of preschool screening have not taken into account follow-up assessments.15–18 Consequently, the screening efficiency of existing screens with respect to persistence as opposed to transience of disorders is unknown. This knowledge is potentially useful to guide prevention efforts.
Early childhood psychiatric disorders are highly comorbid,6,13 and research indicates that a general psychopathology construct covers much of the variance in preschool disorders.19,20 Moreover, a positive screen at the primary care level could indicate referral for further evaluation, diagnosing, and possible treatment, regardless of the symptom type. Thus, when screening young children at this level it may be just as appropriate to target the presence of any psychopathology as specific diagnoses.
Studies have shown the parent-completed Strengths and Difficulties Questionnaire (SDQP4-16)21 to adequately screen for concurrent psychiatric problems in preschool community populations.15,22,23 Moreover, compared with pediatric primary care providers, the SDQ identified substantially more children with possible psychopathology.24 However, whether brief and user-friendly screens such as the SDQ are able to capture persistent disorders is unknown. This study examined the parent-completed SDQP4-16 with respect to (1) the overall screening efficiency for persistent disorders relative to transient disorders and the capacity to distinguish the two and (2) the optimal cutoff for persistent cases. Moreover, we extend the generalizability of our findings to samples with higher rates of persistent disorders by (3) determining the screening efficiency of the SDQ for the most common range of stability rates.
Recruitment and Participants
The sampling frame was the Trondheim Early Secure Study, comprising 2 birth cohorts (2003–2004) of children in the city of Trondheim, Norway, who were invited to the community health check-up for 4-year-olds. The Trondheim Early Secure Study has been described in detail elsewhere,6 including screening with the parent version of the SDQ (SDQP4-16). The study was approved by the Regional Committee for Medical and Health Research Ethics. After the study had been fully described to the eligible participants, written informed consent was obtained from 2475 (82.1%) parents. To reduce costs, the parents of a subsample of 1250 children, oversampled according to higher SDQ scores to increase statistical power, were invited to participate in a structured diagnostic interview concerning the child’s mental health,2 completed at age 4 and readministered 2 years later, at age 6. Interview information was obtained for 1038 children, of whom 753 (72.5%) completed both the age-4 and the age-6 interviews. Descriptive information on participants with completed interviews at both time points is provided in Table 1.
The SDQP4-16 was completed at the age-4 assessment. Of the five 5-item subscales (emotional problems, conduct problems, hyperactivity, peer problems, and prosocial behavior), the first 4 are summed to create a “total difficulties score” (SDQtds), ranging from 0 to 40. The SDQ has documented strong psychometric properties for preschool- and school-aged children.25,26 The Norwegian version has been validated in several large studies.27,28 In our sample, the Cronbach’s α for the total difficulties score was 0.74.
Psychiatric diagnoses at both time points were assigned by using the Preschool Age Psychiatric Assessment (PAPA), a semistructured psychiatric interview with parents.2 Symptoms occurring during the 3 months preceding the interview are rated according to a structured protocol involving both required and optional follow-up questions. Diagnoses were generated by computerized algorithms implementing the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition.29 Interviewers (n = 7) had at least a bachelor’s degree in relevant fields and extensive previous experience in working with children and families. They received training by the group who developed the measure. Interviewers were blind to the SDQ results at both time points.
To evaluate interrater reliability, 9% of the interview audio-recordings were recoded by blinded raters. Pairs of raters obtained the following interrater reliabilities30: attention-deficit/hyperactivity disorder, k = 0.96; oppositional defiant disorder, k = 0.89; conduct disorder, k = 0.78; any anxiety disorder, k = 0.89; any depressive disorder, k = 0.86; any sleep disorder, k = 0.87; encopresis, k = 0.92; and any disorder, k = 0.87.
All diagnoses were analyzed as an “any psychiatric disorder” category for both the age-4 and age-6 assessments. The possible combinations of outcomes at age 4 and age 6 generated the following groups: (1) a “concurrent disorder” group, consisting of children with diagnosis present at age 4 but absent at age 6; (2) a “prospective disorder” group, consisting of children with diagnosis absent at age 4 but present at age 6; and (3) a “persistent disorder” group, consisting of children with diagnosis present at both time points. Groups 1 and 2 are also referred to as transient disorders.
Missing Data and Attrition
Table 2 outlines the distribution of cases (children with PAPA diagnoses), noncases (children without PAPA diagnoses), and missing interview information across the 2 time points. There were missing diagnostic data for 4.0% of participants at the age-4 assessment and for 23.5% of participants at the age-6 assessment. The parent-completed SDQtds was available for all participants. Participants with complete data had a lower SDQ impact score than participants with missing interview information (parent rated: odds ratio [OR]: 0.59; 95% confidence interval [CI]: 0.38–0.93; teacher rated: OR: 0.58; 95% CI: 0.38–0.88) and were less frequently rated by the health nurse as being in “need of help for reported problems” (OR: 0.54; 95% CI: 0.41–0.73). Comparing the attrition rate from the age-4 to the age-6 assessment between children with and without a diagnosis is informative about the missingness mechanism at work. The higher, albeit nonsignificant, attrition rate among diagnosed children (30.2%) relative to nondiagnosed children (23.9%) suggests that data are not missing completely at random and that missingness may instead be related to observed data (ie, missing at random).
Complete case (CC) analysis uses data only from cases with complete data and is unbiased only if data are missing completely at random. Thus, participants who completed at least 1 interview were included in the analyses (n = 1038), and missing data were handled by multiple imputation (MI) with the use of chained equations. MI analysis also uses information from cases with partially missing data, yielding higher statistical power than CC analysis. Furthermore, MI analysis is unbiased under the less restrictive missing at random assumption and generally less biased than CC if data are missing not at random.31 We created m = 100 imputed data sets,32 with estimates, CIs, and P values computed by using Rubin’s rules, and the fraction of missing information computed according to Buuren.32 For more information on MI, see the Supplemental Information.
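The pooling step of MI can be illustrated with a minimal sketch of Rubin's rules for a single parameter. This is not the authors' Stata code; the function name `pool_rubin` and the three example estimates are hypothetical and serve only to show how within- and between-imputation variance are combined.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log odds ratios and variances from three imputed data sets
# (illustrative values only, not taken from the study).
log_or, total_var = pool_rubin([1.60, 1.70, 1.65], [0.09, 0.10, 0.11])
print(round(log_or, 3), round(total_var, 5))
```

With m = 100 imputations, as used here, the correction factor (1 + 1/m) is close to 1, so the total variance is approximately the sum of the within- and between-imputation components.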
Mplus 7.233 (Muthén & Muthén, Los Angeles, CA) was used to examine possible differences in rates of diagnoses between the age-4 and age-6 assessments by comparing a model in which the means at both time points were constrained to be equal with a model in which they were freely estimated. In this analysis, missing data were addressed via a full-information maximum likelihood procedure in which the variables used for imputation were entered as auxiliary variables.
The overall screening efficiency of the SDQP4-16 was evaluated by using receiver operating characteristic (ROC) curve analysis, which determines the area under the curve (AUC) for the scale against persistent and transient diagnoses. AUC values were interpreted according to Hosmer and Lemeshow,34 as follows: AUC = 0.5 (no discrimination), 0.7 ≤ AUC < 0.8 (acceptable discrimination), 0.8 ≤ AUC < 0.9 (excellent discrimination), and AUC ≥0.9 (outstanding discrimination). Probability-weighted versions of the AUC and ROC, together with 95% CIs, were computed by using Newson’s programs, “somersd” and “senspec,” which are available for download in Stata (StataCorp, College Station, TX).35,36
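As a rough illustration of what the AUC measures, the sketch below computes it as the proportion of case/noncase pairs in which the case has the higher score, with ties counted as one half. It is unweighted and therefore simpler than the probability-weighted estimates produced by "somersd"; the function names and the example scores are hypothetical.

```python
def auc_pairwise(case_scores, noncase_scores):
    """Probability that a random case outscores a random noncase (ties count half)."""
    if not case_scores or not noncase_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


def hosmer_lemeshow_label(auc):
    """Map an AUC to the interpretation bands cited in the text."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "below acceptable"  # label chosen for this sketch, not from the source


# Hypothetical total difficulties scores (illustrative values only).
auc = auc_pairwise([12, 15, 9, 18], [4, 9, 7, 11, 3])
print(auc, hosmer_lemeshow_label(auc))
```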
The ROC-generated sensitivity/specificity pairs were used to select a threshold for the identification of persistent cases. The sensitivity (proportion of screen-positives among diagnosed positives) and specificity (proportion of screen-negatives among diagnosed negatives) are more stable across populations than are the positive predictive value (PPV) and negative predictive value (NPV). Thus, the sensitivity and specificity data in the present sample allow for estimating PPV and NPV for various prevalences (see Supplemental Information). Screening efficiency was calculated for the present sample, as well as for prevalences of 10% and 15%.
Due to screen-stratification of the sample, we conducted weighted analyses by using weights proportional to the inverse of the drawing probability. Analyses were performed in Stata 13.37
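The effect of inverse probability weighting on sensitivity and specificity can be sketched as follows. The record layout, the drawing probabilities (0.8 for high scorers, 0.2 for low scorers), and the scores are assumptions made for illustration; they are not the study's sampling fractions.

```python
def weighted_sens_spec(records, cutoff):
    """Weighted sensitivity and specificity.

    records: iterable of (score, is_case, drawing_probability);
    each child is weighted by the inverse of the drawing probability.
    """
    tp = fn = tn = fp = 0.0
    for score, is_case, prob in records:
        weight = 1.0 / prob
        screen_positive = score >= cutoff
        if is_case and screen_positive:
            tp += weight
        elif is_case:
            fn += weight
        elif screen_positive:
            fp += weight
        else:
            tn += weight
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one noncase")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical sample in which high scorers were oversampled.
records = [
    (14, True, 0.8), (12, True, 0.8), (6, True, 0.2),
    (13, False, 0.8), (11, False, 0.8),
    (5, False, 0.2), (3, False, 0.2), (7, False, 0.2),
]
sens, spec = weighted_sens_spec(records, cutoff=10)
print(round(sens, 3), round(spec, 3))
```

In this toy sample the unweighted sensitivity would be 2 of 3, whereas the weighted estimate is lower because the single low-scoring case stands in for more children in the population than each oversampled high scorer.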
Table 3 reports the rates of diagnoses at the age-4 and age-6 assessments; corresponding tendencies were observed in the imputed and complete case data, with the imputed estimates being slightly higher because they account for attrition. The prevalence was similar at both time points (P = .58). Meeting the criteria for a diagnosis at age 4 was associated with fivefold greater odds of meeting the criteria for a diagnosis at age 6 in the imputed data (OR: 5.31; 95% CI: 2.87–9.84) and in the complete case data (OR: 5.17; 95% CI: 2.69–9.91).
None of the children diagnosed at the age-4 assessment had received treatment during the preceding 3 months. At age 6, 3 of 17 (17.6%) persistent cases had received treatment.
Overall Screening Efficiency
Further analyses of the screening efficiency of the SDQP4-16 are based on imputed data (n = 1038). According to Hosmer and Lemeshow’s definition,34 the SDQtds had excellent discrimination for persistent cases (see Fig 1). Acceptable discrimination was obtained for concurrent cases, whereas the AUC was below the acceptable level for prospective cases. The capacity to distinguish between persistent and transient cases was acceptable.
Optimal Cutoff for Persistent Cases
The cutoff maximizing the sum of sensitivity and specificity of the SDQP4-16 was found at a score of ≥10 (Table 4). Increasing the cutoff by 1 would lead to a considerable decrease in sensitivity, whereas further reducing the cutoff would imply a decrease in specificity without detecting more persistent cases. The scale ruled in (identified true cases as reflected by the sensitivity) 80% of children with persistent diagnoses. However, it was considerably less sensitive to transient cases; at a cutoff of 10, half of the children with concurrent diagnosis and only one-third of the children with prospective diagnosis were detected.
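The selection rule, choosing the cutoff that maximizes the sum of sensitivity and specificity, can be sketched as below. The scores are hypothetical and unweighted, so the cutoff this example returns has no bearing on the study's result; `best_cutoff` is an illustrative name.

```python
def best_cutoff(case_scores, noncase_scores, cutoffs):
    """Return (cutoff, sensitivity, specificity) maximizing sensitivity + specificity.

    A child screens positive when the score is >= cutoff. On ties, the lowest
    cutoff is kept.
    """
    best = None
    for cutoff in cutoffs:
        sens = sum(s >= cutoff for s in case_scores) / len(case_scores)
        spec = sum(s < cutoff for s in noncase_scores) / len(noncase_scores)
        if best is None or sens + spec > best[0]:
            best = (sens + spec, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical scores on the 0 to 40 total difficulties scale.
cases = [10, 12, 15, 10, 8]
noncases = [2, 4, 5, 7, 9, 11, 3, 6, 12, 1]
cutoff, sens, spec = best_cutoff(cases, noncases, range(0, 41))
print(cutoff, sens, spec)
```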
Screening Efficiency for Varying Prevalences
Table 4 shows the screening efficiency of the SDQP4-16 with respect to sensitivity, specificity, and the PPV and NPV for persistent diagnoses at a cutoff of 10. In populations with a 10% rate of persistent disorders, the SDQP4-16 would obtain a PPV of 39.1%, further increasing to 50.5% at a prevalence of 15%. The probability of being a true positive when screening positive (PPV) increases with increasing prevalence. Considerable increases in prevalence only produce minor reductions in NPV.
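The dependence of PPV and NPV on prevalence follows from Bayes' theorem and can be reproduced approximately with a short sketch. The inputs below are the rounded sensitivity (79.3%) and specificity (86.1%) reported in the abstract, so the output may differ slightly from the values in Table 4; the function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true positives per child screened
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Observed prevalence of persistent disorders (1.8%) and the two
# hypothetical prevalences considered in the text (10% and 15%).
for prevalence in (0.018, 0.10, 0.15):
    ppv, npv = predictive_values(0.793, 0.861, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```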
The current study examined screening efficiency for persistent psychiatric disorders. The SDQP4-16’s discriminative capacity depended on the comparison: it was good (AUC = 0.85) at discriminating children with persistent disorders from children not presenting a persistent pattern but modest (AUC = 0.71) at discriminating persistent disorders from transient cases. At the selected cutoff, most persistent cases are screen-positives (sensitivity = 0.79), whereas most nonpersistent cases are screen-negatives (specificity = 0.86). However, at the low observed frequency of disorder (<2%) false-positives constitute a proportionally larger portion of the screen-positives than true-positive cases, yielding a low PPV. At higher rates of persistent disorders, screen-positives would include more true-positives and proportionally fewer false-positives (increased PPV).
In this study, the probability (AUC = 0.85) that a randomly selected child with a persistent disorder would have a higher SDQ score than a randomly selected child without a persistent disorder outperforms that obtained for transient cases (AUC = 0.74 and 0.68 for concurrent and prospective cases, respectively). Moreover, it exceeds that obtained in studies in preschool- and school-aged children not considering diagnostic status at follow-up.15,38 However, the SDQ’s capacity to discriminate persistent cases from children diagnosed at 1 assessment (concurrent and prospective cases) was lower (AUC = 0.71). Screens that could extract persistent cases at an early stage could potentially improve our ability to intervene effectively to prevent continuity of these disorders. However, although symptom counts on checklists such as the SDQ seem to be suitable for differentiating persistent pathology from nonpathology, this method may not be sufficient to distinguish persistent pathologic behavior from transient pathologic behavior.
At the selected cutoff of 10, the estimated specificity (86%) and the NPV (99.6%) were high, meaning that the SDQ largely ruled out children who did not show a persistent pattern of disorder. The higher sensitivity for persistent cases (79%) relative to transient cases (50% and 33% for concurrent and prospective cases, respectively) indicates that far fewer persistent cases were missed by the SDQ. However, the accompanying false-positive rate was high. This latter finding is consistent with previous findings; when screening in community samples, the proportion of true-negatives (NPV) is high, but the proportion of true-positives (PPV) is substantially lower, and hence highly related to the prevalence.15,39,40 Consequently, increasing the cutoff to 11 in the present sample scarcely affects the PPV (an increase from 9.5% to 10.7%) but yields a substantial decline in sensitivity (from 79% to 64%). Thus, minimizing the false-negative rate for persistent cases was our primary guidance when selecting the cutoff. In populations with higher rates of persistent disorders, a substantially higher rate of true-positives (increased PPV) would be detected, but a somewhat larger proportion would be false-negatives (decreased NPV) (see Table 4).
A positive screen indicates risk of a disorder, which requires subsequent assessments to reveal the potential presence of psychopathology and possible need of intervention. These subsequent assessments are essential to avoid imposing unnecessary and potentially demanding and risky interventions (eg, medication) on children who may not need them. In the current study, the estimated false-positive rate includes transient disorders. For these concurrent and prospective cases, a positive screen may be an opportunity for intervention to relieve stress and impairment or for preventive measures before problems become more serious.10 Moreover, because psychopathology is dimensional in nature, a screen-positive noncase may still experience problems that are possibly impairing, even if he or she does not necessarily warrant a clinical diagnosis. However, screening may cause stress and worry among parents whose children are falsely screened positive and lead to labeling of children who would have been better off unlabeled. Moreover, subsequent assessments of screen-positives would consume considerable resources. When false-positive rates are high, a substantial share of the resources would not reach those who need them most. A more finely grained screen covering a broader range of childhood psychopathology than offered by the SDQ (eg, the Achenbach System of Empirically Based Assessment41; 99 items) may offer a better initial differentiation between true- and false-positives. Increasing the number of items would, however, run counter to brevity, a key characteristic of a screen that is suitable in primary care. Moreover, in community populations, the proportion of true-positives is lower and milder symptomatology predominates relative to clinical populations.42–45 Under these circumstances, it is more challenging to extract children suffering from psychopathology that requires intervention.
Stable prevalences from preschool to school age in the present sample coincide with previous findings from the United States13; however, the rates (∼7% at both time points) were comparatively lower and in line with other Scandinavian findings.27,28,46 Whereas PPV and NPV are affected by prevalence, sensitivity and specificity (and thus AUC estimates) are reasonably stable across prevalences and populations47 and may generalize to other populations. Indications of comparable reliability and validity of the SDQ across Western countries38 support the validity of our results concerning the screening efficiency for other populations. Our results support earlier findings of heterogeneity within early-onset psychopathology; a substantial proportion of children meeting criteria for a diagnosis at age 4 did not meet criteria for a diagnosis at age 6 and vice versa. The stability rate of 1.8% in the present sample reflects the low prevalence at baseline; the fewer children who have a diagnosis at baseline, the fewer children remain diagnosed 2 years later. Statistically, a baseline prevalence of 50% would yield a stability of 25% by chance alone, whereas a baseline prevalence of 25% would indicate a stability of ∼6% by chance. The fact that the stable cases were seldom referred for treatment, and none when they were preschool-aged, underscores the importance of detecting these cases early on.
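The chance-level stability figures quoted above follow from a simple independence argument, sketched here under the assumption that the prevalence p is the same at both assessments and that diagnostic status at age 6 is unrelated to status at age 4:

```latex
% Chance stability under independence, with equal prevalence p at both time points
\[
  P(\text{diagnosis at age 4 and at age 6}) = p \cdot p = p^{2}
\]
\[
  p = 0.50 \;\Rightarrow\; p^{2} = 0.25,
  \qquad
  p = 0.25 \;\Rightarrow\; p^{2} = 0.0625 \approx 6\%
\]
```

Applied to the approximate prevalence of 7% in the present sample, the same reasoning gives a chance stability of roughly 0.5%, which is below the observed stability rate of 1.8%.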
Some limitations of this study should be noted. First, some participants were lost to follow-up. However, the use of full-information maximum likelihood and MI should have minimized the likelihood of inaccurate estimates and increased statistical power relative to a complete case analysis. Second, our subjects were mostly of Norwegian origin; the findings may thus not generalize to more ethnically diverse populations. Third, parent-reported SDQ scores were compared with the PAPA interview, which was also derived from parental information. Although the PAPA interview is clearly interviewer-based, comparative information (eg, clinician rating, information from teachers) would minimize potential biases associated with a single informant. Fourth, the 3-month primary period may have limited the identification of cases with onset and remission occurring before this period or between assessments. Fifth, the CI for the sensitivity estimate was wide. Replication in samples with different stability rates and environments is needed to support our findings.
The current study suggests that the SDQ may assist clinicians in identifying persistent cases; 80% of preschool-aged children with a disorder at age 4 that continued to age 6 were detected. However, a large proportion of screen-positives are nonpersistent cases, and subsequent and more detailed assessments to distinguish those that require swift intervention from those with different or no interventional needs are essential. Primary care could benefit from screening tools that increase the targeted and efficient use of resources. The present findings raise questions regarding the usefulness of the SDQ in primary care screening for persistent psychiatric disorders in young children, particularly at low rates of disorder.
- Accepted July 7, 2016.
- Address correspondence to Trude Hamre Sveen, PsyD, Department of Psychology, NTNU, N-7491 Trondheim, Norway. E-mail:
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: Supported by grants 228685/H10 and 185760/V50 from the Research Council of Norway and grant 4396 from the Liaison Committee between the Central Norway Regional Health Authority and the Norwegian University of Science and Technology.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Earls F
- Lavigne JV, Gibbons RD, Christoffel KK, et al
- Lavigne JV, Binns HJ, Christoffel KK, et al; Pediatric Practice Research Group
- Weitzman C, Wegner L; Section on Developmental and Behavioral Pediatrics; Committee on Psychosocial Aspects of Child and Family Health; Council on Early Childhood; Society for Developmental and Behavioral Pediatrics
- Wakschlag LS, Briggs-Gowan MJ, Choi SW, et al
- Dehon C, Scheeringa MS
- American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 4th ed. Washington, DC: American Psychiatric Association; 1994
- Buuren SV
- Muthén LK, Muthén BO
- Hosmer DW, Lemeshow S
- Newson R
- Biederman J, Monuteaux MC, Kendrick E, Klein KL, Faraone SV
- Achenbach TMRL
- Fombonne E
- Kessler RC, Chiu WT, Demler O, Merikangas KR, Walters EE
- Demyttenaere K, Bruffaerts R, Posada-Villa J, et al; WHO World Mental Health Survey Consortium
- Rescorla L, Achenbach T, Ivanova MY, et al
- Zhou XH, Obuchowski NA, McClish DK
- Copyright © 2016 by the American Academy of Pediatrics