OBJECTIVE: The aim of this study was to determine the reliability and validity of a Chinese version of the Patient Health Questionnaire–9 item (PHQ-9) and its 2 subscales (1 item and 2 items) for the screening of major depressive disorder (MDD) among adolescents in Taiwan.
METHODS: A total of 2257 adolescents were recruited from high schools in Taipei. The participants completed assessments including demographic information, the Chinese version of the PHQ-9, and the Rosenberg Self-Esteem Scale, and data on the number of physical illnesses and mental health service utilizations were recorded. Among them, 430 were retested using the PHQ-9 within 2 weeks. Child psychiatrists interviewed a subsample of the adolescents (n = 165) using the Kiddie-Schedule for Affective Disorder and Schizophrenia Epidemiological Version as the criterion standard.
RESULTS: The PHQ-9 had good internal consistency (α = 0.84) and acceptable test–retest reliability (0.80). The participants with higher PHQ-9 scores were more likely to have MDD. Principal component factor analysis of the PHQ-9 yielded a 1-factor structure, which accounted for 45.3% of the variance. A PHQ-9 score ≥15 had a sensitivity of 0.72 and a specificity of 0.95 for recognizing MDD. The area under the receiver operating characteristic curve was 0.90. The screening accuracy of the 2 subscales was also satisfactory, with a Patient Health Questionnaire–2 item cutoff of ≥3 being 94.4% sensitive and 82.5% specific and a Patient Health Questionnaire–1 item cutoff of ≥2 being 61.1% sensitive and 87.7% specific.
CONCLUSIONS: The PHQ-9 and its 2 subscales appear to be reliable and valid for detecting MDD among ethnic Chinese adolescents in Taiwan.
- AUC: area under the receiver operating characteristic curve
- K-SADS-E: Kiddie-Schedule for Affective Disorder and Schizophrenia Epidemiological Version
- MDD: major depressive disorder
- PHQ-1: Patient Health Questionnaire–1 item
- PHQ-2: Patient Health Questionnaire–2 item
- PHQ-9: Patient Health Questionnaire–9 item
- ROC: receiver operating characteristic
- RSES: Rosenberg Self-Esteem Scale
What’s Known on This Subject:
Major depression is common among adolescents. The PHQ-9 has good sensitivity and specificity for detecting depression among adolescents in primary care settings. However, no study has examined the psychometric properties of the PHQ-9 among Chinese adolescents in school settings.
What This Study Adds:
This is the first study to validate the use of the PHQ-9, Patient Health Questionnaire–2 item, and Patient Health Questionnaire–1 item among Chinese adolescents in Taiwan. The PHQ-9 and its 2 subscales have good sensitivity and specificity for detecting depression among school adolescents.
Major depressive disorder (MDD) is a mental disorder emerging in adolescence, with reported prevalence rates ranging from 0.5% to 4.4%.1–5 MDD increases the risk of suicide, substance abuse, poor academic performance, and poor social function in adolescents.6,7 Although MDD leads to negative outcomes, fewer than half of adolescents with MDD receive mental health services.2,4,8 Furthermore, the early onset of mental disorders predicts a longer delay in the initiation of treatment.9 The delay in diagnosis and treatment, inadequate supply of mental health services, and the adverse consequences of MDD reinforce the importance of screening and treatment of MDD in adolescents. Several validated depression questionnaires have been developed in Western countries with the aim of screening for depression among adolescents, including the Center for Epidemiologic Studies Depression Scale,10,11 Beck Depression Inventory,12 Adolescent Depression Rating Scale,13 Children’s Depression Inventory,14 Patient Health Questionnaire for Adolescents,15 Short Mood and Feelings Questionnaire,16 and Hospital Anxiety and Depression Scale.17 Some of these self-completed measures have also been validated in Taiwanese and ethnic Chinese adolescents.18–20 However, these questionnaires often contain many items and can be time-consuming.
The Patient Health Questionnaire–9 item (PHQ-9) is shorter, and it is also designed to diagnose and assess the severity of MDD.21 In addition, it can monitor changes in depression symptoms after intervention.22 The PHQ-9 is widely used in adult patients in primary care,23–26 older adults with chronic illnesses,27 and university students.28 In Taiwan, the PHQ-9 and its 2 subscales (the 2-item PHQ-2 and the 1-item PHQ-1) have been validated among adults in primary care and demonstrated good psychometric properties.29 To date, 3 studies have investigated the psychometric properties of the PHQ-9 and PHQ-2 in adolescents, including 1 study validating the PHQ-2 and PHQ-9 in US adolescents,30,31 1 study on the PHQ-2 and PHQ-9 in German adolescents,32 and 1 study on the PHQ-9 in Indian adolescents.33 However, the findings of these studies may not be generalizable to ethnic Chinese adolescents because the adolescents were enrolled from primary care settings and because of cultural differences. Therefore, the aim of this study was to establish the psychometric properties of the PHQ-9 and its 2 subscales among ethnic Chinese adolescents in Taiwan.
The Research Ethics Committee of Mackay Memorial Hospital approved this study before implementation. The study was undertaken at 14 senior high schools in Taipei and New Taipei City from October 2009 to March 2011. A total of 3105 students were invited to join this study. After the aims of this study were fully explained to both the participants and their parents, written informed consent was obtained from both. All of the participants completed the Chinese version of the PHQ-9 and the Rosenberg Self-Esteem Scale (RSES)34,35 on computers at the participating schools each year for 2 consecutive years. They also reported occurrences of self-harm, physical illnesses, and experiences of receiving mental health services in the previous year. The validation study was part of a prospective study to evaluate the lifetime prevalence and 1-year incidence of self-harm behavior by using a case–control design. After the participants had completed the questionnaires in the second year, subsets of adolescents were invited to receive a clinical diagnosis by a board-certified child psychiatrist. All of the students with new occurrences of self-harm in the past year (ie, the students who did not report self-harm at entry but did the next year) were enrolled for face-to-face interviews (Fig 1). Controls were randomly selected among the students who did not report any occurrences of self-harm on a 1:1 ratio, frequency matched by class and gender. For the students who reported occurrences of self-harm in both years, 1 in 2 received face-to-face interviews. The board-certified child psychiatrists were blinded to the results of the PHQ-9, and they conducted the diagnostic interviews by using the Chinese version of the Kiddie-Schedule for Affective Disorder and Schizophrenia Epidemiological Version (K-SADS-E).5,36–38 To assess the test–retest reliability, a subsample of the participants (n = 430) was invited to complete the PHQ-9 again within 2 weeks. This subsample included classes from 2 schools with which a mutually agreeable time for the retest could be scheduled.
The PHQ-9 consists of 9 items, each evaluating the presence of 1 of the 9 Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition criteria for a depressive episode in the past 2 weeks. Each item is scored on a 4-point scale, ranging from 0 (never) to 3 (nearly every day), giving a total score from 0 to 27, with higher scores indicating a greater likelihood of MDD.
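The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the study: the function name `phq9_total` and the example respondent are hypothetical, and the only facts taken from the text are the 9 items, the 0 to 3 response range, and the 0 to 27 total.

```python
from typing import Sequence


def phq9_total(responses: Sequence[int]) -> int:
    """Sum nine item responses, each coded 0 (never) to 3 (nearly every day)."""
    if len(responses) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    for value in responses:
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item response out of range: {value!r}")
    return sum(responses)


# Hypothetical respondent, for illustration only.
example = [1, 2, 0, 1, 0, 3, 1, 0, 0]
print(phq9_total(example))  # 8
```

Rejecting incomplete or out-of-range responses, rather than silently summing them, keeps the total within the 0 to 27 range that the cutoff values assume.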
The RSES consists of 10 items that refer to self-respect, rated on a 4-point Likert-type scale, ranging from 1 (completely disagree) to 4 (completely agree).34 The Chinese version of the RSES has been reported to demonstrate acceptable internal consistency (α = 0.834) and test–retest reliability (0.829) in Taiwanese children.35
The K-SADS-E is a semistructured interview scale for the systematic assessment of both past and current episodes of psychiatric disorders in children and adolescents. Development of the Chinese version of the K-SADS-E was completed by the Child Psychiatry Research Group in Taiwan.36 It has been shown to be a reliable and valid instrument, and it has been used extensively in a variety of studies on childhood mental disorders in Taiwan.5,36–38
Physical and Mental Health Condition
The participants also answered a series of questions about the number of physical illnesses and their experiences of self-harm, substance use, and receiving mental health services in the previous year.
Descriptive and analytic statistics were computed using SPSS (version 15.0; IBM SPSS Statistics, IBM Corporation) for Windows. We established inter-item reliability of the Chinese version of the PHQ-9 by calculating Cronbach’s α coefficient for the 9-item scale. Test–retest reliability over a 2-week interval was examined using intraclass correlation coefficients. To verify concurrent validity, we assessed the correlation between the total scores of the 3 PHQ scales and the RSES scores, number of physical illnesses, and mental health service use. Factor analysis was conducted to evaluate construct validity. Criterion validity was assessed by comparing the performance of the PHQ-9 and its 2 subscales with a criterion standard, for which the Chinese version of the K-SADS-E diagnosis of MDD was used. Receiver operating characteristic (ROC) curves were generated to assess the discriminative ability of the screening instruments, namely the PHQ-9 and its 2 subscales, and sensitivity and specificity were calculated. The optimal cutoff point for each instrument was determined by ROC analysis as the value with the highest sum of sensitivity and specificity, defined as the point on the ROC curve closest to the point (0, 1). All tests were 2-tailed, and the level of significance was set at P < .05.
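The cutoff selection described above can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study: the function names, the toy scores, and the toy diagnoses are all hypothetical. Because the "highest sum" rule and the "closest to (0, 1)" rule need not always select the same cutoff, the sketch reports both.

```python
from math import hypot
from typing import Iterable, Sequence, Tuple


def sens_spec(scores: Sequence[int], has_mdd: Sequence[bool], cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity of the screening rule `score >= cutoff`."""
    tp = sum(1 for s, d in zip(scores, has_mdd) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, has_mdd) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, has_mdd) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, has_mdd) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoffs(scores: Sequence[int], has_mdd: Sequence[bool],
                 candidates: Iterable[int]) -> Tuple[int, int]:
    """Return (cutoff maximizing sens + spec, cutoff closest to the ROC corner)."""
    rows = [(c, *sens_spec(scores, has_mdd, c)) for c in candidates]
    by_sum = max(rows, key=lambda r: r[1] + r[2])[0]
    # A ROC point is (1 - specificity, sensitivity); the ideal corner is (0, 1).
    by_distance = min(rows, key=lambda r: hypot(1 - r[2], 1 - r[1]))[0]
    return by_sum, by_distance


# Hypothetical scores and interview diagnoses, for illustration only.
scores = [2, 4, 5, 7, 9, 13, 14, 16, 18, 21]
has_mdd = [False] * 6 + [True, False, True, True]
print(best_cutoffs(scores, has_mdd, range(1, 28)))  # (14, 14)
```

The sketch assumes the sample contains at least one case and one non-case; otherwise sensitivity or specificity is undefined and the division fails.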
A total of 2257 participants were recruited, with a participation rate of 72.7% (Fig 1). The mean age of the participants was 16.9 (SD, 0.6) years, and the majority were female (59.6%). The mean PHQ-9 score was 5.8 (SD, 4.7). The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition diagnoses of the 165 participants who completed the K-SADS-E diagnostic interviews by board-certified child psychiatrists are shown in Table 1. Eighteen participants (10.9%) were diagnosed with MDD. The prevalence of MDD among adolescents who reported self-harm behavior was much higher than in those without.
The internal consistency value, established by the Cronbach’s α coefficient, was 0.84 (95% confidence interval, 0.83–0.85). The intraclass correlation coefficient for test–retest reliability of the total scores among the 430 retested subjects was 0.80 (95% confidence interval, 0.76–0.83, P < .01), indicating little variability between the 2-week time points.
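For readers unfamiliar with the statistic, Cronbach’s α can be computed from a respondents-by-items score matrix as k/(k − 1) × (1 − sum of item variances / variance of total scores). The following Python sketch shows that standard formula on a hypothetical toy matrix; it does not reproduce the study data, and the function name `cronbach_alpha` is illustrative.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 4 respondents x 3 items, for illustration only.
toy = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 3]]
print(round(cronbach_alpha(toy), 3))
```

Sample variances are used for both the items and the totals; using population variances throughout would give the same α, because the ratio is unchanged.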
Construct Validity: Factor Analysis
Principal component factor analysis using an orthogonal procedure, retaining an eigenvalue ≥1 without rotation, yielded a 1-factor structure with an eigenvalue of 4.07. This factor accounted for a total of 45.3% of the variance. All factor pattern coefficients were above 0.61, indicating that all the items were salient. Factor loadings for each item are shown in Table 2.
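As a consistency check on these figures, and assuming the analysis was run on the correlation matrix of the 9 items (an assumption; the text does not state this), the share of total variance explained by a component equals its eigenvalue divided by the number of items:

```latex
\[
\text{proportion of variance} \;=\; \frac{\lambda_1}{p} \;=\; \frac{4.07}{9} \;\approx\; 0.452
\]
```

This agrees with the reported 45.3% up to rounding of the eigenvalue.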
Concurrent Validity of the PHQ-9 and Its 2 Subscales
We assessed the concurrent validity of the PHQ-9 and its 2 subscales by correlating the PHQ scales with the RSES, number of physical illnesses, and mental health service use. We hypothesized that the RSES would have a negative correlation with the PHQ scales. As expected, the scores of the PHQ-9 and its 2 subscales were all moderately negatively associated with the RSES scores (Table 3). The mean number of physical illnesses was significantly associated with each of the 3 PHQ scales. Ninety-two (4.1%) participants received mental health services in the past year. The mean scores of the PHQ-9, PHQ-2, and PHQ-1 were all significantly different between the participants who received mental health services and those who did not.
Criterion Validity of PHQ-9 and Its 2 Subscales
The ROC curves and the areas under the ROC curves (AUC) for the 3 scales are shown in Table 3 and Fig 2. Although all 3 scales had acceptable results, the PHQ-9 performed better than its 2 subscales. Table 4 lists the sensitivity and specificity for the 3 scales at different cutoff values, using the K-SADS-E as the criterion standard. The optimal cutoff point for the PHQ-9 was ≥15. The sensitivity and specificity of the PHQ-1 and PHQ-2 were also satisfactory for detecting MDD, with a PHQ-2 cutoff value of ≥3 being 94.4% sensitive and 82.5% specific, and a PHQ-1 cutoff value of ≥2 being 61.1% sensitive and 87.7% specific.
This is the first study to validate the use of the PHQ-9, PHQ-2, and PHQ-1 among Chinese adolescents. The results showed that the Chinese version of the PHQ-9 and its subscales were reliable and valid in detecting depression and that the PHQ-9 had acceptable sensitivity and good specificity with a cutoff score of 15.
Consistent with previous studies in adults23–29,39 and adolescents,33 the current study also found that the Chinese version of the PHQ-9 has a high reliability, as evidenced by the internal consistency (Cronbach’s α = 0.84). The test–retest reliability of the PHQ-9 was fair but slightly lower than in previous reports on adults27–29,40 and adolescents.33 The exploratory factor analysis for the PHQ-9 yielded a 1-factor structure (depression), which is consistent with previous studies on adults.23,25,29,39,40 The participants with higher scores on the PHQ-9 and its 2 subscales reported lower self-esteem, higher mental health service use, and more physical illnesses, indicating good concurrent validity. Previous studies have also reported that an increase in the severity of depression was associated with negative self-esteem,40–42 more physical illnesses,43,44 and higher mental health service use.43–45
In the current study, the AUC for MDD was 0.90 for the PHQ-9 and 0.87 for the PHQ-2, which are comparable to those in previous studies on adolescents in primary care.30–33 The AUC for MDD was 0.81 for the PHQ-1, which was also acceptable, because an AUC of 0.80 or higher indicates that the screening tool is useful.46
The cutoff values of ≥15 for the PHQ-9 and ≥3 for the PHQ-2 are higher than the recommended cutoff scores of ≥10 for the PHQ-9 and ≥2 for the PHQ-2 in previous adult primary care studies in Taiwan.29 However, a previous study also reported a higher cutoff value of ≥11 for the PHQ-9 for adolescents in primary care,31 compared with an adult primary care population in the United States (≥10).21 The high cutoff point may be a function of the tendency of adolescents to report higher scores on the PHQ-9. Adolescents often experience affective instability, which may lead to exaggerated self-perception when they complete depression screening instruments. Comparing the PHQ score distribution of the Taiwanese adolescents in the current study with that of US adolescents31 and adults in Taiwan and other countries21,29,44 (Table 5), we found that the Taiwanese adolescents tended to score higher on the PHQ-9 than Taiwanese adults in primary care, Hong Kong adults in the community, and US adults in primary care. In addition, the mean PHQ score of the adolescents in the current study was 5.8, which is higher than the scores reported in Taiwanese adults (3.6)29 and Indian adolescents (3.9).33 This highlights the importance of validating psychological measures in different social and cultural contexts. However, a cutoff value of ≥11 for the PHQ-9 in the current study had an acceptable sensitivity (83.3%) and specificity (86.5%). Providers may choose a lower cutoff point to avoid missing adolescents with MDD, although this may result in a higher rate of false-positive results. To better understand the characteristics of the adolescents with a PHQ-9 score ≥11 but without a diagnosis of MDD, we examined their K-SADS-E diagnoses. As shown in Table 6, these adolescents had higher rates of anxiety disorders, externalizing disorders, adjustment disorders, and other depressive disorders. 
The presence of other mental disorders may account for the high PHQ-9 score in the current study and may explain why a cutoff point of ≥15 lowers the number of false-positive results. Although we chose a high cutoff point to lower the false-positive rate, it is worth noting that those who had a high PHQ-9 score but no MDD diagnosis had a high rate of comorbidities. This is similar to a previous study on adolescents in the United States.30 Adolescents who have elevated depressive symptoms are at risk for the later development of depression.47,48 Furthermore, we oversampled students who had reported self-harm, which may also have resulted in the high cutoff point, because the students who reported self-harm had higher rates of anxiety disorders and depression than those who did not (Table 1).
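The trade-off between the ≥11 and ≥15 cutoffs can be made concrete by computing the positive predictive value, that is, the probability that a student who screens positive actually has MDD. The sketch below applies Bayes’ rule to the sensitivity and specificity values reported in the text. The prevalence of 5% is an illustrative assumption, not an estimate from this study, and the function name is hypothetical.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: probability of MDD given a positive screen."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity/specificity pairs as reported in the text for the PHQ-9.
reported = {11: (0.833, 0.865), 15: (0.722, 0.954)}
prevalence = 0.05  # illustrative assumption, not an estimate from this study

for cutoff, (sens, spec) in reported.items():
    ppv = positive_predictive_value(sens, spec, prevalence)
    print(f"PHQ-9 >= {cutoff}: PPV = {ppv:.2f}")
```

Under this assumed prevalence, the higher cutoff yields the larger predictive value, at the cost of the lower sensitivity noted above. The result depends strongly on the prevalence chosen.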
By using the K-SADS-E as the criterion standard, the PHQ-9 cutoff value of ≥15 had 72.2% sensitivity and 95.4% specificity, and the PHQ-2 cutoff value of ≥3 had 94.4% sensitivity and 82.5% specificity. Compared with the findings in previous studies on adolescents,31–33 the specificity of the PHQ-9 in the current study is higher but the sensitivity is lower, which suggests that the Chinese version of the PHQ-9 has a lower false-positive rate. Both the specificity and sensitivity of the PHQ-2 in the current study are higher compared with those in a study on adolescents in the United States (75.2% and 73.7% for specificity and sensitivity, respectively).30 Compared with the findings of a study using the PHQ-2 in Germany (79.4% and 85% for specificity and sensitivity, respectively),32 the results of the current study showed a lower sensitivity but higher specificity. The single-question screen, PHQ-1, was also found to be sensitive and specific for detecting current MDD in Taiwanese adolescents (61.1% and 87.7%, respectively). Specificity is an important consideration if depression screening becomes more widespread, because a large number of false-positive results would be difficult to handle efficiently in schools. We suggest that either the PHQ-1 or the PHQ-2 can be used as an initial screening test and applied routinely in schools to detect depression, and the PHQ-9 can be used as a confirmatory screening test for those with a higher score on the PHQ-2 (3 or higher) or PHQ-1 (2 or higher).
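The two-stage workflow suggested above might look like the following in code. This is only a sketch of the decision logic: the cutoffs come from the text, whereas the function name and the status labels ("negative", "needs PHQ-9", "refer for assessment") are hypothetical and would need to be adapted to a school’s own referral process.

```python
from typing import Optional

PHQ2_CUTOFF = 3   # from the text: PHQ-2 score of 3 or higher
PHQ9_CUTOFF = 15  # from the text: PHQ-9 cutoff of >= 15


def two_stage_screen(phq2_score: int, phq9_score: Optional[int] = None) -> str:
    """Classify a student under a PHQ-2 first, PHQ-9 second workflow."""
    if phq2_score < PHQ2_CUTOFF:
        return "negative"              # stage 1 negative: no further screening
    if phq9_score is None:
        return "needs PHQ-9"           # stage 1 positive, stage 2 not yet done
    return "refer for assessment" if phq9_score >= PHQ9_CUTOFF else "negative"


print(two_stage_screen(1))        # negative
print(two_stage_screen(4))        # needs PHQ-9
print(two_stage_screen(4, 17))    # refer for assessment
```

Only students who pass the first-stage threshold complete the longer questionnaire, which is the efficiency argument made in the text. A positive result still calls for clinical assessment and does not constitute a diagnosis.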
There are several strengths to this study. First, this is the first study to validate 3 depression screening tools (PHQ-1, PHQ-2, and PHQ-9) for an adolescent population in a nonclinical setting in a Chinese society. The results demonstrate the general validity of the PHQ as an instrument to screen for depression, which is consistent with validation studies performed on adolescents in other countries.30–33 Approximately 5% of adolescents in Taiwan have depressive disorders, and the rate is increasing.5 The lack of validation of the PHQ-9 in Chinese adolescents currently precludes its application to ethnic Chinese societies. Second, the board-certified child psychiatrists who made a diagnosis confirming MDD by using the K-SADS-E were blinded to the results of the PHQ-9. This may more accurately confirm the validity of the PHQ-9 and its 2 subscales in the current study than in other research. Third, we recruited a large sample of students from the community and not from primary care services. The results therefore may better represent a general adolescent population. Fourth, we found that physical condition and mental health service use were significantly associated with PHQ-9 level, suggesting the importance of screening adolescents who have more physical illnesses or mental health service use. Fifth, we compared the adolescents who reported self-harm with those who did not, and we found that the rate of MDD in the self-harm group was high, suggesting that it is important to screen for MDD among adolescents who report self-harm.
There are some limitations to this study. First, the participants in this study were recruited from schools in the Taipei area, and therefore the results may not be generalizable to adolescents from other regions with different ethnic and cultural backgrounds. Second, because we oversampled participants who reported self-harm behavior, bias may exist. Because we did not interview the students in the “others” group (Fig 1), we could not perform weighting. Finally, this study used a cross-sectional design. Additional longitudinal studies are needed to establish the sensitivity of the PHQ-9 to change.
This study shows that the original English version of the PHQ-9 can be successfully adapted to an ethnic Chinese context, with satisfactory psychometric properties of validity and reliability among Chinese adolescents in Taiwan. Importantly, the Chinese version of the PHQ-9 was well accepted by the adolescents and school teachers because it has the advantage of being a brief instrument. Because depression is easily overlooked in adolescents, a valid and easy method for detecting depression may decrease the rates of morbidity and mortality in adolescents. The Chinese version of the PHQ-9 is an efficient instrument for early detection of MDD, which may increase the likelihood of early intervention or treatment in adolescents. However, the screening results of PHQ-9 may not guarantee a clinical diagnosis, so additional comprehensive assessment, assistance, and prompt referral should be given to those who show possible depressive disorders on the PHQ-9. With the appropriate integration of the PHQ into school services and public health services, providers may heighten their awareness of MDD among adolescents in schools and the community.
- Accepted November 6, 2013.
- Address correspondence to Shen-Ing Liu, MD, PhD, Department of Psychiatry, Mackay Memorial Hospital, No. 92, Section 2, Chung-Shan North Rd, Taipei, Taiwan. E-mail:
Dr Tsai contributed to acquisition of partial data, data analyses and interpretation, and drafting and submission of the manuscript; Dr Yu-Hsin Huang coordinated and supervised data collection, collected data, and revised the manuscript; Dr Hui-Ching Liu conceptualized and designed the study and revised the manuscript; Dr Kuo-Yang Huang collected data and reviewed the manuscript; Dr Yen-Hsun Huang collected data and revised the manuscript; Dr Shen-Ing Liu was involved in study design, acquisition of funding, and manuscript revision and carried out the final analyses; and all authors approved the final manuscript as submitted.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: Supported by grants from the National Science Council, Taiwan (NSC 9802314-B-195-011 MY3) and Mackay Memorial Hospital, Taipei, Taiwan.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Copyright © 2014 by the American Academy of Pediatrics