OBJECTIVE: The purpose of this study was to examine the performance characteristics and validity of the Patient Health Questionnaire-9 Item (PHQ-9) as a screening tool for depression among adolescents.
METHODS: The PHQ-9 was completed by 442 youth (aged 13–17 years) who were enrolled in a large health care–delivery system and participated in a study on depression outcomes. Criterion validity and performance characteristics were assessed against an independent structured mental health interview (the Child Diagnostic Interview Schedule [DISC-IV]). Construct validity was tested by examining associations between the PHQ-9 and a self-report measure of functional impairment, as well as parental reports of child psychosocial impairment and internalizing symptoms.
RESULTS: A PHQ-9 score of 11 or more had a sensitivity of 89.5% and a specificity of 77.5% for detecting youth who met the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition criteria for major depression on the DISC-IV. Receiver-operator-curve analysis revealed that the PHQ-9 had an area under the curve of 0.88 (95% confidence interval: 0.82–0.94), and the cut point of 11 was optimal for maximizing sensitivity without loss of specificity. Increasing PHQ-9 scores were significantly correlated with increasing levels of functional impairment, as well as parental report of internalizing symptoms and psychosocial problems.
CONCLUSIONS: Although the optimal cut point is higher among adolescents, the sensitivity and specificity of the PHQ-9 are similar to those of adult populations. The brief nature and ease of scoring of this instrument make this tool an excellent choice for providers and researchers seeking to implement depression screening in primary care settings.
WHAT'S KNOWN ON THIS SUBJECT:
Depression is common among adolescents. In response to the growing evidence for effective treatments for depression among adolescents, the US Preventive Services Task Force now recommends screening for depression among adolescents in primary care settings. However, there is limited information on the performance of depression-screening tools with adolescents in primary care settings.
WHAT THIS STUDY ADDS:
This study is the first to examine the sensitivity and specificity of the PHQ-9 among adolescent populations. The PHQ-9 has good sensitivity and specificity for detecting major depression among adolescents in the primary care setting.
In response to the growing evidence for effective treatments for depression among adolescents, the US Preventive Services Task Force now recommends screening for depression among adolescents in primary care settings.1 However, a recent systematic review2 identified only 5 studies with adequate psychometric data on the sensitivity and specificity of depression-screening instruments in primary care. Two of the studies evaluated the same instrument; thus, 4 screening instruments were tested. The 4 instruments that were identified each have potential limitations for application in primary care, including the cost to administer a patented instrument,3 length of the screening instrument,4,5 algorithm-based scoring rather than symptom scoring,6 and the ability to be a stand-alone screener for depression.7
Ideal screening instruments are brief, easy to understand for patients, simple to score, available without cost, and have strong performance characteristics. The Patient Health Questionnaire-9 Item (PHQ-9) depression screener was developed for administration among adults in primary care settings. It has been shown to have good diagnostic validity and comparable sensitivity and specificity to other longer measures of depression.8,9 In addition, because the instrument is based on Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria, the same 9 items are used for adults to establish probable depressive-disorder diagnoses as well as to grade depressive-symptom severity.
Despite its wide use in adult populations, the PHQ-9 has not been validated in adolescent populations. In this study, we evaluated the operating characteristics (sensitivity and specificity) of the PHQ-9 as a diagnostic instrument for depressive disorders among adolescents and the construct validity of the PHQ-9 as a depression-severity measure in relation to functional status and parental assessment of child symptoms.
The Adolescent Health Study was developed by a multidisciplinary team at the University of Washington and the Group Health Research Institute. The main purposes of the Adolescent Health Study were to evaluate the performance of depression-screening tools and to describe the characteristics of adolescents who would most benefit from exposure to evidence-based interventions for depression in primary care settings. All procedures were approved by the Group Health Research Institute Human Subjects Protection Committee.
Between September 2007 and June 2008, study staff randomly selected 4000 enrollees, aged 13 to 17, who had seen a provider in a Group Health Research Institute health care facility at least once in the previous 12 months. The Group Health Research Institute is a consumer-governed nonprofit health care organization that serves more than 630 000 residents of Washington and Idaho. The parents/guardians of selected enrollees received an invitation letter, a consent form, and a brief survey for their child. Parents were asked to sign the consent form and give the survey to their child to complete in a private place. The child received a $2 preincentive. Completion of the survey was taken as a form of assent by the child. Parents of youth who did not respond received a second mailing and follow-up telephone calls. The brief survey consisted of 10 items about age, gender, weight, height, sedentary behaviors, overall health, functional impairment, and depressive symptoms (the Patient Health Questionnaire-2 Item [PHQ-2] depression scale). The PHQ-2 includes the first 2 items from the PHQ-9 and asks respondents to rate the frequency (0, not at all; 3, nearly every day) with which they have experienced (1) a depressed mood and/or (2) lack of pleasure in usual activities in the previous 2 weeks. In a previous publication in which these same data were used, we found that a score of 3 or higher on the PHQ-2 has a sensitivity of 74% and a specificity of 75% for detecting major depression among adolescents.10
A subset of youth (n = 499) were invited to participate in the follow-up telephone interview study, during which in-depth information was obtained on depressive symptoms, functional impairment, and health behaviors. Youth with a PHQ-2 score of 3 or higher (n = 271) and a sample of youth with a PHQ-2 score of 2 or less (n = 228), who were frequency matched for age and gender, were invited to participate. Youth who completed the follow-up interview received $20. Consent for the telephone survey was obtained from both the parent and the child.
The child telephone interview included the PHQ-9 screener and the Diagnostic Interview Schedule for Children (DISC-IV) depression modules. The PHQ-9 was completed before other depression and mental health measures.
The PHQ-9 is a self-administered version of the depression portion of the Primary Care Evaluation of Mental Disorders,11 which uses DSM-IV criteria to assess for mental disorders in primary care.8 It can be scored to provide a dichotomous diagnosis of probable major depression and to grade symptom severity via a continuous score. The PHQ-9 has been found to have high sensitivity (73%) and high specificity (98%) for the diagnosis of major depression in adult populations.8,11 Among adults, scores on the PHQ-9 also have been used to define severity for probable diagnoses in the following manner: a score of 5 to 9 indicates minimal depression, 10 to 14 indicates mild major depression, 15 to 19 indicates moderate major depression, and 20 or more indicates severe major depression.8 The PHQ-9 also has a functional impairment question (item 10) that asks respondents how much the symptoms they endorsed in the first 9 items interfere with daily functioning.
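The continuous scoring described above can be sketched in a few lines of Python. This is only an illustration, not the instrument's official scoring software: the function names are hypothetical, the bands are the adult severity ranges quoted in this paragraph, and the label used for totals below 5 is an assumption because the text does not name that range.

```python
from typing import Sequence


def phq9_total(items: Sequence[int]) -> int:
    """Sum the 9 symptom items, each rated 0 (not at all) to 3 (nearly every day)."""
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(items)


def adult_severity_band(total: int) -> str:
    """Map a total score (0-27) to the adult severity bands quoted in the text."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    if total >= 20:
        return "severe major depression"
    if total >= 15:
        return "moderate major depression"
    if total >= 10:
        return "mild major depression"
    if total >= 5:
        return "minimal depression"
    # Assumption: the text gives no label for totals below 5.
    return "below minimal range"


if __name__ == "__main__":
    responses = [2, 2, 1, 1, 2, 1, 1, 0, 0]  # made-up responses for illustration
    total = phq9_total(responses)
    print(total, adult_severity_band(total))
```

Item 10 (functional impairment) is deliberately left out of the total, since the text describes it as a separate question.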
The DISC-IV is a reliable and valid structured interview, designed for lay interviewers, that includes algorithms to diagnose DSM-IV disorders in children and adolescents.12 Telephone versions of structured psychiatric interviews have been found to have a high correlation with in-person interviews.13,14 To decrease patient burden, only the depression modules (major depression and dysthymia) were used. All interviewers received 12 hours of classroom and hands-on training and additional project-specific training on the DISC-IV.
The Columbia Impairment Scale (CIS) was used to assess functional impairment.15 The 13-item CIS measures adolescent impairment in school, family, and peer relationships and has been shown to correlate with the clinician-rated Children's Global Assessment Scale.15
To assess for anxiety symptoms, youth were asked to complete the brief 5-item version of the Screen for Child Anxiety Related Emotional Disorders (SCARED).16 At a cutoff of 3 or greater, the brief SCARED has been shown to have a sensitivity of 74% and a specificity of 73% for distinguishing youth with clinically significant anxiety from youth without anxiety.16
Parents were asked to complete the Brief Pediatric Symptom Checklist (PSC-17). The internalizing component of the PSC-17 (at a cut point of ≥5) has a sensitivity of 73% and specificity of 74% for detecting youth with a depressive disorder.17 The externalizing component (at a cut point of ≥7) has a sensitivity of 62% and a specificity of 89% for detecting youth who meet criteria for an externalizing disorder.17
Descriptive statistics were completed for the full sample and stratified according to depression status. Three categories of depression status were used on the basis of algorithms from the DISC-IV: major depression; intermediate depression; or no depressive disorder.12 Youth with intermediate depression reported at least 3 of 9 symptom criteria for major depression but did not meet diagnostic criteria for major depression in the previous year. χ² and F-test analyses were used to compare categorical and continuous variables, respectively, among depressed and nondepressed individuals on the basis of their PHQ-9 score. The area under the receiver operator curve (ROC) was calculated as a quantification of the sensitivity and specificity of the ability of the self-report questionnaires to classify youth into the past-month major-depression category on the basis of the DISC-IV. Results were interpreted on the basis of standards that have been set for interpreting the area under the curve.18
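For readers unfamiliar with the area under the curve, one common way to compute it is as the probability that a randomly chosen case scores higher than a randomly chosen noncase, with ties counted as one half. The sketch below uses that pairwise definition in plain Python; it is not the statistical software used in the study, the function name is hypothetical, and the scores in the usage example are invented for illustration.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], noncase_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and noncases.

    Each (case, noncase) pair contributes 1 if the case scores higher,
    0.5 if the two scores tie, and 0 otherwise.
    """
    if not case_scores or not noncase_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0
            elif case == noncase:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


if __name__ == "__main__":
    # Invented PHQ-9 totals, not study data.
    depressed = [18, 12, 15, 9]
    not_depressed = [3, 7, 11, 5, 12]
    print(f"AUC = {auc(depressed, not_depressed):.3f}")
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.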
Of 3775 eligible youth, 2291 (60.7%) completed the brief survey (Fig 1). Twelve percent of the youth (n = 281) screened positive for possible depression with a PHQ-2 score of 3 or higher. A total of 499 youth were invited to participate in the full baseline assessment, of whom 444 (89%) consented and both the parent and child completed the baseline survey. Two youth who met DSM-IV criteria for bereavement on the DISC-IV were removed from the analytic sample, which resulted in a final sample of 442 youth.
The mean age of the participants was 15.3 years (SD: 1.2), and 60% of the subjects were female. The sample was predominantly white (71%); the next largest minority groups were Asian (10%) and black (9.6%). Seventy-eight percent of the youth came from 2-parent households, and 87% of the youth came from households in which at least 1 parent had at least some college. The median neighborhood household income for participants was $57 442 (SD: $18 293) annually. Among 242 youth who had a positive PHQ-2 on initial screening, 112 still were positive on a PHQ-2 2 weeks later and 101 had a PHQ-9 score of 11 or higher. Among 202 whose screens were negative on the screening PHQ-2, 194 still were negative on the PHQ-2 2 weeks later and 190 had a PHQ-9 score of less than 11.
Table 1 lists the distribution of PHQ-9 scores according to depression status on the DISC-IV. Categories on the PHQ-9 (minimal, mild, moderate, moderately severe, and severe) were based on severity thresholds established by the original authors of the PHQ-9,8 with the exception of the use of 11 rather than 10 as the lower threshold for the moderate category based on the results of our ROC analyses. Youth who met criteria for major depression had significantly higher total scores on the PHQ-9 and were more likely to be in the moderate-to-severe categories of impairment on the PHQ-9 compared with the other 2 diagnostic groups (χ² = 105.06, df = 8; P < .0001). Youth with intermediate depression on the DISC-IV were most likely to be in the mild-to-moderate categories of impairment on the PHQ-9. Youth with no depression diagnosis were most likely to report minimal symptoms. The mean PHQ-9 scores also decreased significantly and linearly across groups, from 15.5 (SD: 5.6) for those with major depressive disorder to 10.7 (SD: 3.9) for those with intermediate depression and 6.1 (SD: 5.1) for youth with no depressive disorder (F(2,439) = 48.08; P < .0001).
Table 2 lists the test characteristics of the PHQ-9 using the DISC-IV as a gold standard. The optimal cut point for maximizing the sensitivity of the PHQ-9 without loss of specificity was a score of 11 or greater. At this cut point, the PHQ-9 had a sensitivity of 89.5% and a specificity of 77.5% for detecting youth with major depression on the DISC-IV. The positive predictive value was 15.2% for detecting major depression on the DISC-IV, and the negative predictive value was 99.4%. On ROC analysis (Fig 2) the area under the curve for detecting major depression was 0.88 (95% CI: 0.82–0.94).
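The four test characteristics reported above all follow from a single 2×2 table. The sketch below shows the arithmetic. The cell counts are an assumption: they are not given in the text, but were back-calculated from the reported percentages, the final sample of 442, and the 95 false-positive results mentioned later in the article, so they should be read as a consistent reconstruction rather than as the published Table 2.

```python
def screening_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute standard test characteristics from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # noncases correctly cleared
        "ppv": tp / (tp + fp),          # flagged youth who are cases
        "npv": tn / (tn + fn),          # cleared youth who are noncases
    }


if __name__ == "__main__":
    # Assumed counts, reconstructed from the reported percentages (not from Table 2):
    # 19 youth with major depression (17 detected, 2 missed) and
    # 423 without (95 false positives, 328 true negatives); total 442.
    metrics = screening_metrics(tp=17, fn=2, fp=95, tn=328)
    for name, value in metrics.items():
        print(f"{name}: {value * 100:.1f}%")
```

With these assumed counts the output reproduces the figures in the text (89.5%, 77.5%, 15.2%, and 99.4%), which is why they were chosen.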
We also assessed the PHQ-9 in our sample by using the algorithmic scoring protocol for probable major depression (presence of depression and/or anhedonia at least “more than half the days” and a minimum of 5 total symptoms occurring at least “more than half the days” [with the exception of suicide, which is positive with any endorsement]). With this scoring protocol, the sensitivity for detecting youth with major depression on the DISC-IV was 57.9% and the specificity was 90.3%.
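The algorithmic protocol described above can be written as a short rule. This is a sketch of the rule as worded in this paragraph, not an official implementation; the function name is hypothetical, and it assumes the standard PHQ-9 item order (items 1 and 2 are the anhedonia and depressed-mood items, item 9 is the suicide item) and the usual response coding in which 2 means "more than half the days".

```python
from typing import Sequence

MORE_THAN_HALF_THE_DAYS = 2  # response coding: 0, 1, 2, 3


def algorithm_positive(items: Sequence[int]) -> bool:
    """Return True if the responses meet the algorithmic criteria for probable major depression."""
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item must be scored 0, 1, 2, or 3")

    # Cardinal symptom: depressed mood and/or anhedonia (items 1 and 2).
    cardinal = any(score >= MORE_THAN_HALF_THE_DAYS for score in items[:2])

    # Items 1-8 count when rated at least "more than half the days".
    symptom_count = sum(1 for score in items[:8] if score >= MORE_THAN_HALF_THE_DAYS)
    # Item 9 (suicide) counts with any endorsement.
    if items[8] >= 1:
        symptom_count += 1

    return cardinal and symptom_count >= 5


if __name__ == "__main__":
    print(algorithm_positive([2, 2, 2, 2, 0, 0, 0, 0, 1]))  # made-up responses
```

Because this rule requires both a cardinal symptom and five qualifying symptoms, it is stricter than a total-score cut point, which is consistent with the lower sensitivity and higher specificity reported for it.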
Table 3 shows the relationship between PHQ-9 scores and each of our measures of impairment. Scores on the CIS, mean depressive symptom–related difficulty (item 10 on the PHQ-9), parental report of internalizing symptoms (PSC-17 internalizing scale), and overall psychosocial impairment (total PSC-17 score on the parent version) increased in a linear fashion such that youth with higher PHQ-9 scores also exhibited higher scores (indicating more impairment) on each of these measures (P < .0001 for all measures).
The false-positive rate, calculated as 1-specificity, was ∼22.5% when using the DISC-IV as a gold standard. To better understand the characteristics of youth with false-positive results, we examined the association between having a false-positive PHQ-9 and having intermediate depression or a positive screening test for another disorder (anxiety or externalizing disorder). Among adolescents with PHQ-9 scores of 11 or higher but no DISC-IV diagnosis for major depression (n = 95), 29.3% had intermediate depression on the DISC-IV, 16.5% had major depression in the past year but not in the previous month, 23.9% had elevated externalizing disorder symptoms (PSC-17 externalizing scale score ≥ 7), and 56.8% had clinically significant anxiety symptoms (SCARED score ≥ 3). Taken together, 82.2% of the false-positive group had at least 1 of the 4 indications we examined: 45.3% had 1; 31.6% had 2; and 5.3% had 3 of the 4 indications.
The US Preventive Services Task Force has advised primary care clinicians to screen adolescents for depression provided there is a system of care to confirm diagnosis and initiate treatment.1 To implement this recommendation, providers need screening tools that can be easily implemented in pediatric primary care settings. Although the PHQ-9 has been extensively tested among adults, this is the first study (to our knowledge) to examine its test characteristics in an adolescent population. We found that when compared with a structured diagnostic interview, the PHQ-9 had high sensitivity (89.5%) and good specificity (77.5%) for detecting major depression among adolescents and, on ROC analysis, had an area under the curve of 0.88, which puts this screening tool in a good range.18 The sensitivity and specificity of the PHQ-9 are in a range similar to those of other depression-screening tools that have been tested among adolescents in primary care (the Beck Depression Inventory [sensitivity: 91%; specificity: 91%],3 the PHQ-A [sensitivity: 73%; specificity: 94%],6 and the Short Moods and Feelings Questionnaire19 [sensitivity: 80%, specificity: 81%]),20 and the PHQ-9 performs better than physician interview after targeted training (sensitivity: 43%; specificity: 87%).5
In adult samples, a PHQ-9 score of 10 or higher is recommended to identify individuals with likely depression. On the basis of our findings, we would recommend using a cut point of 11 or higher to indicate the need for further evaluation for depression. However, providers may reasonably choose an alternative cut point. For example, although it may result in a higher rate of false-positive results, clinics where both adolescents and adults are seen might choose to use a cut point of 10 to simplify procedures for providers.
In a previous study that used this sample, we found that the PHQ-2, which contains the first 2 items of the PHQ-9, has a sensitivity of 74% and a specificity of 75% for detecting major depression among adolescents.10 Clinics wanting to minimize respondent burden could start with the PHQ-2 followed by the full PHQ-9 only for those with a score of 3 or higher on the PHQ-2. The benefit of adding the PHQ-9 in this protocol is that it provides more information on individual depressive symptoms, has better specificity for major depression than the PHQ-2, and includes a question about suicide, an important cause of mortality among adolescents.21 It is important to note, however, in using the PHQ-9 that youth do not need to be depressed to be suicidal. Any positive indication of suicidality (a score of 1 or higher on item 9 of the PHQ-9) should be taken seriously and followed up on by providers regardless of total PHQ-9 score.
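The two-stage protocol suggested above (PHQ-2 first, full PHQ-9 only for youth scoring 3 or higher, and follow-up for any endorsement of item 9) might look like the following sketch. The function name, argument layout, and returned dictionary are hypothetical choices made for illustration; the cut points of 3 and 11 are the ones discussed in this article.

```python
from typing import Optional, Sequence

PHQ2_CUT_POINT = 3   # PHQ-2 threshold discussed in the text
PHQ9_CUT_POINT = 11  # PHQ-9 threshold recommended in the text for adolescents


def two_stage_screen(phq2_items: Sequence[int],
                     remaining_items: Optional[Sequence[int]] = None) -> dict:
    """Stage 1: PHQ-2 (items 1-2). Stage 2: items 3-9, only if stage 1 is positive."""
    if len(phq2_items) != 2:
        raise ValueError("PHQ-2 requires exactly 2 item responses")
    phq2_total = sum(phq2_items)
    result = {
        "phq2_total": phq2_total,
        "phq9_total": None,             # stays None if the PHQ-9 is not administered
        "suicide_item_endorsed": None,  # unknown unless item 9 was asked
        "needs_follow_up": False,
    }
    if phq2_total < PHQ2_CUT_POINT:
        return result

    if remaining_items is None or len(remaining_items) != 7:
        raise ValueError("PHQ-2 positive: responses to PHQ-9 items 3-9 are required")
    phq9_total = phq2_total + sum(remaining_items)
    suicide_item_endorsed = remaining_items[-1] >= 1  # item 9, any endorsement
    result.update(
        phq9_total=phq9_total,
        suicide_item_endorsed=suicide_item_endorsed,
        # Follow up on a high total OR any suicidality, regardless of total score.
        needs_follow_up=phq9_total >= PHQ9_CUT_POINT or suicide_item_endorsed,
    )
    return result


if __name__ == "__main__":
    print(two_stage_screen([2, 1], [0, 0, 0, 0, 0, 0, 1]))  # made-up responses
```

One consequence of this design is visible in the sketch: youth who score below 3 on the PHQ-2 are never asked the suicide item, so a clinic adopting it would need another route for identifying suicidality in that group.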
Compared with the findings in adults, the sensitivity of the PHQ-9 is higher but the specificity is lower in the adolescent population, which suggests that when used as a screening tool, the PHQ-9 is less likely to miss youth with major depression, but there is a higher false-positive rate in adolescent populations. The higher false-positive rate in adolescent populations may be a result of a high rate of subthreshold depressive symptoms and adjustment disorders, as well as a significant overlap of symptoms between mental health disorders among this age group. Of the youth who were in the false-positive category, 82% had an indication of a mental health concern, including meeting criteria for intermediate depression on the DISC-IV, having depression in the previous year but not in the previous month, having high levels of externalizing behavior, and/or having high levels of anxiety symptoms. This finding suggests the need for further monitoring of these youth.
An additional difference between the adult and the adolescent DSM-IV criteria for major depressive disorder is that youth may meet the diagnostic criteria by presenting with irritability rather than depressed mood. The PHQ-9 does not include an item about irritability and, to allow for the use of a single form for settings in which both adolescents and adults are seen, we chose not to change the wording of the PHQ-9. Because we did not add an irritability item, we are not able to determine how it may have modified the performance of the PHQ-9. The DISC-IV does include an irritability item, and some of the discrepancy between these 2 instruments may relate to this difference.
This study had limitations. First, this study was conducted on an insured population of adolescents in the Pacific Northwest and may not be generalizable to all adolescent populations. Second, the response rate to our initial brief screen was 60%, and we may have had some selection bias regarding youth who participated in the study. Although we were very encouraged by the 89% participation rate in the follow-up interview study, it is possible that youth who chose not to participate were different from those who did. Third, because we oversampled youth with elevated PHQ-2 scores, the prevalence of depression in our study sample may be higher than would be seen if conducting the screening in a primary care clinic. The positive and negative predictive values are influenced by underlying population prevalence; in a general primary care sample with lower prevalence, the positive predictive value may be lower and the negative predictive value higher. In addition, the PHQ-9 was administered via a telephone interview, which may have resulted in different responses than if it had been self-administered. Finally, the DISC-IV asks questions about a 1-month time period, whereas the PHQ-9 asks about the previous 2 weeks. Some of the lack of sensitivity and specificity may be attributed to these time-window differences.
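The dependence of predictive values on prevalence follows directly from Bayes' theorem and can be illustrated with the sensitivity and specificity reported in this study. In the sketch below the function name is hypothetical and the prevalence values are assumptions chosen only to show the direction of the effect; they are not estimates for any particular clinic.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """Return (PPV, NPV) for a test applied at a given prevalence, via Bayes' theorem."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    SENSITIVITY = 0.895  # reported for a PHQ-9 cut point of 11
    SPECIFICITY = 0.775
    # Hypothetical prevalences, for illustration only.
    for prevalence in (0.02, 0.05, 0.10):
        ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Holding sensitivity and specificity fixed, lowering the prevalence reduces the positive predictive value and raises the negative predictive value.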
Despite these limitations, the PHQ-9 is a promising screening tool for use among adolescents. It is brief, easy for patients to understand, simple to score, and available without cost. An additional major advantage of the PHQ-9 is that many primary care providers already are using it for the adult population and, thus, have familiarity with administration and scoring. It performs well in this age group and will be particularly useful for providers or researchers who want to conduct rapid screening in primary care settings or as part of research protocols.
This work was supported by grants from the Group Health Community Foundation Child and Adolescent Grant Program, the University of Washington Royalty Research Fund, and a Seattle Children's Hospital Steering Committee Award and through a K23 award for Dr Richardson from the National Institute of Mental Health (5K23 MH069814-01A1).
- Accepted August 26, 2010.
- Address correspondence to Laura P. Richardson, MD, MPH, Center for Child Health, Behavior, and Development, Seattle Children's Hospital Research Institute, 1100 Olive Way, Suite 500, M/S MPW 8-1, Seattle, WA 98101. E-mail:
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
- PHQ-9 = Patient Health Questionnaire-9 Item
- DSM-IV = Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition
- PHQ-2 = Patient Health Questionnaire-2 Item
- DISC-IV = Child Diagnostic Interview Schedule
- CIS = Columbia Impairment Scale
- SCARED = Screen for Child Anxiety Related Emotional Disorders
- PSC-17 = Brief Pediatric Symptom Checklist
- ROC = receiver operator curve
- 1. US Preventive Services Task Force. Screening and treatment for major depressive disorder in children and adolescents: US Preventive Services Task Force Recommendation Statement. Pediatrics. 2009;123(4):1223–1228
- Zuckerbrot RA, Maxon L, Pagar D, Davies M, Fisher PW, Shaffer D
- Richardson LP, Rockhill C, Russo JE, et al
- Shaffer D, Fisher P, Lucas CP, Dulcan MK, Schwab-Stone ME
- Tape T
- Hamilton BE, Minino AM, Martin JA, Kochanek KD, Strobino DM, Guyer B
- Copyright © 2010 by the American Academy of Pediatrics