OBJECTIVE: The goal was to describe the accuracy of the Edinburgh Postnatal Depression Scale (EPDS), Beck Depression Inventory II (BDI-II), and Postpartum Depression Screening Scale (PDSS) in identifying major depressive disorder (MDD) or minor depressive disorder (MnDD) among low-income, urban mothers attending well-child care (WCC) visits during the postpartum year.
METHODS: Mothers (N = 198) attending WCC visits with their infants 0 to 14 months of age completed a psychiatric diagnostic interview (standard method) and 3 screening tools. The sensitivities and specificities of each screening tool were calculated in comparison with diagnoses of MDD or MDD/MnDD. Receiver operating characteristic curves were calculated and the areas under the curves for each tool were compared to assess accuracy for the entire sample (representing the postpartum year) and subsamples (representing early, middle, and late postpartum time frames). Optimal cutoff scores were calculated.
RESULTS: At some point between 2 weeks and 14 months after delivery, 56% of mothers met criteria for either MDD (37%) or MnDD (19%). When used as continuous measures, all scales performed equally well (areas under the curves of ≥0.8). With traditional cutoff scores, the measures did not perform at the expected levels of sensitivity and specificity. Optimal cutoff scores for the BDI-II (≥14 for MDD and ≥11 for MDD/MnDD) and EPDS (≥9 for MDD and ≥7 for MDD/MnDD) were lower than currently recommended. For the PDSS, the optimal cutoff score was consistent with current guidelines for MDD (≥80) but higher than recommended for MDD/MnDD (≥77).
CONCLUSIONS: Large proportions of low-income, urban mothers attending WCC visits experience MDD or MnDD during the postpartum year. The EPDS, BDI-II, and PDSS have high accuracy in identifying depression, but cutoff scores may need to be altered to identify depression more accurately among urban, low-income mothers.
WHAT'S KNOWN ON THIS SUBJECT:
Postpartum depression is common, especially among underserved women. Many studies have found it is feasible to screen mothers in pediatric clinics. The accuracy of depression screening tools for a low-income, minority population in a pediatric clinic is unknown.
WHAT THIS STUDY ADDS:
This is the first study to describe the prevalence of depression, determined with a diagnostic interview, among low-income, young, black mothers attending WCC visits and the first to describe the accuracy of 3 depression screening tools in this understudied population.
Postpartum depression affects ∼14% of new mothers in the United States,1 with higher rates among poor and minority women.2–4 Multiple negative effects for mothers and infants are well described.5–8 Efforts have focused on improving identification of postpartum depression.9–11 To increase the potential for early intervention, primary care providers, including pediatricians, are encouraged to screen mothers.10–14 However, practitioners are unsure which instruments to use and whether one is preferable.
Pediatric practitioners must have confidence that the tools accurately identify depression among the women in their diverse practices. Several studies assessed the accuracy of screening tools in identifying postpartum depression, but they had several limitations.1 Most did not include significant numbers of low-income or minority women, who have higher rates of postpartum depression. Also, most assessed the tools' accuracy in the early postpartum period. Because depression can occur at any time in the postpartum year15,16 and some providers screen mothers throughout the year,11 evaluation of the tools' accuracy at different time points is critical. Despite support for17 and the feasibility of postpartum depression screening in primary care, including pediatrics,10,11,18–20 few accuracy studies have been conducted in primary care settings.21,22 To address these limitations, we conducted a study designed to establish the sensitivity, specificity, and operating characteristics of 3 depression screening tools in a low-income, urban population of women attending well-child care (WCC) visits during the postpartum year.
Recruitment and Sample
Between April 1, 2003, and August 31, 2005, a convenience sample of mothers (N = 647) attending a WCC visit with an infant ≤14 months of age at the Strong Pediatric Practice at Golisano Children's Hospital were invited to complete a demographic questionnaire and the Center for Epidemiological Studies Depression Scale (CES-D)23,24 and to return for a diagnostic interview. Mothers had to be ≥18 years of age to be eligible. Eight women were ineligible because of age (<18 years), language barriers, or previous participation in the study (eg, with a previous infant). Of 639 eligible women, 217 (34%) refused but provided nonidentifiable demographic information and 422 (66%) provided written informed consent and completed the demographic questionnaire and CES-D (Fig 1). Of those 422 women, 28 refused further participation, 9 were excluded, 187 agreed to complete the psychiatric diagnostic interview but did not do so, and 198 completed the interview (Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition [DSM-IV] [SCID]25) (Fig 1). The study was approved by the University of Rochester research subjects review board. All participants provided written informed consent.
Among the eligible women who agreed to complete the SCID, 49% (N = 187) did not. Difficulties with retention, similar to those described by other investigators,26 were recognized and addressed early27 by (1) offering an immediate SCID, (2) providing appointment cards at consent, (3) sending confirmation letters, (4) calling the subject 12 to 48 hours before the appointment, (5) rescheduling at the subject's convenience, and (6) following up at the next WCC visit. Subjects received 1 to 9 calls (mean: 4.1; SD: 2.06), had appointments rescheduled a maximum of 3 times, and received $40 for participation in the SCID.
Descriptive Group Assignments
Because of the cross-sectional study design, infant age at the time of the maternal interview was used to assign subjects to a postpartum group, that is, 2 weeks to 4 months (early), >4 to 8 months (middle), or >8 to 14 months (late). The groups were chosen for assessment of the utility of the tools throughout the year, and the group time frames coincided with ≥2 WCC visits.
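The group assignment rule above can be sketched as a small function. This is an illustrative sketch only: the function name `postpartum_group` and the approximation of 2 weeks as 0.5 months are assumptions, not part of the study's procedures.

```python
from typing import Optional


def postpartum_group(infant_age_months: float) -> Optional[str]:
    """Map infant age in months to the early, middle, or late postpartum group."""
    two_weeks = 0.5  # assumption: 2 weeks approximated as 0.5 months
    if infant_age_months < two_weeks or infant_age_months > 14:
        return None  # outside the 2-week to 14-month window
    if infant_age_months <= 4:
        return "early"   # 2 weeks to 4 months
    if infant_age_months <= 8:
        return "middle"  # >4 to 8 months
    return "late"        # >8 to 14 months
```

The boundaries follow the text: an infant of exactly 4 months falls in the early group, and one of exactly 8 months falls in the middle group.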
The CES-D is a 20-item, self-report measure that has been used to screen for postpartum depression.23,24,28 For this study, it was used for 2 primary purposes. First, for assessment of potential bias attributable to depression, an initial depression measure that was not the focus of the accuracy study was needed to compare women who did and did not complete the SCID. Second, the distribution of women with high (≥16) and low (<16) CES-D scores was monitored because the sample size calculation for the receiver operating characteristic (ROC) curves was based on an assumption of approximately equal numbers of depressed and nondepressed subjects. In determination of the minimal group size for comparison of ROC curves across the postpartum groups, power analysis with PASS software (NCSS, LLC, Kaysville, UT), with 50% depressed subjects, indicated that a sample size of 60 was sufficient to yield enough power (80%) to detect a difference of 0.13 to 0.16, depending on the true areas under the curves (AUCs).
Demographic information included maternal race, ethnicity, age, marital status, number of children, insurance status, and education.
The tools were placed in random order in sealed envelopes, to ensure that they were not answered in a biased fashion. The Beck Depression Inventory II (BDI-II), a 21-item, self-report questionnaire that assesses cognitive, behavioral, affective, and somatic symptoms of depression, was developed to correspond to the criteria for DSM-IV depressive diagnoses.29–32 Suggested cutoff scores are as follows: 0 to 13, minimal depression; 14 to 19, mild depression; 20 to 28, moderate depression; 29 to 63, more-severe depression.32 The BDI-II forms were purchased for use in this study.
The Edinburgh Postnatal Depression Scale (EPDS), a 10-item, self-administered questionnaire developed for assessment of depression in postpartum women, has been validated against the Research Diagnostic Criteria for major depressive disorder (MDD) or minor depressive disorder (MnDD)33 and in a variety of settings and community samples, with the majority of studies focusing on the 6- to 8-week postpartum period.33–36 Scores range from 0 to 30, with a cutoff score of ≥10 being recommended for detection of MDD/MnDD with sensitivities of >90% and specificities between 77% and 88%.33,37,38 A cutoff score of ≥13 is recommended for detection of MDD with sensitivities of 85% to 100% and specificities of 80% to 95%.33,37,38 The EPDS form indicated the original reference and acknowledged the original authors, as required for its use free of charge.
The Postpartum Depression Screening Scale (PDSS) is a 35-item, self-report questionnaire that assesses 7 dimensions (sleeping/eating disturbances, anxiety/insecurity, emotional lability, cognitive impairment, loss of self, guilt/shame, and contemplation of harming oneself) among postpartum women.39,40 Scores range from 35 to 175, with cutoff scores as follows: 35 to 59, normal adjustment; 60 to 79, significant symptoms of postpartum depression; 80 to 175, positive screen for MDD.41 The scale was validated against the SCID.41 A cutoff score of ≥80 had a sensitivity of 94% and a specificity of 98% for MDD. At ≥60, the sensitivity was 91% and the specificity was 72% for MDD/MnDD. The PDSS forms were purchased for this study.
The SCID is a semistructured interview developed for assessment of 33 DSM-IV axis I diagnoses in adults.25,42 It is the standard method to characterize study samples in terms of psychiatric diagnoses. It was used to establish DSM-IV axis I diagnoses (MDD, MnDD, dysthymia, bipolar disorder, substance use, anxiety, and psychotic disorders); it was administered by trained raters (who were blinded to screening tool scores, including initial CES-D scores), and results were reviewed by a consensus team (including a psychiatrist, psychologists, and trained raters, who were blinded to screening tool scores), to confirm the diagnostic decisions.
To compare characteristics of mothers who did and did not complete the SCID, we used t tests and Wilcoxon rank-sum (Mann-Whitney) tests for continuous demographic variables and χ2 tests for categorical variables. To assess the accuracy of the tools, ROC curves were computed for the whole sample and for each postpartum group for each tool. For each possible threshold based on the sample, we computed the estimates of the corresponding sensitivity, specificity, and positive predictive value. The ROC curve plots the sensitivity of a measure on the y-axis and 1 − specificity on the x-axis and measures the overall accuracy of a test. The AUC is the most important summary index of the ROC curve. A ROC curve with an AUC of >0.5 suggests that the test is better than classifying subjects randomly. A ROC curve with an AUC of >0.8 is generally considered to indicate an accurate test. The closer the curve is to the upper left corner, that is, point (0,1), the greater the AUC is and the more accurate the test is.43
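The threshold-by-threshold calculation can be illustrated with a minimal sketch. It assumes that a score at or above the cutoff counts as a positive screen and that the diagnostic interview supplies the true status. The function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def screen_metrics(scores: Sequence[float], depressed: Sequence[bool],
                   cutoff: float) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, PPV) when score >= cutoff screens positive."""
    tp = sum(1 for s, d in zip(scores, depressed) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, depressed) if d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, depressed) if not d and s >= cutoff)
    tn = sum(1 for s, d in zip(scores, depressed) if not d and s < cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, specificity, ppv


def roc_points(scores: Sequence[float],
               depressed: Sequence[bool]) -> List[Tuple[float, float]]:
    """Empirical ROC points (1 - specificity, sensitivity), running from (0, 0) to (1, 1)."""
    points = [(0.0, 0.0)]  # a cutoff above every observed score flags no one
    for cutoff in sorted(set(scores), reverse=True):
        sens, spec, _ = screen_metrics(scores, depressed, cutoff)
        points.append((1.0 - spec, sens))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the empirical ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): six screening scores and interview diagnoses.
scores = [3, 5, 7, 9, 11, 13]
depressed = [False, False, True, False, True, True]
auc = auc_trapezoid(roc_points(scores, depressed))  # about 0.89 for this toy data
```

The sketch requires at least one depressed and one nondepressed subject; otherwise sensitivity or specificity is undefined.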
For each of the empirical ROC curve estimates (on the basis of the empirical estimates of the sensitivities and specificities at the observed test levels), the empirical AUC and the associated SE were estimated. Because each subject completed each tool, the subjects' results were correlated. Methods developed by DeLong et al,44 which address such within-subject correlations, were used to compare the accuracies among the screening tools, both for the entire sample and for each postpartum group.
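To show how a comparison of correlated AUCs works, the sketch below implements a pairwise version of the DeLong approach for 2 tools completed by the same subjects. It is a simplification under stated assumptions: the study compared 3 tools and reported χ2 statistics, whereas this sketch compares 2 tools with a z statistic. The function names and example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Pairwise kernel: 1 if the depressed subject scores higher, 0.5 for a tie."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos: Sequence[float],
                neg: Sequence[float]) -> Tuple[float, List[float], List[float]]:
    """Empirical AUC plus the per-subject structural components V10 and V01."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(pos), v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_compare(scores_a: Sequence[float], scores_b: Sequence[float],
                   depressed: Sequence[bool]) -> Tuple[float, float, float, float]:
    """Return (AUC_a, AUC_b, z, two-sided p) for two tools scored on the same subjects."""
    pos = [i for i, d in enumerate(depressed) if d]
    neg = [i for i, d in enumerate(depressed) if not d]
    m, n = len(pos), len(neg)
    auc_a, v10_a, v01_a = _components([scores_a[i] for i in pos],
                                      [scores_a[i] for i in neg])
    auc_b, v10_b, v01_b = _components([scores_b[i] for i in pos],
                                      [scores_b[i] for i in neg])
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab  # the covariance term handles the pairing
    if var_diff <= 0.0:
        return auc_a, auc_b, float("nan"), float("nan")  # test undefined, eg identical tools
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return auc_a, auc_b, z, p


# Toy data (hypothetical): 4 depressed and 4 nondepressed subjects, 2 tools each.
depressed = [True, True, True, True, False, False, False, False]
tool_a = [9, 8, 7, 3, 6, 2, 1, 0]
tool_b = [5, 9, 2, 7, 6, 3, 8, 1]
auc_a, auc_b, z, p = delong_compare(tool_a, tool_b, depressed)
```

The covariance term is what distinguishes this from comparing 2 independent AUCs: when the same women complete both tools, ignoring it would misstate the SE of the difference.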
The AUCs for each tool were compared across the postpartum groups to assess different accuracies among the groups. Optimal cutoff scores for the screening tools were recommended on the basis of the empirical ROC curves. Because sensitivity and specificity estimates change in opposite directions when the cutoff score varies, a good choice should balance sensitivity and specificity while maintaining the ROC curve as close to the upper left corner as possible. We present the results of the optimal cutoff scores computed by using the criterion that minimizes the Euclidean distance from point (sensitivity,specificity) to point (1,1) in the x-y plane.45
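The distance criterion can be sketched as follows. This is a minimal illustration that assumes a score at or above the cutoff is a positive screen and that ties are resolved in favor of the lowest cutoff. The function name `optimal_cutoff` and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def optimal_cutoff(scores: Sequence[float],
                   depressed: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) for the cutoff closest to (1, 1)."""
    n_pos = sum(1 for d in depressed if d)
    n_neg = len(depressed) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, depressed) if d and s >= cutoff)
        tn = sum(1 for s, d in zip(scores, depressed) if not d and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        # Euclidean distance from (sensitivity, specificity) to the ideal point (1, 1)
        dist = math.hypot(1.0 - sens, 1.0 - spec)
        if best is None or dist < best[0]:
            best = (dist, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Toy data (hypothetical): 5 depressed and 4 nondepressed subjects.
scores = [2, 4, 6, 8, 10, 12, 14, 16, 18]
depressed = [False, False, False, True, False, True, True, True, True]
cutoff, sens, spec = optimal_cutoff(scores, depressed)
```

In this toy example, a cutoff of 8 gives perfect sensitivity but a specificity of 0.75 (distance 0.25), whereas a cutoff of 12 gives a sensitivity of 0.8 with perfect specificity (distance 0.2), so the criterion selects 12.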
There were no statistically significant differences in the number of children (P = .34) or level of education (P = .27) between women who enrolled (N = 422) and those who did not (N = 217). There were differences in age (P = .005), race (P = .004), marital status (P = .004), and insurance types (P = .009) between these groups. Older women, Hispanic women, married women, and women who had private insurance were more likely to refuse.
Of 422 women who consented to participate, 385 (91%) agreed to complete the SCID but 49% (N = 187) did not (Fig 1). There were no statistically significant differences between women who completed the SCID (N = 198) and those who did not (N = 224) with regard to maternal age, education, number of children, or depressive symptom levels (CES-D scores), but Hispanic women, married women, and women with private insurance were less likely to complete the SCID (Table 1).
Approximately equal numbers of women were recruited into each postpartum group (Table 2). There were no statistically significant differences in the proportions with MDD or MDD/MnDD among the groups, with all groups exceeding 50% for MDD or MnDD.
ROC Curves for Screening Tools
Postpartum Year (Infants' Ages of 2–60 Weeks)
When results for the entire sample (N = 198) were evaluated, each tool performed well for MDD and MDD/MnDD, with AUCs of ≥0.8 (Figs 2 and 3). The AUCs for the BDI-II, EPDS, and PDSS for MDD were 0.84 (95% CI 0.78–0.89), 0.86 (95% CI 0.81–0.91), and 0.83 (95% CI 0.79–0.89), respectively, and those for MDD/MnDD were 0.89 (95% CI 0.84–0.93), 0.87 (95% CI 0.82–0.92), and 0.83 (95% CI 0.78–0.89), respectively. There were no statistically significant differences in the AUCs (MDD: χ2 = 1.96, P = .38; MDD/MnDD: χ2 = 5.64, P = .06) for the tools, although there was a trend toward significance for MDD/MnDD.
No statistically significant differences were found between the tools for MDD or MDD/MnDD in any postpartum group (Table 3). To assess potential differences in a tool's accuracy with respect to postpartum period, the AUCs were calculated for each tool for each group (Table 3) and values were compared across the groups, with no statistically significant differences. In the late group, no tool reached an AUC of 0.8 for MDD.
Sensitivity and Specificity of Screening Tools
We assessed the sensitivity and specificity to estimate the optimal cutoff score for each screening instrument, and we compared the results with published cutoff scores. For the BDI-II and EPDS, the optimal cutoff scores for MDD or MDD/MnDD were lower than published guidelines32,33,46 (Table 4). For the PDSS, the optimal cutoff score for MDD/MnDD was within the range for significant symptoms (score of ≥77); however, it was 17 points greater than that recommended for depressive disorder not otherwise specified (or MnDD) (score of ≥60).41 The cutoff score for MDD (score of ≥80) was consistent with published recommendations.41
Optimal cutoff scores for each postpartum group were 0 to 3 points from the optimal cutoff scores for the whole sample (Table 4).
Our study is the first to describe the prevalence of MDD and MnDD, by using a diagnostic interview, and the accuracy of depression screening tools among low-income, black, young mothers attending WCC visits in an urban pediatric clinic. Many studies cited high rates of depressive symptoms assessed with screening tools.2,3,11 Similarly, the CES-D scores indicated that ∼50% of participants and nonparticipants had high levels of depressive symptoms. However, we did not anticipate that such a large proportion (56%) would meet diagnostic criteria for MDD/MnDD in interviews. Although this finding may be attributable to selection bias, the equal rates of high CES-D scores among participants and nonparticipants do not support this explanation. It is possible that participants self-identified as needing assistance and therefore were more likely to meet diagnostic criteria for MDD/MnDD than nonparticipants. A second possibility, which is based on the differences in sociodemographic characteristics between participants and nonparticipants, is that the sample might have been the most economically and socially disadvantaged and therefore at greatest risk for depression. Because of potential sample bias, we cannot generalize the high prevalence to the general clinic population, but the findings highlight a group of depressed mothers who need to be identified.
Another finding is that the proportions of depressed women were essentially equivalent for all 4-month infant age ranges in the postpartum year. Because of the cross-sectional study design, we could not identify accurately when incident or recurrent cases occurred. The finding, which is similar to previous findings,15 supports the practice of screening at early and late first-year WCC visits.
Accuracy of Tools
Our findings suggest that the BDI-II, EPDS, and PDSS were equally accurate in identifying depression in low-income, black mothers during the postpartum year. The tools' performance did show some minor variability at different time points, but the differences did not reach a level of statistical significance. These findings suggest that pediatric practitioners can be confident when using the tools at any first-year WCC visit. These findings are similar to those of a study conducted in Pittsburgh with the PDSS Short Form, Patient Health Questionnaire, and EPDS, in which the AUCs for the continuous scales did not show any significant differences.21
Presumably, providers use published cutoff scores to guide their clinical evaluations and referrals. Our findings suggest that, in this population, use of the established cutoff scores for the BDI-II and EPDS may lead clinicians to fail to identify many women with depression. Other studies found similar suboptimal performance by screening tools with traditional cutoff scores.21 Although replication of our findings in other settings with similar populations is required for making final recommendations for changing cutoff scores, pediatric practitioners who use the EPDS or BDI-II should be aware that the use of traditional cutoff scores may not be as accurate as previously thought and scores 2 to 3 points below traditional cutoff scores may indicate a need for further evaluation. Studies from different countries and studies conducted with different ethnic populations suggested a range of optimal cutoff scores.34–36,47–49 For the PDSS, our findings support the recommended cutoff score (score of 80) for MDD but support a higher cutoff score than traditionally recommended for MDD/MnDD. Use of a higher cutoff score may decrease unnecessary referrals. As with any screening tool, clinical evaluation of specific situations is necessary.
Reasons for the lower optimal cutoff scores for the BDI-II and EPDS are not clear. This population might have higher rates of comorbid medical or psychiatric concerns that might influence the cutoff scores. Anxiety alone cannot represent the explanation, because the EPDS and PDSS have anxiety subscales but the optimal cutoff scores are in opposite directions. Because the BDI-II relies more heavily on somatic symptoms, it might be expected to overestimate the number of depressed women. Our findings are the reverse. Further exploration of the underlying mechanisms for the different optimal cutoff scores is indicated.
With the finding that all 3 tools performed equally well in a low-income, black population of new mothers, providers must consider the advantages and disadvantages of each tool. The EPDS is short, easy to complete, and free to providers, has been used in multiple ethnic and socioeconomic groups and settings, and is available in multiple languages. The PDSS allows clinicians to target interventions or referrals, because it identifies multiple domains, and it is available in Spanish. The disadvantages of the PDSS are its length and cost per use. The advantages of the BDI-II include provider familiarity, availability in Spanish, and use with adolescents and minority populations; disadvantages are its focus on somatic symptoms that may overlap with normal postpartum adaptation, the fact that it must be purchased, and the fact that it is not traditionally used with a dichotomous cutoff score structure. Providers need to take all of this information into consideration when choosing the right screening tool for their clinics.
Strengths and Limitations
First, the sample (that is, urban, low-income, black mothers) is a primary strength of this study, because it represents a large number of women in the United States about whom little is known. Second, the population was recruited from a pediatric clinic, which is important for consideration of the prevalence of depression among mothers presenting to WCC visits. Third, the sampling size and strategy allowed for sufficient sample sizes (as demonstrated by the relatively narrow 95% confidence intervals around the estimated AUCs) to test the tools' accuracy in the postpartum year and within time periods corresponding to WCC visits. The sufficient numbers of depressed and nondepressed women and the use of a diagnostic interview allowed us to address previous studies' limitations.
This study also had limitations. With sampling from 1 urban, academic medical center clinic that serves a low-income, high-risk population, the findings cannot be generalized to more ethnically or socioeconomically diverse populations or other types of pediatric settings. Replication in other sites and types of clinics, as well as among ethnically diverse populations, is warranted. Another limitation is the cross-sectional study design. Validation of the tools in a longitudinal prospective study would help to determine the tools' accuracy at repeated visits. Finally, the large proportion of women lost to follow-up monitoring limited our ability to determine diagnoses or to test the tools among women who might represent a slightly different population. Future studies should attempt to obtain broader representation of the population.
Depression is highly prevalent among low-income, black, postpartum mothers and can be identified accurately through screening with the EPDS, PDSS, or BDI-II. Depending on the population and the screening tool, practitioners may need to alter the cutoff score to identify more effectively individuals who could benefit from referral and treatment.
This study was funded by a grant from the National Institute of Mental Health (grant K23 MH64476). Dr Wisner's work on this study was supported in part by National Institute of Mental Health grants R01 MH071825 and 2 R01 MH057102.
The members of our consensus group were Linda H. Chaudron, MD, Stephanie Gamble, PhD, Nancy L. Talbot, PhD, Holly I. M. Wadkins, and Erin Ward.
We thank the women who participated in this study.
- Accepted October 20, 2009.
- Address correspondence to Linda H. Chaudron, MD, MS, University of Rochester Medical Center, 300 Crittenden Blvd, Rochester, NY 14642. E-mail:
FINANCIAL DISCLOSURE: Dr Katherine Wisner served on an Advisory Board for Eli Lilly Corp and received a donation of active and placebo transdermal estradiol patches for an NIMH funded study from Novartis (Novogyne). Other authors have indicated they have no financial relationships relevant to this article to disclose.
- WCC = well-child care
- AUC = area under the curve
- MDD = major depressive disorder
- MnDD = minor depressive disorder
- DSM-IV = Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition
- EPDS = Edinburgh Postnatal Depression Scale
- BDI-II = Beck Depression Inventory II
- PDSS = Postpartum Depression Screening Scale
- SCID = Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition
- CES-D = Center for Epidemiological Studies Depression Scale
- ROC = receiver operating characteristic
- Gaynes BN, Gavin N, Meltzer-Brody S, et al
- Stein A, Gath DH, Bucher J, Bond A, Day A, Cooper PJ
- Silverstein M, Augustyn M, Cabral H, Zuckerman B
- Olson AL, Kemper KJ, Kelleher KJ, Hammond CS, Zuckerman BS, Dietrich AJ
- Olson AL, Dietrich AJ, Prazar G, Hurley J
- Chaudron LH, Szilagyi PG, Kitzman HJ, Wadkins HI, Conwell Y
- Chaudron LH, Szilagyi PG, Campbell AT, Mounts KO, McInerny TK
- Kabir K, Sheeder J, Kelly LS
- Dubowitz H, Feigelman S, Lane W, et al
- Radloff LS
- Chaudron LH, Giannandrea SAM, Wadkins H
- Beck A, Steer R, Brown LP
- Cox JL, Holden JM, Sagovsky R
- Harris B, Huckle P, Thomas R, Johns S, Fung H
- Murray L, Carothers AD
- Beck C, Gable RK
- Pepe M
- Perkins NJ, Schisterman EF
- Copyright © 2010 by the American Academy of Pediatrics