Reliability and Validity of a Two-Question Alcohol Screen in the Pediatric Emergency Department
- Anthony Spirito, PhDa,
- Julie R. Bromberg, MPHb,c,
- T. Charles Casper, PhDd,
- Thomas H. Chun, MD, MPHb,c,
- Michael J. Mello, MD, MPHb,c,
- J. Michael Dean, MD, MBAd,
- James G. Linakis, PhD, MDb,c,
- for the Pediatric Emergency Care Applied Research Network
- aDepartments of Psychiatry & Human Behavior and
- bEmergency Medicine, The Warren Alpert Medical School of Brown University, Providence, Rhode Island;
- cDepartment of Emergency Medicine, Rhode Island Hospital, Providence, Rhode Island; and
- dDepartment of Pediatrics & PECARN Data Coordinating Center, University of Utah, Salt Lake City, Utah
Dr Spirito contributed to the study design, formulated the manuscript concept, and drafted the initial manuscript; Ms Bromberg, Dr Chun, and Dr Mello contributed to the design of the study, and critically reviewed and edited the manuscript; Dr Casper contributed to the design of the study, supervised the analyses, and reviewed and revised the manuscript; Dr Dean contributed to the design of the study, and reviewed and edited the manuscript; Dr Linakis contributed to the design of the study, formulated the manuscript concept, and critically reviewed and edited the manuscript; and all authors approved the final manuscript as submitted.
BACKGROUND AND OBJECTIVE: A multisite study was conducted to determine the psychometric properties of the National Institute on Alcohol Abuse and Alcoholism (NIAAA) 2-question alcohol screen within pediatric emergency departments (PEDs).
METHODS: Participants (N = 4838) included 12- to 17-year-old subjects treated in 1 of the 16 participating PEDs across the United States. A criterion assessment battery (including the NIAAA 2-question alcohol screen and other measures of alcohol, drug use, and risk behaviors) was self-administered on a tablet computer. A subsample (n = 186) was re-administered the NIAAA 2-question screen 1 week later to assess test-retest reliability.
RESULTS: Moderate to good test-retest reliability was demonstrated. A classification of moderate risk or higher on the screen had the best combined sensitivity and specificity for determining a diagnosis of alcohol use disorder (AUD) for all students. Any past year drinking among middle school students increased the odds of a diagnosis of an AUD according to Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition criteria, whereas the optimal cutoff for high school ages was ≥3 drinking days in the past year. The optimal cutoff for drinking days determining a positive Alcohol Use Disorders Identification Test score among middle school subjects was ≥1 drinking day, whereas the optimal cutoff for high school subjects was ≥2 drinking days.
CONCLUSIONS: The NIAAA 2-question screen is a brief, valid approach for alcohol screening in PEDs. A positive screen suggests that referral for further evaluation is indicated to determine if an adolescent has an AUD.
- AUD: alcohol use disorder
- AUDIT: Alcohol Use Disorders Identification Test
- CI: confidence interval
- DISC: Diagnostic Interview Schedule for Children
- DSM-5: Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition
- ICC: intraclass correlation coefficient
- PECARN: Pediatric Emergency Care Applied Research Network
- PED: pediatric emergency department
- NIAAA: National Institute on Alcohol Abuse and Alcoholism
- ROC: receiver-operating characteristic curve
- RR: relative risk
What’s Known on This Subject:
Early identification of youth alcohol problems is strongly recommended, yet there is no consensus regarding the best alcohol screening tool for adolescents. Preliminary evidence identified the National Institute on Alcohol Abuse and Alcoholism 2-question screen as a potential tool for pediatric emergency department clinicians.
What This Study Adds:
This study determined the psychometric properties of the National Institute on Alcohol Abuse and Alcoholism 2-question alcohol screen in a large, diverse pediatric emergency department sample. The screen was found to have adequate reliability and concurrent/convergent validity.
The earlier that youth initiate alcohol use, the more likely they are to use other drugs and engage in other problem behaviors, such as sex without contraception, delinquency, and school dropout.1,2 For these reasons, medical3,4 and federal5–7 organizations recommend alcohol screening and intervention (when appropriate) for adolescents within pediatric emergency departments (PEDs) and other health care settings. Previous studies in primary care8,9 and in the emergency department10–13 have found that although a large majority of physicians have favorable attitudes toward alcohol disorder screening, such services are underutilized. A substantial portion of adolescents use PEDs as their only source of medical care.14,15 These individuals are more likely to report substance use and mental health problems, highlighting a need for PED-based alcohol screening.16,17 Although the PED is an ideal venue for alcohol screening, screening instruments must involve minimal training and implementation time to be feasible.
In 2011, the National Institute on Alcohol Abuse and Alcoholism (NIAAA) developed an alcohol screening tool for youth that asks about the patient’s drinking frequency and friends’ drinking to determine alcohol risk. This tool’s items and risk levels have been operationally defined by the NIAAA6 and are summarized in Table 1. Due in part to its brevity, this screen is well suited to PEDs and pediatric primary care settings. Initial analyses of the NIAAA 2-question screen indicated that it may be an effective predictor of current and future alcohol problems,18,19 although, to date, the screen has not been rigorously tested.19–21 The objective of the present study was to determine the test-retest reliability and concurrent and convergent validity of the NIAAA 2-question screen when delivered in the PED setting.
Youth treated in 1 of the participating PEDs in the Pediatric Emergency Care Applied Research Network (PECARN) were enrolled in the study. Established in 2001, PECARN was the first pediatric emergency care research network and currently consists of 18 PEDs located across the country and a data coordinating center. Sixteen of the sites participated in this study (as noted in the Acknowledgments).
All sites received institutional review board approval before enrolling participants. Due to the potential legal implications of adolescent high-risk behavior (eg, illicit alcohol or drug use), a Certificate of Confidentiality was obtained from the US Department of Health and Human Services. Inclusion criteria were as follows: (1) 12 to 17 years of age; (2) seen in the PED for a non–life-threatening injury, illness, or mental health condition; and (3) in the opinion of the clinical staff, were medically, cognitively, and behaviorally stable. Youth were excluded if they were: (1) in severe acute emotional distress (eg, suicidal, suspected by the clinical staff of being a victim of child abuse); (2) in the opinion of the clinical staff, cognitively impaired and unable to provide informed assent; (3) unaccompanied by an adult qualified to give written permission for the youth’s participation in research; (4) unable to read and speak English or Spanish; (5) parents unable to read and speak English or Spanish; (6) without a telephone or an address of residence; or (7) previously enrolled in this study. Adolescents who met inclusion/exclusion criteria and their parent(s) were approached by study staff and asked to provide written assent and written parental permission, respectively.
After enrollment, a criterion assessment battery, including the NIAAA 2-question screen and other measures of alcohol, drug use, and risk behavior was self-administered on a tablet computer. In accordance with the NIAAA guidelines,6 the screen was used to group participants into 4 categories: nondrinkers and those with low, moderate, and high risk. These risk classifications are determined based on the number of drinking days in the past year (Table 1). Also of note, when making decisions about referral for further evaluation, clinicians were asked to consider whether patients have friends who drink (middle school, ages 11–14 years) or binge drink (high school, ages 14–18 years). Both the middle school (which asks about peer alcohol use first) and the high school (which asks first about the individual’s own use) versions were administered as appropriate. The definition of binge drinking varies by age and sex; thus, for the purposes of this analysis, we assumed that the participant’s friends were the same sex and in the same age category as the participant.
A random sample of enrolled participants was contacted by telephone and e-mail 7 to 14 days after the PED visit to repeat the NIAAA 2-question screen to measure test-retest reliability.
Concurrent and Convergent Validity
Concurrent validity, the degree to which the results of a test are comparable to those of an established gold standard measure of the same construct, was assessed with the Alcohol- and Substance-Use Disorder module of the Diagnostic Interview Schedule for Children (DISC).22 The DISC, the most widely used and studied mental health interview, has been tested in both clinical and community populations23 ages 9 to 17 years and has been used in a number of emergency department screening studies.24,25 The DISC has been shown to have high sensitivity (0.73–1.0 for psychiatric disorders, including substance use disorder).22 The DISC was used as the criterion measure for Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), diagnoses based on participant responses. A question about craving substances was added to the DISC so that the DSM-5 diagnosis for an alcohol use disorder (AUD) could be derived.
Convergent validity, the degree to which 2 tests designed to assess the same construct are related, was measured by using the Alcohol Use Disorders Identification Test (AUDIT),26 the most widely used screen for adolescent alcohol misuse. The AUDIT is a 10-question screen focusing on the quantity and frequency of alcohol use, alcohol dependence, and alcohol-related consequences.27 It has adequate internal consistency (α = 0.85 [consumption] and 0.61 [consequences]).24
Because there are 2 different versions of the NIAAA 2-question screen, analyses were performed for the sample as a whole, as well as for middle school and high school ages separately (Table 1). Test-retest reliability was calculated by using an intraclass correlation coefficient (ICC)28 for the overall NIAAA 2-question screen score and for the individual question regarding number of days drinking in the past year. A Fleiss-Cohen weighted κ was also calculated based on categorization of number of drinking days in the past year as none versus any. To assess the relationship between responses to the 2 questions, for middle school, a summary of the distribution of question 2 responses (yes/no, participant drank in the past year) was examined against responses to question 1 (yes/no, any friends who drank in past year) by using a κ coefficient. These responses were dichotomized due to a high number of zeros. Because there was more variability in the numeric responses for high school participants, a Pearson’s correlation coefficient was calculated for this group.
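As an illustration of the agreement statistic described above, the following is a minimal sketch of a κ coefficient with Fleiss-Cohen (quadratic) weights, written in plain Python. The function name, the integer coding of risk categories (0 = nondrinker through 3 = high risk), and the example responses are hypothetical and are not taken from the study data or from the SAS code actually used. With only 2 categories (none versus any drinking), the quadratic weights reduce to the unweighted κ.

```python
from collections import Counter


def weighted_kappa(first, second, n_categories):
    """Cohen's kappa with Fleiss-Cohen (quadratic) disagreement weights.

    first, second: equal-length lists of integer category codes in
    the range 0..n_categories-1 (test and retest ratings).
    """
    n = len(first)
    if n == 0 or n != len(second) or n_categories < 2:
        raise ValueError("need two equal-length, non-empty rating lists")

    observed = Counter(zip(first, second))  # joint counts of (test, retest)
    row = Counter(first)                    # marginal counts at test
    col = Counter(second)                   # marginal counts at retest
    max_dist = (n_categories - 1) ** 2

    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_dist  # 0 on the diagonal, 1 at the extremes
            obs_disagreement += weight * observed[(i, j)] / n
            exp_disagreement += weight * (row[i] / n) * (col[j] / n)

    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs_disagreement / exp_disagreement


# Hypothetical test and retest risk categories for 8 participants.
baseline = [0, 0, 0, 1, 2, 3, 2, 0]
retest = [0, 0, 1, 1, 2, 2, 2, 0]
kappa = weighted_kappa(baseline, retest, n_categories=4)
```

Quadratic weights penalize a 2-category shift 4 times as heavily as a 1-category shift, which is why this form of κ is often reported alongside an ICC for ordinal scores.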
Concurrent validity was examined by using a logistic regression model comparing the odds of a DISC diagnosis (yes/no) against risk categories of the NIAAA 2-question screen. Differences between levels representing a single change in categorization (nondrinker versus low risk, low versus moderate risk, and moderate versus high risk) were tested with the Wald test. The Cochran-Armitage test was used to examine the trend to receive a DISC diagnosis across all of the screen categories. A receiver-operating characteristic curve (ROC) analysis was used to investigate possible cutpoints on the NIAAA 2-question screen score for detecting a DISC diagnosis. The optimal cutpoint was defined as the point at which the sum of sensitivity and specificity was maximized. Test characteristics were calculated at each potential cutpoint, and the area under the curve was used to provide an assessment of the overall accuracy of the screen in predicting DISC diagnosis.
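The cutpoint rule described above (maximizing the sum of sensitivity and specificity) can be sketched as follows. This is an illustrative plain-Python version under stated assumptions, not the SAS procedure used in the study; the function name and the example values of drinking days and diagnoses are hypothetical.

```python
def best_cutpoint(scores, has_diagnosis):
    """Return (cutpoint, sensitivity, specificity) maximizing sens + spec.

    A screen is counted as positive when score >= cutpoint.
    Ties are resolved in favor of the lowest cutpoint.
    """
    positives = sum(1 for d in has_diagnosis if d)
    negatives = len(has_diagnosis) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one case and one non-case")

    best = None
    for cut in sorted(set(scores)):
        true_pos = sum(1 for s, d in zip(scores, has_diagnosis) if d and s >= cut)
        true_neg = sum(1 for s, d in zip(scores, has_diagnosis) if not d and s < cut)
        sens = true_pos / positives
        spec = true_neg / negatives
        if best is None or sens + spec > best[1] + best[2]:
            best = (cut, sens, spec)
    return best


# Hypothetical past-year drinking days and criterion diagnoses for 10 youth.
drinking_days = [0, 0, 0, 1, 1, 2, 3, 4, 6, 10]
diagnosis = [False, False, False, False, True, False, False, True, True, True]

cut, sens, spec = best_cutpoint(drinking_days, diagnosis)
# In this toy data set the rule selects >= 4 drinking days
# (sensitivity 0.75, specificity 1.0).
```

Because the rule weights sensitivity and specificity equally, a setting that prioritizes case finding over false-positive findings could reasonably select a lower cutpoint than the one returned here.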
Convergent validity was examined by comparing the AUDIT scores between risk categories of the NIAAA 2-question screen and testing the differences between levels representing single changes in categorization (nondrinker versus low risk, low versus moderate risk, and moderate versus high risk) with the Wilcoxon rank-sum test, followed by a test of independence. An analysis of variance was used to examine whether AUDIT scores differed across all of the screen categories. AUDIT scores were also compared between participants classified as drinkers and nondrinkers on the NIAAA 2-question screen for each age group by using the Wilcoxon rank-sum test. An ROC analysis was used to investigate possible cutpoints on the NIAAA screen score for detecting an AUDIT score ≥4, which has been used as the clinical cutoff in previous studies with adolescents.24,26,29
Sensitivity was used as the basis for our sample size requirements. We assumed a target sensitivity of 90%. For the 95% confidence interval (CI) around sensitivity to be within ±2.5%, ∼5000 participants would be needed. We determined that ∼200 participants with 1-week follow-up would provide a stable estimate of test-retest reliability.
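The arithmetic behind this requirement can be sketched with the usual normal approximation for a proportion, n = z² p(1 − p) / d², where p is the target sensitivity and d is the CI half-width. Note that n here counts participants who have the criterion diagnosis, because sensitivity is estimated only among cases. The source does not state the prevalence assumed when converting cases to total enrollment; the 11% value below is purely an illustrative assumption chosen because it yields a total near 5000, and the function name is hypothetical.

```python
import math


def cases_needed(sensitivity, half_width, z=1.96):
    """Diagnosis-positive participants needed so that the normal-approximation
    95% CI around the sensitivity estimate is within +/- half_width."""
    return math.ceil(z ** 2 * sensitivity * (1 - sensitivity) / half_width ** 2)


cases = cases_needed(0.90, 0.025)  # 554 participants with the diagnosis

# ASSUMPTION (not from the source): 11% of enrolled youth meet the criterion.
assumed_prevalence = 0.11
total_enrollment = math.ceil(cases / assumed_prevalence)  # 5037
```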
Multiple imputation was used to handle nonresponse in analyses involving the AUDIT and DISC surveys. We generated 5 imputations by fully conditional specification,30 using backward selection to choose sufficiently predictive models for each variable given the others. The models were constrained to include at least 1 item from the NIAAA 2-question screen to preserve any association between NIAAA outcomes and the other variables. In analyzing the imputed data, Kruskal-Wallis tests were replaced by global tests on the coefficients in a proportional ordinal logistic regression model and χ2 tests were replaced by Wald tests on a coefficient in a logistic regression model. All analyses were performed by using SAS version 9.4 (SAS Institute, Inc, Cary, NC).
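The source does not describe how estimates were combined across the 5 imputed data sets. The conventional approach is Rubin's rules, and the sketch below assumes that approach for illustration only: the pooled estimate is the mean of the per-imputation estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their squared standard errors
    using Rubin's rules. Returns (pooled_estimate, total_variance)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 imputations")

    pooled = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log-odds coefficients and squared standard errors from 5 imputations.
coefs = [1.92, 2.05, 1.98, 2.10, 1.95]
squared_ses = [0.040, 0.042, 0.039, 0.041, 0.040]
pooled_coef, pooled_var = pool_rubin(coefs, squared_ses)
```

A Wald test on the pooled coefficient would then use the pooled estimate divided by the square root of the total variance.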
The analyses include results from the 4838 participants who completed baseline activities in the PED. Approximately the same number of participants was recruited from each of the sites. Participants were equally distributed across sex and age. Forty-six percent of participants identified as white and 26% identified as black; 26% identified as Hispanic. Overall, of the 4838 participants who completed baseline activities, 4.1% of the AUDIT total scores were missing and therefore imputed. Data were missing when a participant responded “I prefer not to answer” to an item, which precluded calculating a total score. In 6.5% of the cases, an AUDIT-positive participant was not diagnosed with an AUD on the DISC, and in 1.9% of cases, a participant who received a diagnosis on the DISC did not reach the AUDIT cutoff score of 4.
A total of 186 (68%) of the 274 participants assigned to the test-retest follow-up group completed their 1-week follow-up assessment (mean time to completion was 9.8 days after enrollment). There were no differences in age, sex, or any of the baseline alcohol use variables between those who completed the retest and those who did not. Of those who completed the retest, 44% completed it online and 56% completed it over the telephone.
On retesting, 14 youth (7.5%) reported a higher and 18 (9.7%) reported a lower NIAAA risk category. The ICC for the 4 NIAAA categories as a score was 0.67 for the entire sample (95% CI, 0.58–0.74), 0.67 for the middle school sample (95% CI, 0.51–0.78), and 0.65 for the high school sample (95% CI, 0.54–0.75) (Table 2). Weighted κ coefficients were as follows: entire sample, κ = 0.63 (95% CI, 0.51–0.75); middle school sample, κ = 0.57 (95% CI, 0.27–0.87); and high school sample, κ = 0.61 (95% CI, 0.48–0.75). These results suggest moderate agreement.31
When responses were dichotomized according to whether participants reported no drinking at all or any drinking days, κ coefficients were as follows: entire sample, κ = 0.65 (95% CI, 0.52–0.77); middle school, κ = 0.58 (95% CI, 0.25–0.91); and high school, κ = 0.63 (95% CI, 0.48–0.78).
The ICC for the number of days the participant reported drinking in the past year for the total sample was 0.32 (95% CI, 0.18–0.44), indicating a fair level of agreement. The ICC was higher for the middle school group (0.50 [95% CI, 0.30–0.66]; n = 66) than for the high school group (0.30 [95% CI, 0.13–0.46]; n = 120).
Relationship Between Self-report and Friends’ Drinking Questions
For the whole sample, the age group–adjusted risk of self-reported drinking was higher among those whose friends drank than among those whose friends did not drink (relative risk [RR], 3.4 [95% CI, 3.1–3.7]). For the middle school group, the risk of drinking was 8-fold higher among those whose friends drank (RR, 8.1 [95% CI, 6.1–11.1]). For the high school group, the risk of drinking was 3-fold higher among those whose friends drank (RR, 2.9 [95% CI, 2.7–3.2]). For high school students, the Pearson correlation coefficient between the 2 questions on the screen was r = 0.29 (95% CI, 0.26–0.33; P < .01).
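For readers who want to see how an RR and its CI are obtained from a 2 × 2 table, the following is a minimal sketch using the standard log-scale (Katz) interval. It is unadjusted, whereas the whole-sample estimate reported above was adjusted for age group, so it would not reproduce that figure. The function name and the counts are hypothetical and are not study data.

```python
import math


def relative_risk(exposed_events, exposed_total,
                  unexposed_events, unexposed_total, z=1.96):
    """Unadjusted relative risk with a log-scale 95% CI.

    'Exposed' here means reporting friends who drink; an 'event' is
    self-reported drinking. All event counts must be greater than zero.
    """
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed

    # Standard error of log(RR).
    se_log = math.sqrt(1 / exposed_events - 1 / exposed_total
                       + 1 / unexposed_events - 1 / unexposed_total)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: 120 of 400 youth with drinking friends report drinking,
# compared with 30 of 600 youth without drinking friends.
rr, lower, upper = relative_risk(120, 400, 30, 600)  # rr is 6.0
```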
Table 3 summarizes whether a participant received a DSM-5 diagnosis of an AUD on the DISC according to categories of the NIAAA 2-question screen. Each change in risk category on the NIAAA 2-question screen leads to a significant difference in DISC diagnosis of an AUD. Table 4 indicates that a classification of moderate risk or higher on the NIAAA 2-question screen had the best combined sensitivity (89% [95% CI, 69–100] for middle school and 88% [95% CI, 81–96] for high school) and specificity (91% [95% CI, 90–92] for middle school and 81% [95% CI, 80–82] for high school) for determining an AUD on the DISC.
Figure 1 indicates that for middle school students, a DSM-5 diagnosis of AUD was predicted best by any self-reported drinking in the past year (which is identical to a classification of moderate risk or higher) on the NIAAA 2-question screen. The table accompanying Fig 1 indicates that the optimal cutoff for high school ages, however, was ≥3 drinking days in the past year for predicting DISC alcohol use diagnosis according to the DSM-5, with a sensitivity of 93% (95% CI, 87–99) and a specificity of 81% (95% CI, 79–82).
Table 5 presents the AUDIT scores according to categories of the NIAAA 2-question screen. With the exception of the low-risk category compared with the moderate-risk category, each change in the screen categories led to a significant difference in AUDIT scores. The overall test comparing NIAAA risk categories and the analysis of variance post hoc test for trends were statistically significant. The Wilcoxon rank-sum test also showed significant differences in the distribution of AUDIT scores between drinkers and nondrinkers. Of the participants classified as high risk on the screen, the majority were also categorized as high risk on the AUDIT (ie, 64% had an AUDIT score ≥4).
For the clinical cutoff of 4 on the AUDIT, a cutoff of high risk on the NIAAA 2-question screen provided the highest combined sensitivity (78% [95% CI, 63–94]) and specificity (92% [95% CI, 90–93]) for middle school students. A cutoff of lower risk or greater provided the highest sensitivity (95% [95% CI, 93–97]) and specificity (74% [95% CI, 72–75]) for high school students (Table 6). ROC analyses based on the number of self-reported drinking days are shown in Fig 2. The table accompanying Fig 2 indicates that the AUDIT clinical cutoff for middle school students is best predicted by using a cutoff of ≥1 drinking day on the NIAAA 2-question screen, with 78% sensitivity (95% CI, 63–94) and 92% specificity (95% CI, 90–93). For high school ages, however, ≥2 drinking days best predicted the AUDIT clinical cutoff, with 90% sensitivity (95% CI, 87–93) and 82% specificity (95% CI, 80–83).
This article presents psychometric data on a brief measure recommended by the NIAAA to screen for youth alcohol risk. Data were collected from a large, ethnically, racially, and geographically diverse sample from 16 PEDs within the PECARN network. DSM-5 diagnoses were found for 2% of the sample, which is consistent with data from the National Survey on Drug Use and Health.32
Moderate to good test-retest reliability was found.33 Test-retest reliability was comparable across the middle and high school samples using both ICC and κ approaches. When responses were dichotomized into drinks versus does not drink, agreement was good.34 Approximately 17% of the sample changed answers on retesting, with comparable rates reporting higher or lower risk categories. This inconsistency might have been related to mode of assessment; all baseline data were collected online, whereas more than one-half of the retest questions were completed over the telephone with an interviewer. Overall, reliability of the NIAAA 2-question screen was adequate or better given that reliability statistics are affected by the number of items in a scale.33 The 1 exception was the “drinking days” item, for which reliability was fair for middle school students but poor for high school students. The lower coefficients on the continuous variable of drinking days, compared with the categorical and dichotomous classifications, suggest recall problems when a specific number of drinking days is asked of a respondent.
Concurrent validity was examined by using ROC analyses and revealed that categorizing youth as low versus moderate or higher risk on the NIAAA 2-question screen had the best combined sensitivity and specificity for determining a DSM-5 diagnosis of an AUD of any severity (mild, moderate, or severe). This outcome was true regardless of whether the youth was in middle or high school. Similarly, for middle school students, a DSM-5 diagnosis of an AUD was associated with any drinking in the past year as self-reported on the screen. However, the optimal cutoff for high school ages was ≥3 drinking days for predicting an AUD. Our finding in high school students that 3 days is an optimal cutoff is consistent with a recent study by Clark et al35 of rural youth attending an outpatient primary care appointment. This study found that 3 drinks in the past year was also the best predictor for middle school students, but we found any drinking to be the best predictor. The difference between studies with middle school students may have been due to the greater percentage of rural middle school students in the sample by Clark et al. We chose to explore test characteristics using a cutpoint that maximizes the sum of sensitivity and specificity. As with any screen, a different cutpoint may be used, depending on the tradeoff between sensitivity and false-positive findings.
With respect to convergent validity, a simple classification using 1 item (drinker versus nondrinker) on the NIAAA 2-question screen had the best combined sensitivity and specificity with respect to a clinical cutoff of 4 on the AUDIT for middle school students. The optimal cutoff for high school ages, however, was ≥2 drinking days on the screen to predict a clinical cutoff score of 4 on the AUDIT. These cutoff scores, for both AUDs and the AUDIT clinical cutoff, err on the side of overclassification.
There are some limitations to this study that should be considered. First, although the sample was large and diverse, it is not representative of the general population. The study was limited to adolescents being treated in a PED; thus, generalization to other populations may be limited. Second, the order of administration of the criterion measures was varied, but the NIAAA 2-question screen was always administered first, which may have had an effect on the outcomes. Third, correlations of the criterion instruments with the NIAAA 2-question screen might have been affected somewhat because both the AUDIT and DISC ask about frequency of alcohol use but use different response formats than the free choice item on the NIAAA 2-question screen. In addition, the correlations are also affected by the reliability and validity of each criterion measure. Fourth, the measures were all self-administered, and participants were informed that responses would not be shared with clinical staff; therefore, we cannot comment on how the screen would perform when the questions are asked by a health care provider. Fifth, test-retest reliability may have been affected by the fact that only about two-thirds of the designated sample completed the retest, although there were no differences between completers and noncompleters. In addition, most of the retest sample were nondrinkers.
The NIAAA 2-question screen, which categorizes youth risk level according to frequency of alcohol use, is a valid, rapid, and simple approach for PED-based alcohol screening that is briefer than other comparable screens. Self-administration may be a useful way to screen in a busy clinical practice and has the potential advantage of eliciting more accurate responses from youth.36 However, the NIAAA screen maximizes sensitivity in identifying youth who may be at risk for alcohol use problems. Therefore, either more conservative cutoff scores could be used or additional questioning will be necessary to determine if an adolescent should be referred for further evaluation. Future research should examine the predictive validity of the NIAAA 2-question screen in detecting AUDs at later time periods as well as examining if cutoff scores differ by specific age groups.
The authors acknowledge PECARN and the participating PECARN sites, including: Baylor College of Medicine/Texas Children’s Hospital (R. Shenoi); Boston Children's Hospital (M. Monuteaux); Children’s Hospital of Colorado (L. Bajaj); The Children's Hospital of Philadelphia (J. Fein); Children's National Medical Center (K. Brown); Cincinnati Children's Hospital Medical Center (J. Grupp-Phelan); Columbia University/Children’s Hospital of New York–Presbyterian (L. Chernick); Hasbro Children’s Hospital (A. Spirito); Lurie Children's Hospital of Chicago (E. Powell); Medical College of Wisconsin (M. Levas); Nationwide Children's Hospital (D. Cohen); Nemours/Alfred I. duPont Children’s Hospital (C. Mull); St Louis Children’s Hospital/Washington University (F. Ahmad); University of California, Davis (T. Horeczko and C. Vance); University of Michigan (A. Rogers); and University of Pittsburgh (B. McAninch and B. Suffoletto). Our efforts would not have been possible without the commitment of the investigators and research coordinators from these sites.
The authors also thank the PECARN Steering Committee members: R. Stanley (chair), B. Bonsu, C. Macias, D. Brousseau, D. Jaffe, D. Nelson, E. Alpern, E. Powell, J. Chamberlain, J. Bennett, J.M. Dean, L. Bajaj, L. Nigrovic, N. Kuppermann, P. Dayan, P. Mahajan, R. Ruddy, and R. Hickey. A special thanks to the staff at the Data Coordinating Center, including H. Gramse, S. Zuspan, J. Wang, J. M. Dean, M. Ringwood, and T. Simmons, for their dedication and assistance throughout the study. Lastly, the authors thank the subjects and their parents for participating in this study.
- Accepted September 22, 2016.
- Address correspondence to James G. Linakis, PhD, MD, Rhode Island Hospital, Department of Emergency Medicine, 55 Claverick St, 2nd Floor, Providence, RI 02903. E-mail:
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: All phases of this study were supported in part by the National Institute on Alcohol Abuse and Alcoholism (1R01AA021900 to Drs Spirito and Linakis). This project is supported in part by the Health Resources and Services Administration, Maternal and Child Health Bureau, Emergency Medical Services for Children Network Development Demonstration Program, under cooperative agreements U03MC00008 and U03MC00001, U03MC00003, U03MC00006, U03MC00007, U03MC22684, and U03MC22685. This information or content and conclusions are those of the authors and should not be construed as the official position or policy of, nor should any endorsements be inferred by, the Health Resources and Services Administration, the US Department of Health and Human Services, or the US Government. Funded by the National Institutes of Health (NIH).
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Copyright © 2016 by the American Academy of Pediatrics