OBJECTIVES: To describe the development of a prognostic tool to identify adolescents at risk for transitioning from never to ever smoking in the next year.
METHODS: Data were drawn from the Nicotine Dependence in Teens study, a longitudinal investigation of adolescents (1999 to present). A total of 1294 students initially age 12 to 13 years were recruited from seventh-grade classes in 10 high schools in Montreal. Self-report questionnaire data were collected every 3 months during the 10-month school year over 5 years (1999–2005) until participants completed high school (n = 20 cycles). Prognostic variables for inclusion in the multivariable analyses were selected from 58 candidate predictors describing sociodemographic characteristics, smoking habits of family and friends, lifestyle factors, personality traits, and mental health. Cigarette smoking initiation was defined as taking even 1 puff on a cigarette for the first time, as measured in a 3-month recall of cigarette use completed in each cycle.
RESULTS: The cumulative incidence of cigarette smoking initiation was 16.3%. Data were partitioned into a training set for model-building and a testing set to evaluate the performance of the model. The final model included 12 variables (age, 4 worry or stress-related items, 1 depression-related item, 2 self-esteem items, and 4 alcohol- or tobacco-related variables). The model yielded a c-statistic of 0.77 and had good calibration.
CONCLUSIONS: This short prognostic tool, which can be incorporated into busy clinical practice, was used to accurately identify adolescents at risk for cigarette smoking initiation.
- Bolasso: bootstrap-enhanced least absolute shrinkage operator
- Lasso: least absolute shrinkage and selection operator
- MI: multiple imputation
- ND: nicotine dependence
- NDIT: Nicotine Dependence in Teens
- SSC: susceptibility to smoking cigarettes index
What’s Known on This Subject:
Pediatricians and family practitioners are important sources of smoking prevention counseling. However, the lack of a prognostic tool to help clinicians rapidly identify youth at risk of transitioning from never to ever smoking is a major barrier to counseling.
What This Study Adds:
Using data from a longitudinal investigation of adolescents, we developed a 12-item prognostic tool for use in clinical practice to identify adolescents at risk for initiating cigarette smoking. This tool has good predictive ability.
Cigarette smoking typically begins during adolescence, and the younger the age of initiation, the greater the risk of daily smoking,1,2 heavy cigarette consumption,3,4 nicotine dependence (ND), and difficulty quitting.5 The prevalence of “tried smoking” has declined markedly in North American youth (from 20% of US middle school students in 2013 to 7% in 2016,6,7 and from 45% of sixth- through ninth-grade Canadian students in 1994 to 8% in 2014–2015).8,9 However, 25% to 30% of never-smokers lack firm commitment to never smoke and are classified as “susceptible to smoking.”8,10 These individuals represent a key target group for prevention11 because the transition from never to ever smoking can lead to rapidly increasing cigarette use.12
Pediatricians and family practitioners are important sources of preventive counseling.13 It was recently recommended that education and brief counseling aimed at preventing school-aged youth from trying a first cigarette be integrated into counseling.14,15 However, because of competing priorities,16 time and/or resource constraints, and low provider self-efficacy,17,18 routine counseling remains infrequent.16,19 In addition, the lack of a prognostic tool to help clinicians rapidly identify at-risk youth is a major barrier to delivering counseling.20 The susceptibility to smoking cigarettes (SSC) index is widely used in research to identify individuals at risk for becoming a smoker, but it has not been tested in clinical settings and is focused solely on smoking intentions, disregarding other factors known to affect initiation.11,21,22 With an accurate prognostic tool, at-risk youth could be selectively targeted for counseling, rendering counseling more efficient.
We describe the development of a prognostic tool for use by clinicians, which identifies adolescents at risk for transitioning from never to ever smoking in the next year. It was developed on the basis of a large literature identifying a wide range of factors that predict initiation among never smokers.22 It incorporates 12 questions, most with yes or no responses.
The current study is an extension of the Nicotine Dependence in Teens (NDIT) study, a longitudinal investigation of 1294 seventh-grade students ages 12 to 13 years recruited in 10 high schools in Montreal, Canada.23 The NDIT study aimed to describe the natural course of cigarette smoking and ND. A total of 55.4% of eligible students participated (some teachers refused to collect consent forms because of a labor dispute). Parents and/or guardians provided informed consent, and participants assented. Questionnaire data (1999–2005) were collected every 3 months during the 10-month school year over 5 years, for a total of 20 cycles. Because the current study was focused on initiation, prevalent smokers at inception were excluded. Ethics approval was obtained from ethics committees at the Montreal Department of Public Health, McGill University, and the University of Montreal.
Cigarette smoking was assessed in each cycle in a 3-month recall,24 which measured the number of days in each of the 3 preceding months in which participants had smoked and number of cigarettes smoked per day on average during that month. Test–retest reliability for these 2 items was good.25 Initiation was considered to have occurred during the cycle in which participants smoked for the first time.
A total of 58 prognostic variables were selected on the basis of a review of cigarette smoking predictors in youth,22 as well as the feasibility of collecting data from youth in clinical settings. Selected variables pertained to sociodemographics, smoking habits of family and friends, and lifestyle factors (Supplemental Table 3, Supplemental Information). In addition, personality traits and mental health were measured by using validated scales (Supplemental Table 3), the items of which we considered as variables.
The prognostic tool was developed in 6 steps (Supplemental Fig 4).
Create Data Sets
To enable prediction of the 1-year risk of initiation, the data set was divided into 4 consecutive 5-cycle waves with baselines in cycles 1, 5, 9, and 13. Predictor variable values were drawn from the baseline cycle, and the initiation indicator was based on the subsequent 4 cycles corresponding to a 1-year follow-up. Initiators in a given wave were excluded from all subsequent waves. Never-smokers could be included in up to 4 waves. The four 1-year waves were pooled. Continuous variables were standardized to ensure a common unitless scale.26 Prognostic models produce better prediction in the data sets in which they are built than in independent data sets, a phenomenon known as overoptimism.27 Thus, the analytical sample was randomly divided into a training sample (80% of observations), in which the models were developed, and a test sample (20% of observations), in which model performance was estimated. Both samples had the same proportion of initiators.
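The wave construction and the stratified 80/20 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the input layout (one first-puff cycle per participant, `None` for never-smokers), and the use of a simple per-outcome shuffle are all assumptions, and missing questionnaires are ignored.

```python
import random

BASELINES = (1, 5, 9, 13)  # baseline cycles of the four 1-year waves (from the text)

def build_wave_rows(first_smoked_cycle):
    """Build pooled (participant, baseline_cycle, initiated) rows.

    first_smoked_cycle is a hypothetical input: dict mapping a participant id
    to the cycle of the first puff, or None if the participant never smoked.
    """
    rows = []
    for pid, first in first_smoked_cycle.items():
        for base in BASELINES:
            if first is not None and first <= base:
                break  # already an ever-smoker at this baseline: not eligible
            # Outcome is read from the 4 cycles that follow the baseline.
            initiated = first is not None and base < first <= base + 4
            rows.append((pid, base, int(initiated)))
            if initiated:
                break  # initiators are excluded from all subsequent waves
    return rows

def stratified_split(rows, train_frac=0.8, seed=0):
    """Random split that keeps the proportion of initiators equal in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r[2] == label]
        rng.shuffle(group)
        cut = round(train_frac * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test
```

In this sketch a participant who never smokes contributes one row per wave (up to 4), whereas a participant who initiates in a wave contributes rows up to and including that wave only.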
We used a nonparametric, computationally efficient multiple imputation (MI) method based on random forests to impute missing values of the predictors.28
Select Prognostic Variables
Prognostic variables were selected by using the bootstrap-enhanced least absolute shrinkage operator (Bolasso)29 algorithm in the training data set. Bolasso improves on variable selection in least absolute shrinkage and selection operator (Lasso) by combining it with bootstrapping.29 Lasso is a penalized regression in which a penalty parameter λ is selected to control the number of variables that enter a model, with large values of λ leading to sparse models.30 In effect, coefficients of less influential variables are shrunk to exactly 0, which is how variable selection is performed.30,31 When predictors are strongly correlated, Lasso is not consistent, such that a given penalty λ can lead to different sets of variables.29 Bolasso relies on bootstrapping to stabilize the variable selection process. For a given λ, variable selection is performed by choosing the variables selected by Lasso in the vast majority of bootstrapped copies.29 In our implementation of Bolasso, we considered 150 bootstraps and selected variables that appeared in 95% of bootstraps. We incorporated MI by estimating Lasso in each MI data set and by selecting the set of variables for which the averaged coefficient over the 15 imputation data sets was >0.001 in absolute value. We considered model sizes ranging from 1 to 20 variables. Supplemental Figure 4 summarizes the variable selection process.
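The two selection rules described above (retention in at least 95% of 150 bootstraps, and an averaged coefficient larger than 0.001 in absolute value across imputation data sets) can be sketched as follows. This is a hedged illustration: the Lasso fit itself is abstracted as a caller-supplied function `lasso_select`, which is a hypothetical placeholder, and how the bootstrap and imputation loops were nested in the actual analysis is not specified here.

```python
import random

def bolasso_select(rows, lasso_select, n_boot=150, threshold=0.95, seed=0):
    """Keep variables that Lasso selects in at least `threshold` of bootstraps.

    rows: list of observations.
    lasso_select: hypothetical callable taking a bootstrap sample and
        returning the set of variable names with nonzero Lasso coefficients
        for a fixed penalty.
    """
    rng = random.Random(seed)
    n = len(rows)
    counts = {}
    for _ in range(n_boot):
        # Bootstrap copy: sample n observations with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        for name in lasso_select(sample):
            counts[name] = counts.get(name, 0) + 1
    return {name for name, c in counts.items() if c / n_boot >= threshold}

def selected_across_imputations(coefs_by_imputation, tol=0.001):
    """Keep variables whose coefficient, averaged over imputations, exceeds tol.

    coefs_by_imputation: list of dicts (one per imputed data set) mapping
        variable name -> Lasso coefficient; absent names count as 0.
    """
    names = set().union(*coefs_by_imputation)
    m = len(coefs_by_imputation)
    return {
        v for v in names
        if abs(sum(c.get(v, 0.0) for c in coefs_by_imputation) / m) > tol
    }
```

A variable that enters only some bootstrap fits is dropped, which is what stabilizes the selection when predictors are correlated.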
We used 10-fold cross-validation to estimate coefficients and validate each model. For each model of size 1 to 20, we divided the training data set into 10 partitions of equal size and initiation prevalence, imposing the same partition on each imputation data set. We then estimated the model on nine-tenths of the data and averaged the regression coefficients over the 15 imputation data sets. We repeated the procedure 10 times, each time excluding a different tenth of data. Supplemental Figure 5 summarizes the estimation and validation process.
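The partitioning and averaging steps can be sketched as below, assuming a binary outcome vector and one dictionary of coefficients per imputation data set. The function names are illustrative, and the model fit on each nine-tenths subset is left out.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Assign each observation a fold index 0..k-1, balancing the outcome.

    Observations are shuffled within each outcome class and dealt out in
    turn, so folds have near-equal size and initiation prevalence.
    """
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for value in set(labels):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds

def average_coefficients(coefs_by_imputation):
    """Average each regression coefficient over the imputation data sets."""
    m = len(coefs_by_imputation)
    names = coefs_by_imputation[0].keys()
    return {v: sum(c[v] for c in coefs_by_imputation) / m for v in names}
```

Computing the fold vector once and reusing it for every imputation data set corresponds to imposing the same partition on each of them.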
For the prognostic tool, we chose, from among all models with c-statistics >0.70, the model with 5 to 20 variables that minimized optimism and had good calibration performance in the validation data set (Supplemental Table 4).
The c-statistic measured model performance in discriminating participants who did and did not initiate smoking. We used calibration plots to assess the level of agreement between predicted initiation probabilities and the observed outcome.32 Using calibration curves in the training data set allowed for internal calibration (ie, checking whether the model was missing important predictors). Using calibration curves in the test data set allowed for external validation, which assessed whether the model over- or underpredicted initiation for a given range of observed probabilities, resulting in poor performance in external data.33 Calibration plots depict the smoothed estimated relationship between observed outcomes and the predicted probability of the outcome.32 Perfect calibration is indicated by a diagonal line with unit slope, and large discrepancies from the diagonal indicate segments in the range of the predictions in which the model under- or overpredicts the outcome.32
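For a binary outcome, the c-statistic equals the proportion of initiator and noninitiator pairs in which the initiator received the higher predicted probability, with ties counted as one half. A minimal sketch of that definition follows; the function name and the brute-force pairwise loop are illustrative choices, not the software used in the study.

```python
def c_statistic(probs, outcomes):
    """Concordance between predicted probabilities and a 0/1 outcome."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one initiator and one noninitiator")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0   # initiator ranked above noninitiator
            elif p == q:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))

# Three of the four initiator/noninitiator pairs are ordered correctly.
print(c_statistic([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination.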
We defined thresholds to identify adolescents at risk of initiation from the probabilities estimated in the model using a utility-based approach that emphasizes sensitivity over specificity by constraining the sensitivity to be ≥0.80.34 This is warranted when the intervention (ie, smoking prevention counseling) is not invasive so that false-positives are less problematic than false-negatives (ie, counseling low-risk adolescents is a less important problem than not counseling adolescents at risk).35 To facilitate the model interpretation, we describe a scoring system to evaluate the initiation risk for 8 scenarios.
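One way to implement a sensitivity-constrained cutoff is sketched below. It assumes, as a simplification that may differ from the utility-based approach cited above, that the chosen cutoff is the largest predicted probability at which sensitivity is still at least 0.80, and that an adolescent is flagged when the predicted probability is at or above the cutoff.

```python
def choose_cutoff(probs, outcomes, min_sensitivity=0.80):
    """Return (cutoff, sensitivity, specificity) for the largest cutoff
    whose sensitivity is at least min_sensitivity."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both initiators and noninitiators")
    # Scan candidate cutoffs from high to low; the first one that meets the
    # sensitivity constraint flags the fewest adolescents among those that do.
    for cut in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, outcomes) if y == 1 and p >= cut)
        tn = sum(1 for p, y in zip(probs, outcomes) if y == 0 and p < cut)
        sensitivity = tp / n_pos
        if sensitivity >= min_sensitivity:
            return cut, sensitivity, tn / n_neg
    raise ValueError("no cutoff satisfies the sensitivity constraint")
```

Lowering the cutoff further would raise sensitivity at the cost of counseling more low-risk adolescents, which is the trade-off the constraint is meant to fix in advance.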
Of 1294 participants, 461 who reported cigarette smoking at inception or joined the NDIT study after inception were excluded. Wave 1 included 833 never smokers; 22.8% initiated smoking. Waves 2 to 4 comprised 584, 457, and 388 never smokers, of whom 16.4%, 10.3%, and 16.7% initiated. Together, the 4 waves included 2266 observations contributed by 842 adolescents. The training data set comprised 1813 observations, and the test data set included 453 observations. Overall, 370 adolescents initiated smoking across the training and test data sets combined, representing 16.3% of all observations. Missing value patterns are described in Supplemental Table 5. Table 1 presents baseline statistics.
Supplemental Table 4 reports performance measures for models with 1 to 20 variables in the training data set. The final model was selected from among models with 5 to 20 variables because its c-statistic was the highest, and its measure of optimism was similar to that of models of similar size.
Twelve variables were associated with initiation, including age, 4 stress items (ie, worried or stressed about loneliness, weight, a health problem, or relationship with siblings), 1 depression-related item (ie, feeling hopeless about the future), 2 self-esteem items (ie, have something valuable to offer; have a positive attitude toward oneself), and 4 alcohol- or tobacco-related variables (ie, consumes alcohol, feels the need for a cigarette, finds it hard not to smoke when others are smoking, friend[s] smoke). Fig 1 shows the coefficients for the 12 variables; the size of each bar is proportional to the relative effect of the coefficient on the estimated probability of initiation. The risk decreased with age. Alcohol consumption and having friends who smoke increased the risk. Adolescents with positive self-esteem were at reduced risk. The estimated coefficients with 95% confidence intervals are shown in Supplemental Table 6.
After cross-validation in the training data set, the model had a mean c-statistic of 0.72 (SD = 0.07), a mean sensitivity of 0.80 (SD = 0.08), and a low optimism value (0.011). In the test data set, the c-statistic and sensitivity were 0.77 and 0.80, respectively, which is consistent with good predictive ability. In the training data set, the cutoff indicating whether an adolescent was at risk of initiation was 0.11. In the test data set, the cutoff yielded a sensitivity and specificity of 0.80 and 0.55, respectively. The calibration curves from validation in the training data set (Fig 2A) suggest excellent calibration of the model, with slight overprediction of probabilities beyond 0.40, which is well above the cutoff indicating whether an adolescent is at risk for initiation and therefore has no practical impact. The locally weighted scatter-plot smoother curve in the test data set (Fig 2B) appears bumpy, suggesting slight underprediction of probabilities >0.20.
The estimated regression coefficients (Fig 1) were used to construct the prognostic tool (Fig 3).38 An online version of the tool can be used to automatically compute the 1-year risk of initiation (http://www.mapageweb.umontreal.ca/sylvesma/logiciels-en.html). Table 2 describes the application of the scoring system to 8 scenarios. Scenario 1 suggests that being drawn to smoking is not enough to place an adolescent at risk if the adolescent has high self-esteem, does not consume alcohol, and does not have friends who smoke. However, the combination of being drawn to smoking and having low self-esteem (scenarios 3–5) or consuming alcohol and having friends who smoke (scenario 2) does place the adolescent at risk. Scenario 8 suggests that adolescents who are not drawn to cigarettes are at high risk if they have low self-esteem, consume alcohol, and have friends who smoke.
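Because the model is a logistic regression, a 1-year risk can in principle be computed directly from an intercept and the item coefficients and then compared with the 0.11 cutoff. The sketch below illustrates that calculation only: the item names and all numeric coefficients are hypothetical placeholders (the real values are in Fig 1 and Supplemental Table 6), and the published scoring sheet (Fig 3) may organize the computation differently.

```python
import math

def predicted_risk(intercept, coefs, answers):
    """Logistic model: inverse logit of intercept + sum(coef * answer)."""
    linear = intercept + sum(coefs[name] * answers[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))

def at_risk(prob, cutoff=0.11):
    """Flag an adolescent when the predicted risk reaches the cutoff
    (0.11 is the training-set cutoff reported in the text)."""
    return prob >= cutoff

# HYPOTHETICAL coefficients, for illustration only; not the published values.
example_intercept = -2.0
example_coefs = {
    "friends_smoke": 0.8,
    "consumes_alcohol": 0.5,
    "positive_attitude_toward_self": -0.6,
}
example_answers = {
    "friends_smoke": 1,
    "consumes_alcohol": 0,
    "positive_attitude_toward_self": 1,
}
risk = predicted_risk(example_intercept, example_coefs, example_answers)
print(round(risk, 3), at_risk(risk))
```

With yes or no items coded 1 or 0, each "yes" simply adds that item's coefficient to the linear score, which is why a paper-based point system can reproduce the same ranking.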
Given the burden of smoking, counseling to prevent initiation should be a top priority in pediatric practice. Growing evidence on how quickly ND symptoms can manifest after the first puff12,39,40 supports treating the first puff as a clinical emergency necessitating intervention to prevent long-term smoking. However, busy clinicians need to quickly and accurately identify youth who would benefit most, because counseling all patients is neither feasible nor necessary.
We developed a 12-item prognostic tool to identify at-risk youth, with the following 7 important attributes: (1) it identifies at-risk youth, implicitly acknowledging that the first puff is a sentinel event that can rapidly lead to ND symptoms and sustained smoking39,40; (2) it capitalizes on a recent review on initiation predictors identified in high-quality longitudinal studies22; (3) it was based on a broad socioecological understanding of risk22 and therefore includes diverse predictors; (4) it was designed for use in clinical settings; (5) it can be easily self-administered using a computer or smart phone application before a clinical visit41; (6) it is short and easily interpretable; and (7) its predictive validity was established by using cutting-edge analytic approaches.
We could not locate an independent data set in which to assess performance of the tool with the same variables and a 1-year follow-up for initiation after the measurement of the predictors. Therefore, we divided the data into a training and test data set. Our validation suggests that the model performed satisfactorily outside the training data set, but its predictive validity remains to be established in external populations. By presenting the tool in this forum, we hope to lay the groundwork for its use and validation in the many clinical settings in which it could be deployed. We view this tool not as a static entity but as a way to address a gap in current clinical practice that can be iteratively improved over time. Future work should attempt to replicate the findings in an independent data set that measures the same variables in an adolescent population.
Our risk model shares similarities with that of Talluri et al,42 who developed a 13-item model to predict the 1-year risk of initiation using data from a prospective population-based sample of 1179 adolescents of Mexican descent. Their items tapped individual characteristics, the social environment, and broader social-environment factors. This model has not been tested in other youth populations. Our model places less emphasis on the broader environment and taps more into adolescent characteristics and behaviors that “promote participation in social situations in which access to and availability of cigarettes is increased”.43 Eight of the 12 predictors relate to stress, self-esteem, depression, and alcohol use, all of which are amenable to prevention. Self-esteem training,44 sensitization to the influence of tobacco advertising, rehearsal of refusal skills,41 watching 10 truth campaign ads,45 and using commitment contracts to delay smoking46 are strategies that may increase resilience to tobacco smoking.
The 3-item SSC index11,21 is widely used in research to identify youth at risk of becoming a smoker. Strong et al21 added a curiosity question to the original index,11 improving sensitivity (79% from 62%) and reducing specificity (36% from 50%). However, this index was not developed for, nor has it been tested in, clinical settings, and it does not include diverse factors reflective of the many causes of smoking. Indeed, our scenarios suggest that adolescents who are not drawn to smoking but live in a high-risk environment are at risk of initiation. This would likely not be captured by the SSC index because it relies solely on feelings about cigarettes. Future work should assess the predictive validity of these screening tools in head-to-head comparisons in the same setting.
Our prognostic tool may not generalize to other jurisdictions, especially if the prevalence of the items tapped differs substantially. A cutoff used to designate high or low risk depends on smoking prevalence. Our cutoff may only be meaningful if the adolescent smoking prevalence is ∼16% (as it is currently in Canada,47 which is slightly higher than in the United States).48 If smoking prevalence differs substantially, our model can still be used to provide guidance on the relative importance of each predictor and allow clinicians to flag adolescents with several risk factors in the model. Similarly, because the legal drinking age is 18, alcohol use is relatively frequent among adolescents in Quebec.49 The ability to discriminate high versus low risk using our cutoff may be compromised if adolescents do not drink to the same extent. However, as our scenarios illustrate, factors including self-esteem and having friends who smoke have higher impacts on the predicted probability than alcohol use. Thus, the tool can still be used to identify at-risk adolescents, even in populations with lower alcohol consumption.
Although these data were collected almost a decade ago, our systematic review on longitudinal studies22 suggested no changes over time in the prognostic value of these predictors. In addition, we are not aware of a more recent data set with as comprehensive a set of measured predictors of initiation,22 which is required to meet the latest recommendations for constructing a prognostic tool with acceptable performance.50,51 Increased understanding of youth at risk could impel the development of therapeutic toolkits to prevent or delay initiation. For example, older age had a strong protective effect (the longer the delay in initiation, the lower the probability of initiation). A no-smoking contract for the next year might hold promise, as would strategies used to navigate or avoid situations when cigarettes are present.
Electronic cigarette use was not measured and could not be incorporated in the prognostic tool, although evidence suggests that it is associated with an increased risk of cigarette smoking initiation among adolescents.52 Physicians choosing to use the prognostic tool to identify at-risk adolescents should, as part of a comprehensive clinical assessment, also inquire about electronic cigarettes and other forms of combustible and noncombustible tobacco. Further limitations included that subitems in psychological scales were likely correlated, which can adversely affect the performance of conventional variable selection techniques such as stepwise regression.33 We used the Bolasso algorithm,29 which combines Lasso and bootstrapping and addresses correlation between predictors. Because the exact time of initiation was not measured, we used pooled logistic regression rather than survival analysis, although both methods lead to similar estimates.53 It is unclear whether the correlation between intraindividual observations affected the performance of Bolasso, which assumes that observations are independent. However, ignoring the correlation between repeated measures usually affects the estimation of variances with a negligible impact on regression coefficients.54
We developed a 12-item prognostic tool that can be used to identify adolescents likely to initiate smoking in the next year. If the predictive ability is replicated in other settings, this tool can be used to help clinicians select who should be counseled and, because several items in the tool are amenable to prevention, how they should be counseled. The sensitivity of the tool, combined with the potentially lethal consequences of smoking initiation, underscores an urgent need for such tools in pediatric settings.
- Accepted August 20, 2018.
- Address correspondence to Marie-Pierre Sylvestre, PhD, Centre de Recherche du Centre Hospitalier de l’Université de Montréal, S03.458, 850 Rue St Denis, Montréal, Québec, Canada H2X 0A9. E-mail:
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: Supported by the Canadian Cancer Society (grant 010271, 017435). Dr Sylvestre is supported by a Chercheur–Boursier career award from the Fonds de Recherche du Québec–Santé. Dr O’Loughlin holds a Canada Research Chair in the Early Determinants of Adult Chronic Disease. The funders were not involved in the design or conduct of the study; collection, management, analysis, or interpretation of the data; or preparation, review, or approval of the manuscript.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no conflicts of interest to disclose.
COMPANION PAPER: A companion to this article can be found online at www.pediatrics.org/cgi/doi/10.1542/peds.2018-2298.
- Reidpath DD, Davey TM, Kadirvelu A, Soyiri IN, Allotey P
- Arrazola RA, Neff LJ, Kennedy SM, Holder-Hayes E, Jones CD; Centers for Disease Control and Prevention (CDC)
- Reid JL, Hammond D, Rynard VL, Burkhalter R
- Reid JL, Hammond D, Rynard VL, Madill CL, Burkhalter R
- Gervais A, O’Loughlin J, Meshefedjian G, Bancej C, Tremblay M
- Pbert L, Farber H, Horn K; Julius B. Richmond Center of Excellence Tobacco Consortium, et al
- Farber HJ, Walley SC, Groner JA, Nelson KE; Section on Tobacco Control
- Schauer GL, Agaku IT, King BA, Malarcher AM
- Shelley D, Cantrell J, Faulkner D, Haviland L, Healton C, Messeri P
- Ozer EM, Adams SH, Lustig JL, et al
- Steyerberg E
- Bach FR
- Tibshirani R
- Harrell FE
- Rothman KJ, Greenland S, Lash TL
- Yang D
- DiFranza JR, Rigotti NA, McNeill AD, et al
- Pbert L, Farber H, Horn K, et al; American Academy of Pediatrics, Julius B. Richmond Center of Excellence Tobacco Consortium
- Talluri R, Wilkinson AV, Spitz MR, Shete S
- Burkhalter R, Cumming T, Rynard V, Manske S
- US Department of Health Human Services
- Nanhou V, Ducharme A, Eid H
- Fitzmaurice GM, Laird NM, Ware JH
- Copyright © 2018 by the American Academy of Pediatrics