Published online October 1, 2008
PEDIATRICS Vol. 122 No. 4 October 2008, pp. 866-868 (doi:10.1542/peds.2007-3142)

COMMENTARY

The Thorny Nature of Predictive Validity Studies on Screening Tests for Developmental-Behavioral Problems

Kevin Marks, MDa, Frances Page Glascoe, PhDb, Glen P. Aylward, PhDc, Michael I. Shevell, MDd, Paul H. Lipkin, MDe,f and Jane K. Squires, PhDg

a Department of Pediatrics, PeaceHealth Medical Group, Eugene, Oregon
b Department of Pediatrics, Vanderbilt University, Nashville, Tennessee
c School of Medicine, Southern Illinois University, Springfield, Illinois
d Departments of Neurology/Neurosurgery and Pediatrics, McGill University, Montreal, Quebec, Canada
e Division of Neurology and Developmental Medicine, Kennedy Krieger Institute, Baltimore, Maryland
f Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, Maryland
g College of Education, University of Oregon, Eugene, Oregon

Over the last few years, several researchers have focused on the predictive validity of developmental-behavioral screening tools.1–11 Predictive validity studies compare the results of a screening test administered at a single point in time (referred to hereafter as time 1) to the results of a diagnostic test or battery administered 3 months to several years later (time 2).12 Unlike concurrent validity studies in which both screening and diagnostic measures are administered at the same time to determine the sensitivity and specificity of a screen, predictive validity studies depend on longitudinal measurement and focus on how well a screen predicts future developmental status. As a consequence, predictive validity research offers a critical illustration of whether a screening test measures dimensions of development that are enduring and have a meaningful impact on children's long-term outcomes. As already stated by the American Academy of Pediatrics, a concerning, high-quality screening result should generate a referral to early intervention or special education, which has been shown to improve a child's developmental, behavioral, and/or school-readiness trajectory.9,13–17

Nevertheless, predictive validity studies on screening tests are also fraught with challenges, particularly because young children change rapidly. For example, at 18 months some children may not be talking much. By 24 months we expect much more. Some children who seemed fine at 18 months have begun to have trouble (eg, combining words). Others will have had the benefit of intervention and overcome earlier deficits. The adverse impact of psychosocial risk becomes more apparent with age. Developmental functions can be emergent, latent (not yet measurable), delayed, deficient, or disordered.13 Measuring the moving target that is child development is not impossible, but it is one of the reasons that professional organizations such as the American Academy of Pediatrics recommend ongoing surveillance at each well-child visit along with periodic screening using high-quality measures. Early intervention programs respond to referrals on the basis of screening tests with additional testing, intervention when indicated, and, if not, then often with ongoing monitoring, in full recognition that a child's need for assistance often changes.

Accordingly, we, all of us screening test researchers and authors, encourage our colleagues to deploy skill and insight when conducting predictive validity studies and, thus, recommend the following:

  1. Account for intervening variables (ie, what happens to children during the interval between screening and subsequent administration of a diagnostic measure). Did these variables alter the child's developmental and/or behavioral trajectory? Do these variables differ significantly between the population screened and not screened? This accounting should include documentation of both medical and psychosocial interventions at both times 1 and 2. Intervening medical conditions and treatments (including recovery) occurring between times 1 and 2 should be documented (eg, iron-deficiency anemia, obstructive sleep apnea, anticonvulsant therapy). Other processes that might have enhanced development in the interim between measurement should be considered (eg, whether parents were given suggestions for developmental promotion activities at time 1, whether by time 2 they implemented those suggestions, and whether families enrolled, in the interim, in any among a range of intervention services [eg, housing assistance, academic tutoring, speech-language therapy, parenting classes, etc]). Developmental and/or behavioral screens are unlike many medical screens (eg, testing for blood lead levels). They may have an "observational effect." Parent-based questionnaires can serve as a teaching tool when the parent is thoughtfully filling out the answers. They may alter clinicians' conversation or actions during or after the visit. Most high-quality screening toolkits come with interventional parent handouts and/or activity sheets.
  2. Ensure that the criterion battery is of good quality. In selecting diagnostic tests, attention should be paid to whether they have recent standardization (preferably in the last 10 years) on a large, nationally representative sample, assess a broad range of developmental skills (preferably via domain scores so that strengths and weaknesses in screening test performance can be viewed), have been validated against other high-quality diagnostic tests so that measurement strengths and weaknesses are thoroughly identified, and have proven levels of various kinds of reliability (eg, test-retest, interrater, internal consistency). Screens are usually broadband, and the reference measures need to be also. Criterion tests should also focus on outcomes and, thus, include measuring critical variables such as school performance, in-grade retention, enrollment in special services, graduation rates, etc. Nevertheless, selecting the criterion battery will always be a thorn on the stem of developmental-behavioral research, because there is no truly perfect "rose" to serve as the reference or gold standard.
  3. Administer both the criterion battery and the screening measure at times 1 and 2, which provides an indicator of developmental stability for each child and for each test. The resulting data serve as useful covariates in accounting for growth (or lack thereof) and provide valuable guidance on how well both screening and reference measures account for expected developmental changes. Such study design also provides valuable real-world information on children's progress and whether the screening test under study has the capacity to predict future eligibility for services. Focusing on long-term outcomes in later childhood or adulthood affords a comparison between the results of the screen plus or minus criterion battery at time 1 and diagnostic testing at time 2. Two challenges remain, though. The first challenge is that in some cases the criterion battery may require different measures at each point in time. Some tests have a limited age range, and others must be selected in their stead. Nevertheless, using additional measures to define outcomes adds helpful variability and strengthens confidence in the conclusions. The second, and more important, challenge is that problematic screening/diagnostic testing results in early childhood often trigger a number of interventions. Long-term predictive validity research, again, has a thorn: testing can lead to altered outcomes.17 Research instead should focus on how and when screens can better identify medical and/or developmental conditions or disorders that are responsive to treatment(s). Investigating which populations respond best to which modalities of early intervention or special education is a more direct approach to improving long-term outcomes.
  4. Thoroughly analyze the data set. If prediction from a screen to a diagnostic measure does not meet standards for concurrent accuracy (typically, sensitivity and specificity of ≥70% to diagnostic measures administered along with a screen at time 1), determine which domains or items on a screen performed well. Gross motor performance, for example, may not have strong predictive validity, but receptive language performance may well be a long-term indicator of success or problems. Because stable performance is characteristic of children with severe disabilities, when assessing those with potentially milder problems, consider which cutoffs on the diagnostic measure (usually set to various SDs below the mean) best capture the results of a screening test. Another worthwhile approach is to apply to the criterion battery, criteria used to determine eligibility for early intervention and special education, because it helps ensure that the research findings have ecological validity.
  5. Appreciate findings in which screening test results predict the majority of diagnostic test performance at time 2, even if less than desired for concurrent accuracy. Given that screening measures are brief by definition, they include only a few items at any 1 age level. Because development is dynamic and developmental problems evolve, it is impressive that any screening results at time 1 identify the majority of children with and without difficulties at time 2. Most predictive validity study results on diagnostic measures (intelligence tests, educational batteries, etc) are expressed as correlations (an effect size) or percentage of variance accounted for. For clinical decision-making on the basis of screening tests, such reporting is understandably less than satisfactory. Computing other tests of effect size, such as odds ratios, offers an alternative, as does tolerance, if not admiration, of sensitivity/specificity figures for predictive validity studies that may be <70% but are, nevertheless, much greater than chance.
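
The accuracy figures discussed in recommendations 4 and 5 can be illustrated with a short calculation. The Python sketch below is only an illustration, not a method taken from any of the cited studies: the class name, field names, and counts are all hypothetical, and the counts were chosen so that sensitivity falls just under the 70% benchmark. It crosses a time 1 screening result with a time 2 diagnostic result and reports sensitivity, specificity, and the odds ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from crossing a time 1 screen with a time 2 diagnostic result.

    All names here are hypothetical and chosen for illustration only.
    """
    true_pos: int   # screen positive at time 1, problem confirmed at time 2
    false_pos: int  # screen positive at time 1, no problem at time 2
    false_neg: int  # screen negative at time 1, problem confirmed at time 2
    true_neg: int   # screen negative at time 1, no problem at time 2

    def sensitivity(self) -> float:
        # Share of children with a confirmed problem whom the screen flagged.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share of children without a problem whom the screen passed.
        return self.true_neg / (self.true_neg + self.false_pos)

    def odds_ratio(self) -> float:
        # Odds of a later problem after a positive screen, divided by the
        # odds after a negative screen. Assumes no cell is zero; a real
        # analysis would need a continuity correction for sparse tables.
        return (self.true_pos * self.true_neg) / (self.false_pos * self.false_neg)


# Hypothetical counts for 200 children; not data from any study.
table = TwoByTwo(true_pos=26, false_pos=45, false_neg=14, true_neg=115)

print(f"sensitivity: {table.sensitivity():.0%}")
print(f"specificity: {table.specificity():.0%}")
print(f"odds ratio:  {table.odds_ratio():.1f}")
```

With these made-up counts, sensitivity is 26/40 (65%) and specificity is 115/160 (about 72%), yet the odds ratio is about 4.7: a child who screened positive has several times the odds of a confirmed problem at time 2. A result of this kind falls short of the concurrent-accuracy benchmark while still being well above chance.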


    CONCLUSIONS
Even while we express consternation about how findings have been interpreted in several recently published predictive validity studies of screening tests, all such research is illustrative of the complexities of measuring child development. Because screening tests are designed to identify current problems so that they can be addressed as early as possible, we encourage researchers to interpret their predictive validity findings in a more positive light. Above all, we urge clinicians to value strong relationships between screens and diagnostic measures.

In response to a screening test failure, referral to an early intervention agency is the first step. A diagnosis is not required. At the same time, medical providers should also make use of a concerning screening result to provide more diligent surveillance, possibly with supplemental screening, and/or a pediatric subspecialist referral, per the wise recommendations of the American Academy of Pediatrics.14


    FOOTNOTES
 
Accepted Jun 20, 2008.

Address correspondence to Frances Page Glascoe, PhD, 25 Bragg Dr, East Berlin, PA 17316. E-mail: frances.p.glascoe{at}vanderbilt.edu

The authors have indicated they have no financial relationships relevant to this article to disclose.

Opinions expressed in these commentaries are those of the author and not necessarily those of the American Academy of Pediatrics or its Committees.


    REFERENCES

  1. Wake M, Gerner B, Gallagher S. Does parents' evaluation of developmental status at school entry predict language, achievement, and quality of life 2 years later? Ambul Pediatr. 2005;5(3):143–149
  2. Hess CR, Papas MA, Black MM. Use of the Bayley Infant Neurodevelopmental Screener with an environmental risk group. J Pediatr Psychol. 2004;29(5):321–330
  3. Harris SR, Daniels LE. Reliability and validity of the Harris Infant Neuromotor Test. J Pediatr. 2001;139(2):249–253
  4. Leonard CH, Piecuch RE, Cooper BA. Use of the Bayley Infant Neurodevelopmental Screener with low birth weight infants. J Pediatr Psychol. 2001;26(1):33–40
  5. Aylward GP, Verhulst SJ. Predictive utility of the Bayley Infant Neurodevelopmental Screener (BINS) risk status classifications: clinical interpretation and application. Dev Med Child Neurol. 2000;42(1):25–31
  6. Klee T, Carson DK, Gavin WJ, Hall L, Kent A, Reece S. Concurrent and predictive validity of an early language screening program. J Speech Lang Hear Res. 1998;41(3):627–641
  7. Sturner RA, Funk SG, Green JA. Preschool speech and language screening: further validation of the sentence repetition screening test. J Dev Behav Pediatr. 1996;17(6):405–413
  8. Rydz D, Srour M, Oskoui M, et al. Screening for developmental delay in the setting of a community pediatric clinic: a prospective assessment of parent-report questionnaires. Pediatrics. 2006;118(4). Available at: www.pediatrics.org/cgi/content/full/118/4/e1178
  9. McCormick MC, Brooks-Gunn J, Buka SL, et al. Early intervention in low birth weight premature infants: results at 18 years of age for the Infant Health and Development Program. Pediatrics. 2006;117(3):771–780
  10. van Agt HME, van der Stege HA, de Ridder-Sluiter H, Verhoeven LTW, de Koning HJ. A cluster-randomized trial of screening for language delay in toddlers: effects on school performance and language development at age 8. Pediatrics. 2007;120(6):1317–1325
  11. Briggs-Gowan MJ, Carter AS. Social-emotional screening status in early childhood predicts elementary school outcomes. Pediatrics. 2008;121(5):957–962
  12. Buck AA, Gart JJ. Comparison of a screening test and a reference test in epidemiologic studies: I. Indices of agreement and their relation to prevalence. Am J Epidemiol. 1966;83(3):586–592
  13. Capute AJ, Accardo PJ. A neurodevelopmental perspective on the continuum of developmental disabilities. In: Capute AJ, Accardo PJ, eds. Developmental Disabilities in Infancy and Childhood. 2nd ed. Vol 1. Baltimore, MD: Paul H. Brookes; 1996:1–22
  14. American Academy of Pediatrics, Council on Children With Disabilities, Section on Developmental Behavioral Pediatrics; Bright Futures Steering Committee; Medical Home Initiatives for Children With Special Needs Project Advisory Committee. Identifying infants and young children with developmental disorders in the medical home: an algorithm for developmental surveillance and screening [published correction appears in Pediatrics. 2006;118(4):1808–1809]. Pediatrics. 2006;118(1):405–420
  15. Shonkoff JP. From neurons to neighborhoods: old and new challenges for developmental and behavioral pediatrics. J Dev Behav Pediatr. 2003;24(1):70–76
  16. Guralnick MJ. Effectiveness of early intervention for vulnerable children: a developmental perspective. Am J Ment Retard. 1998;102(4):319–345
  17. Ramey CT, Ramey SL. Effective early intervention. Ment Retard. 1992;30(6):337–345

PEDIATRICS (ISSN 1098-4275). ©2008 by the American Academy of Pediatrics
