September 2016, Volume 138, Issue 3

Autism Screening With Online Decision Support by Primary Care Pediatricians Aided by M-CHAT/F

  1. Raymond Sturner, MD (a, b),
  2. Barbara Howard, MD (a, c),
  3. Paul Bergmann, MA (d, e),
  4. Tanya Morrel, PhD (c),
  5. Lindsay Andon, MPH (f),
  6. Danielle Marks, MPH, MSW (g),
  7. Patricia Rao, PhD (h), and
  8. Rebecca Landa, PhD (h)

  a. Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, Maryland;
  b. Center for Promotion of Child Development through Primary Care, Baltimore, Maryland;
  c. Total Child Health, Baltimore, Maryland;
  d. PrairieCare Institute, Minneapolis, Minnesota;
  e. Foresight Logic, Inc, St Paul, Minnesota;
  f. Population Health Research, Johns Hopkins HealthCare, Baltimore, Maryland;
  g. Woman and Infant Health Program, Wyoming Department of Health, Cheyenne, Wyoming; and
  h. Kennedy Krieger Institute, Baltimore, Maryland

  Drs Sturner and Howard conceptualized and designed the study, oversaw data collection during the study, and drafted the initial manuscript; Mr Bergmann conducted the statistical analyses, and reviewed and revised the manuscript; Dr Morrel, Ms Andon, and Ms Marks oversaw data collection, suggested protocol revisions, and assisted with manuscript preparation and review; Drs Rao and Landa contributed to study design, oversaw collection of criterion testing, and reviewed and revised the manuscript; and all authors approved the final manuscript as submitted.


BACKGROUND AND OBJECTIVE: Autism spectrum disorders (ASDs) often go undetected in toddlers. The Modified Checklist for Autism in Toddlers (M-CHAT) With Follow-up Interview (M-CHAT/F) has been shown to improve detection and reduce over-referral. However, there is little evidence supporting administration of the interview by a primary care pediatrician (PCP) during typical checkups. The goal of this study was to evaluate the feasibility, validity, and reliability of the M-CHAT/F when administered by PCPs with online prompts at the time of a positive M-CHAT screen.

DESIGN: Forty-seven PCPs from 22 clinics completed 197 M-CHAT/Fs, triggered by positive M-CHAT screens, via the same secure Web-based platform that parents used to complete M-CHATs before an 18- or 24-month well-child visit. A second M-CHAT/F was administered live or by telephone by trained research assistants (RAs) at the Kennedy Krieger Institute Center for Autism and Related Disorders. The Autism Diagnostic Observation Schedule, Second Edition, and the Mullen Scales of Early Learning were administered as criterion measures. Measures of agreement between PCPs and RAs were calculated, and measures of test performance were compared.

RESULTS: There was 86.6% agreement between PCPs and RAs, with a Cohen’s κ of 0.72. Comparison of sensitivity, specificity, positive predictive value (PPV), and overall accuracy for M-CHAT/F between PCPs and RAs showed significant equivalence for all measures. Use of the M-CHAT/F by PCPs resulted in significant improvement in PPV compared with the M-CHAT alone.

CONCLUSIONS: Minimally trained PCPs can administer the M-CHAT/F reliably and efficiently during regular well-child visits, increasing PPV without compromising detection.

  • Abbreviations:
    AC: Center for Autism and Related Disorders
    ADOS: Autism Diagnostic Observation Schedule
    ASD: autism spectrum disorder
    M-CHAT: Modified Checklist for Autism in Toddlers
    M-CHAT/F: Modified Checklist for Autism in Toddlers Follow-up Interview
    PPV: positive predictive value
    MSEL: Mullen Scales of Early Learning
    PCP: primary care pediatrician
    RA: autism center research assistant
    TOST: 2 one-sided tests of equivalence
  • What’s Known on This Subject:

    The widely used Modified Checklist for Autism in Toddlers autism screen now requires a structured clinician follow-up interview when parent responses are positive. However, the feasibility and accuracy of this interview have not been determined when it is conducted by primary care pediatricians during routine health supervision visits.

    What This Study Adds:

    The Modified Checklist for Autism in Toddlers Follow-up Interview is feasible for implementation by primary care pediatricians aided by electronic decision support during routine well-child care, yielding results that are as accurate as, and timelier than, those produced by specially trained clinicians.

    Current prevalence estimates suggest that autism spectrum disorder (ASD) now affects 1 in 68 children.1 Early, evidence-based intervention for children with ASDs is associated with improved developmental functioning and reduction of symptoms.2,3 Data suggest intervention before 3 years of age has the greatest impact,4,5 making early detection vital.

    Guidelines from the American Academy of Pediatrics call for use of ASD-specific screening tools at both the 18- and 24-month checkup visits.6 Although the age at first diagnosis seems to be decreasing,7 recent assessments show the average age is still >4 years.1 As screening becomes more widespread, over-referral of false-positive cases could strain the system's capacity for timely evaluation and early intervention.

    The Modified Checklist for Autism in Toddlers (M-CHAT)8 is one of the most widely used autism-specific screens.9,10 It is among the tools approved for ASD screening by the American Academy of Pediatrics6 and advocacy groups, such as Autism Speaks.11 The M-CHAT has replaced the original Checklist for Autism in Toddlers, which demonstrated adequate specificity but inadequate sensitivity.12 Initial studies of the M-CHAT suggested that the modifications improved sensitivity but that a follow-up telephone interview was required for positive screens to clarify and/or correct parental responses and thereby reduce over-referral rates.8

    Another study in community practices reported that this follow-up procedure increased the positive predictive value (PPV) from 0.11 to 0.65, reinforcing the need for the follow-up interview.13 The interview consists of a script that prompts for specific examples of behaviors relevant to each failed item. Based on the parent’s responses, the interviewer may change failed items to a pass.14 According to the M-CHAT authors, the follow-up interview is required for most failed screens and “…the use of the M-CHAT without the follow-up interview is not recommended in low risk populations…” such as patients in primary care settings.15 The M-CHAT authors have also noted that when the M-CHAT score is ≥8, the potential for false-positive screens in the absence of the follow-up interview is low enough that it can be omitted.16

    The M-CHAT is therefore intended to be implemented as a 2-stage screening test consisting of completion by the parent and, if positive, conducting the follow-up interview to clarify/correct failed items. This process is referred to as the Modified Checklist for Autism in Toddlers Follow-Up Interview (M-CHAT/F). The M-CHAT/F authors also suggest that the second stage of the procedure has clinical advantages in facilitating discussion between parents and providers about a child’s behaviors that may indicate ASD or other developmental delays.16

    One limitation of the published research is that the follow-up interview was almost exclusively administered by trained research assistants (RAs) during a telephone call made an average of 2 to 3 months after the parent completed the M-CHAT8,13,17–19; 1 study indicated that “most” of the follow-up interviews were conducted later by telephone, with an unspecified number conducted by primary care physicians.15 Another study documented use of the follow-up interview by a clinician during the same visit in which the parent completed the M-CHAT.20 However, that particular 60-minute nurse/clinician visit occurred when the child was 2.5 years old.

    It is therefore unclear whether the recommended follow-up interview can be completed during briefer well-child visits by a primary care pediatrician (PCP) or if a telephone follow-up interview conducted by a specially trained interviewer is required.

    Will PCPs find time for this follow-up interview and, if so, will they use the written interview script and adhere to the standardized wording reliably? Or, will PCPs (who do not traditionally use structured interviews) spontaneously formulate follow-up questions using a more liberal interpretation of the interview?

    It is possible that parents interviewed by telephone at a later time had an opportunity to observe more accurately behaviors about which they were initially uncertain, or that the child’s development changed during the interval. The results of a telephone follow-up interview might therefore differ from those of an interview completed on the day of the positive screen, even if both interviews were conducted in the same way.

    The goal of the present study was to determine whether PCPs will actually complete a follow-up interview to a positive M-CHAT during a routine checkup visit and how the results compare with a telephone follow-up interview conducted later by a trained RA. We reasoned that automated computer presentation of the exact M-CHAT/F questions for each failed M-CHAT item, combined with automated scoring, could facilitate the process.


    Methods

    PCPs screened children at routine 18- and 24-month visits using a version of the M-CHAT completed by parents online, either at home or on a device in the waiting room, via a Web system (ie, the Child Health and Development Interactive System).21 The M-CHAT was scored automatically and, if it was positive, the PCP completed the follow-up interview during the visit, also using the Child Health and Development Interactive System. The interview covered only the failed M-CHAT items: the system rescored the screen after each initially failed item was clarified and ended the interview as soon as the initial screening result was either confirmed positive or reversed. No extra time was allotted for these visits. Data collection occurred between September 2009 and February 2013.
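
    The early-stopping logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the Child Health and Development Interactive System's code. It assumes the scoring rule commonly described for the original 23-item M-CHAT (positive when ≥3 items fail overall or ≥2 of the critical items 2, 7, 9, 13, 14, and 15 fail), which this article does not state, and `clarify_item` is a hypothetical callback standing in for the scripted follow-up questions for a single item.

```python
from typing import Callable, Iterable, List, Set, Tuple

# ASSUMPTION: scoring rules of the original 23-item M-CHAT (not stated in this
# article): positive if >= 3 items fail overall or >= 2 critical items fail.
CRITICAL_ITEMS: Set[int] = {2, 7, 9, 13, 14, 15}


def is_positive(failed: Set[int]) -> bool:
    """Apply the assumed M-CHAT threshold rules to a set of failed item numbers."""
    return len(failed) >= 3 or len(failed & CRITICAL_ITEMS) >= 2


def follow_up_interview(
    initially_failed: Iterable[int],
    clarify_item: Callable[[int], bool],
) -> Tuple[bool, List[int]]:
    """Clarify initially failed items one at a time, rescoring after each.

    clarify_item (hypothetical) returns True if the item is still failed after
    the follow-up questions and False if it is changed to a pass.
    Returns (screen_positive, items_actually_asked).
    """
    pending = sorted(set(initially_failed))
    confirmed: Set[int] = set()
    asked: List[int] = []
    while pending:
        # Confirmed failures alone already meet the threshold: result is positive.
        if is_positive(confirmed):
            return True, asked
        # Even if every remaining item stayed failed, the screen would be negative.
        if not is_positive(confirmed | set(pending)):
            return False, asked
        item = pending.pop(0)
        asked.append(item)
        if clarify_item(item):
            confirmed.add(item)
    return is_positive(confirmed), asked


# Parent failed items 1, 2, 5, 7, and 20; every failure is confirmed on follow-up.
print(follow_up_interview([1, 2, 5, 7, 20], lambda item: True))  # (True, [1, 2, 5])
```

    In this example only 3 of the 5 failed items are revisited, because the positive result is settled once 3 failures are confirmed; the same early exit fires in the other direction as soon as enough items have been changed to a pass.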

    Forty-seven self-selected pediatricians from 22 offices used the M-CHAT/F. These Maryland practices were mostly (73%) suburban, with some rural (18%) and urban (9%) locations. Practices' estimates of office-level demographic characteristics indicated that 31% of children were Medicaid insured (range, 5%–65%), and that 39% were white, 33% African American, 16% Asian, and 8% Hispanic. Of the 5071 children screened with the M-CHAT, 341 (6.7%) screened positive. The PCP completed the follow-up interview with the parents of children with a positive M-CHAT (Fig 1 provides a sample flow). The online M-CHAT/F was available in all offices for every child with a positive M-CHAT.

    FIGURE 1

    Sample flow. M-CHAT, screen completed by parent; M-CHAT/F, interview conducted by PCP; AC M-CHAT/F, M-CHAT/F interview conducted by AC.

    Diagnostic Testing

    All children who initially screened positive on the M-CHAT were recruited for diagnostic testing at the Kennedy Krieger Institute Center for Autism and Related Disorders (Autism Center [AC]). Research staff monitored M-CHAT results under a Health Insurance Portability and Accountability Act agreement and with office approval. Once the M-CHAT/F was complete, all children with positive M-CHAT screens were offered a full evaluation that was blinded to all M-CHAT/F results and involved no PCP input beyond the M-CHAT/F itself. Consent was obtained using a protocol approved by The Johns Hopkins University School of Medicine Institutional Review Board; initial screening was considered standard of care and required no separate consent. The Conflict of Interest Committee required that the individuals responsible for recruiting subjects, the diagnostic testers, and the statistician be free of conflicts. There was no charge for the diagnostic testing; parents received a $100 subject fee. Trained RAs administered follow-up interviews by telephone when appointments for diagnostic testing were scheduled. Evaluators were naive to other test results, and none of the AC staff knew the results of the PCPs’ follow-up interviews.

    A total of 99 parents (29% of those whose children had an initial positive screen) consented; 1 full ASD evaluation could not be completed because of “untestable” behavior, leaving 98 children with all relevant evaluation measures completed.

    Diagnostic testing included administration of the Autism Diagnostic Observation Schedule (ADOS)22 by certified, research-reliable speech and language pathologists. The ADOS is a semistructured behavioral observation assessment of social and communication skills and is widely regarded as the gold standard test for diagnosis of ASDs across ages, developmental levels, and language skills. The assessment relies on a series of planned occasions designed to elicit specific social and communicative processes in a standardized way. Consistent with the test author's guidance, the criterion for a diagnosis of autism in this study was the overall clinical diagnosis23 of an experienced, reliable ADOS tester who also conducted a standard developmental assessment (Mullen Scales of Early Learning [MSEL]). Data collection for this study was completed before the Autism Diagnostic Observation Schedule, Second Edition, toddler module became available and before the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, redefinition of autism and social communication disorder. Testers placed children judged to show some atypical features, but not enough for an ASD diagnosis, in a category of “suspected ASD/new phenotype.”

    The MSEL24 was also administered during diagnostic testing. The MSEL is a standardized developmental test for children 3 to 69 months of age. It has 5 subscales: gross motor, fine motor, visual reception, receptive language, and expressive language. An Early Learning Composite score is generated from the fine motor, receptive language, expressive language, and visual reception scale scores. The MSEL is used to determine the expected levels of communicative and social functioning for comparison with the ADOS results, as well as to identify children with developmental delays.

    Statistical Analysis

    Demographic characteristics of patients and respondents were tabulated. Potential selection bias was evaluated by comparing mean M-CHAT scores with t tests and 2 one-sided tests of equivalence (TOST).25,26 Measures of test performance of the M-CHAT/F using PCP and AC follow-up interviews were calculated and compared with 2-proportion z tests and TOST.25,27 Combining the results of the t test or z test with those of TOST allows inference about whether a difference is statistically significant, whether the measures are statistically equivalent, and whether any significant difference is clinically or practically relevant.26,27 Overall percent agreement and Cohen’s κ28 were used to measure the degree to which PCP follow-up interviews agreed with AC follow-up interviews.
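
    The pairing of a difference test with an equivalence test can be sketched as follows. This is a minimal illustration under stated assumptions: the article does not report its equivalence margin, so `margin` is a free parameter here, and the TOST below uses a simple unpooled (Wald) standard error, which may differ from the procedure in references 25 and 27. Both functions treat the two proportions as independent samples, as a 2-proportion z test does.

```python
from math import sqrt
from statistics import NormalDist

_phi = NormalDist().cdf  # standard normal cumulative distribution function


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z test of H0: p1 == p2 using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; z test is undefined")
    z = (p1 - p2) / se
    return z, 2 * (1 - _phi(abs(z)))


def tost_two_proportions(x1: int, n1: int, x2: int, n2: int, margin: float) -> float:
    """TOST p value for equivalence of two proportions within +/- margin.

    Returns the larger of the two one-sided p values; equivalence is
    declared when it is below the chosen alpha.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        raise ValueError("standard error is zero; TOST is undefined")
    p_lower = 1 - _phi((diff + margin) / se)  # H0: diff <= -margin
    p_upper = _phi((diff - margin) / se)      # H0: diff >= +margin
    return max(p_lower, p_upper)


# Hypothetical counts (not study data) and an assumed margin of 0.2.
z, p_diff = two_proportion_z_test(30, 50, 28, 50)
p_equiv = tost_two_proportions(30, 50, 28, 50, margin=0.2)
print(f"z = {z:.2f}, P(difference) = {p_diff:.3f}, P(TOST) = {p_equiv:.3f}")
```

    Read together, a nonsignificant z test with a significant TOST supports equivalence, whereas a significant z test with a significant TOST indicates a difference that is real but smaller than the chosen margin.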

    Results

    The demographic description of the subjects completing diagnostic testing with complete data is given in Table 1. The mean age of participating children was 22.9 months. All but 4 of the 98 children in the study were in the recommended age range; 1 was younger (14.7 months), and 3 were older (37, 37, and 40.8 months). As is typical for screen-positive children, 74.5% were male.15 Respondents were nearly always mothers (89.8%) and had at least some college education (88.8%). Tables 2 and 3 present comparisons of mean M-CHAT scores according to level of study participation and ASD diagnosis. Among M-CHAT failures, M-CHAT scores were equivalent between those who received the M-CHAT/F and those who did not. Children with completed M-CHAT/Fs who were not included in the study had, on average, 1 fewer failed M-CHAT item than those included. However, mean scores did not differ meaningfully by study participation within the M-CHAT/F pass and M-CHAT/F fail subgroups.

    TABLE 1

    Demographic Characteristics of Final Sample (N = 98)

    TABLE 2

    Comparison of M-CHAT Scores According to Level of Study Participation

    TABLE 3

    Comparison of M-CHAT Scores of Study Cases According to ASD Diagnosis

    Overall, mean M-CHAT scores were higher for ASD-positive children than for ASD-negative children. However, the difference in mean scores diminished when ASD-positive and ASD-negative cases were compared among children who were M-CHAT/F positive, and likewise among those who were M-CHAT/F negative (Table 3).

    Table 4 presents measures of performance of the M-CHAT and M-CHAT/F against the final AC ASD diagnosis. The PPV of the M-CHAT was 0.40. Use of the M-CHAT/F by PCPs significantly improved the performance of the screen, producing a PPV of 0.58 (P = .01). When comparing the M-CHAT/F results of PCPs versus those of the RAs, we observed moderately high agreement (86.6%) with a κ of 0.72 (P < .001). PPV, sensitivity, specificity, and overall accuracy were all statistically equivalent between PCPs and RAs based on the combined results of the z tests and TOST.
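
    The performance measures and agreement statistics reported above are simple functions of 2 × 2 tables. The following is a minimal sketch; the cell counts in the example are hypothetical, not the study's. The article reports only the 13 discordant M-CHAT/F pairs (7 PCP-positive/AC-negative and 6 AC-positive/PCP-negative), which would be the `a_only` and `b_only` cells, but not the concordant cells needed to recompute κ.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts cross-tabulating a screen result against a reference diagnosis."""
    tp: int  # screen positive, diagnosis positive
    fp: int  # screen positive, diagnosis negative
    fn: int  # screen negative, diagnosis positive
    tn: int  # screen negative, diagnosis negative

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.n


def agreement_and_kappa(both_pos: int, a_only: int, b_only: int, both_neg: int):
    """Overall percent agreement and Cohen's kappa for 2 raters' binary results."""
    n = both_pos + a_only + b_only + both_neg
    p_obs = (both_pos + both_neg) / n
    a_pos = (both_pos + a_only) / n  # rater A's positive rate
    b_pos = (both_pos + b_only) / n  # rater B's positive rate
    p_exp = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # agreement expected by chance
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical counts for illustration only.
p_obs, kappa = agreement_and_kappa(both_pos=20, a_only=5, b_only=10, both_neg=15)
print(f"agreement = {p_obs:.1%}, kappa = {kappa:.2f}")
```

    Kappa discounts the agreement that 2 raters would reach by chance given their own positive rates, which is why it is lower than raw percent agreement.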

    TABLE 4

    Performance of M-CHAT and M-CHAT/F Screens Predicting ASD Diagnosis

    Because children with false-positive screens may have atypical features but not enough for an ASD diagnosis, we formed a combined outcome that also counted children with suspected ASD/new phenotype but no ASD diagnosis (Table 5). The difference in PPV between the M-CHAT/F and M-CHAT remained significant when the “suspect” criterion was included (0.78 vs 0.57). Because children with false-positive screens for ASD may nevertheless have developmental difficulties that would benefit from early intervention, we also examined prediction of a combined criterion of an ASD diagnosis and/or developmental delay, defined as scores >1.5 SD below the mean on ≥2 Mullen scales or >2 SD below the mean on any 1 Mullen scale (a common state eligibility criterion29) (Table 6). Using this combined criterion increased the PPV to 0.88 for the M-CHAT/F and 0.68 for the M-CHAT, reducing the number of M-CHAT/F false-positive screens by 71%. Including both developmental delays and “suspected ASD” cases increased the PPV to 0.90 for the M-CHAT/F and 0.77 for the M-CHAT (Table 7). PPV, sensitivity, specificity, and accuracy of the M-CHAT/F were all statistically equivalent between PCPs and RAs.
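
    The broadened outcome definitions can be expressed as simple predicates. This is a minimal sketch that assumes the Mullen scale T-score metric (mean 50, SD 10) and reads the eligibility criterion exactly as stated above, with strict inequalities; actual state eligibility rules vary, and the scale names used as dictionary keys are hypothetical.

```python
from typing import Mapping, Sequence

# Mullen Scales of Early Learning scale T scores are normed to mean 50, SD 10.
T_MEAN, T_SD = 50.0, 10.0


def meets_delay_criterion(t_scores: Mapping[str, float]) -> bool:
    """True if >1.5 SD below the mean on >= 2 scales or >2 SD below on any 1 scale."""
    z = [(t - T_MEAN) / T_SD for t in t_scores.values()]
    return sum(v < -1.5 for v in z) >= 2 or any(v < -2.0 for v in z)


def combined_outcome(
    asd: bool,
    suspected_asd: bool,
    t_scores: Mapping[str, float],
    *,
    include_suspected: bool,
    include_delay: bool,
) -> bool:
    """Broadened reference outcome: ASD, plus optionally suspected ASD and/or delay."""
    return (
        asd
        or (include_suspected and suspected_asd)
        or (include_delay and meets_delay_criterion(t_scores))
    )


def ppv(screen_positive: Sequence[bool], outcome: Sequence[bool]) -> float:
    """Proportion of screen-positive children who meet the outcome criterion."""
    hits = sum(1 for s, o in zip(screen_positive, outcome) if s and o)
    return hits / sum(screen_positive)


# Hypothetical child: no ASD diagnosis, but 2 scales more than 1.5 SD below the mean.
child = {"visual_reception": 34.0, "fine_motor": 34.0, "receptive_language": 48.0}
print(combined_outcome(False, False, child, include_suspected=False, include_delay=True))
```

    Table 5 corresponds to `include_suspected=True, include_delay=False`, Table 6 to the reverse, and Table 7 to both flags set; broadening the outcome can only turn false positives into true positives, which is why PPV rises across these tables.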

    TABLE 5

    Performance of M-CHAT and M-CHAT/F Screens Predicting ASD or Suspected ASD Diagnosis

    TABLE 6

    Performance of M-CHAT and M-CHAT/F Screens Predicting ASD and/or Developmental Delay Diagnosis

    TABLE 7

    Performance of M-CHAT and M-CHAT/F Screens Predicting ASD or Suspected ASD Diagnosis and/or Developmental Delay Diagnosis

    Mean ADOS and Mullen scores according to screening result and for the overall diagnostic sample are presented in Tables 8 and 9. Differences in the 13 cases in which the M-CHAT/F results of the 2 raters disagreed are detailed in Tables 10, 11, and 12. Three of the 7 children who screened positive on the PCP M-CHAT/F and negative on the AC M-CHAT/F were diagnosed with ASD; of the 6 children who screened positive on the AC M-CHAT/F but negative on the PCP M-CHAT/F, only 1 was diagnosed with ASD.

    TABLE 8

    Comparison of Mean ADOS Scores

    TABLE 9

    Comparison of Mean Mullen T Scores

    TABLE 10

    Mean ADOS Scale Scores According to M-CHAT/F Mismatch Category

    TABLE 11

    Mean Mullen T Scores According to M-CHAT/F Mismatch Category

    TABLE 12

    Diagnoses According to M-CHAT/F Mismatch Category

    Discussion

    This study confirms the previous finding of improved PPV of the M-CHAT8 using the M-CHAT/F.13 It also confirms previous studies showing that most children with false-positive screens have developmental difficulties17 of a degree that would make them eligible for early intervention. Some children with false-positive screens had atypical features not meeting criteria for ASD.

    The online M-CHAT/F enabled PCPs to clarify positive parent responses to M-CHAT items during well-child visits, rather than requiring another visit or a call by a trained interviewer. In this study, the performance of the M-CHAT/F administered by a PCP was equivalent to that of one administered by trained AC staff. This report is the first to demonstrate the feasibility of administering the M-CHAT/F during well-child visits in community practices.

    The PCPs in this study received only a brief orientation and access to a 10-minute interactive Web-based demonstration, considerably less than the 1.5 days of training in autism screening provided in a previous study.20

    Although time-motion data were not collected, PCP visits were not scheduled to be longer than those of their colleagues. Clarifying initial parent M-CHAT responses with the online follow-up interview would be expected to take less time than discussing autism referrals with the parents of all children who screened positive on the M-CHAT. In addition, M-CHAT/F–negative families avoided the distress of waiting for a diagnostic evaluation.

    We agree with the M-CHAT authors that the M-CHAT/F provides clinical utility by facilitating discussion with parents regarding their child’s behaviors and potential developmental challenges.15 It is important to use a validated method, such as the M-CHAT/F, to ascertain whether the parent is reporting behaviors consistent with signs of ASD or has misunderstood the questions.

    Once a child has been identified as at risk for ASD through screening, challenges remain in persuading the family that a diagnostic evaluation is necessary.30 In addition to reducing false-positive findings, the follow-up interview provides a structured process for clarifying and discussing the parent perceptions that led to the positive screen, encouraging parents to follow through with a referral for further evaluation.

    This study differed from that of the M-CHAT authors in referring children with M-CHAT/F–negative as well as M-CHAT/F–positive findings for full evaluation. It also found that, even with trained M-CHAT/F interviewers, one cannot assume that children with negative interview results do not have autism; even the recommended 2-stage procedure missed cases. Further study of the sensitivity of the M-CHAT in community populations, with systematic evaluation of M-CHAT–negative and M-CHAT/F–negative cases, is needed to fully evaluate the costs and benefits of various toddler screening strategies.

    Further research is needed to determine the types of conversation, education, and monitoring required to ensure timely referral to early intervention. Successful referral begins with a shared understanding of how the child’s behaviors relate to a potentially serious, although sometimes subtle, condition requiring attention. Although discussion could be informal, the validated, structured (and billable) M-CHAT/F provides evidence-based documentation for or against a serious condition. Because most positive M-CHAT screens are false-positive findings, a positive M-CHAT/F screen, which carries a higher likelihood of significance, can help clinicians be more persuasive about referral.

    We therefore disagree with the advice of the M-CHAT/F authors that an M-CHAT total score ≥8 does not require the follow-up interview.16 Thoughtful review, using the M-CHAT/F, of parental perceptions of their child’s behaviors in this 1% of cases acknowledges the importance of parental observations, supports the need for referral, and prompts follow-through. Using a computerized algorithm to guide the follow-up interview also shortens the interview for parents endorsing a large number of behaviors, because the process stops automatically once the results either confirm or refute the initial M-CHAT screen result.

    One limitation of this study, shared by other studies, is that children passing the M-CHAT were not systematically recruited and that many of those who completed the M-CHAT/F and were recruited declined full evaluation. However, unlike the initial M-CHAT studies, this sample was drawn entirely from a community primary care population, with a participation rate similar to that of the latest M-CHAT community sample.16 This limitation is not restricted to research studies, as similarly suboptimal rates of successful referral from screening occur in practice.30 The requirement to travel to the AC may have been an impediment: our subsequent study, which offered in-home assessments, had higher participation rates.

    The proportion of African-American children in the evaluation sample was only one-third of the proportion estimated for the overall clinic population. Because we did not have individual demographic data for children who did not enroll in the evaluation, we do not know whether the disparity arose at the level of screening, screening positive, or study enrollment. However, the proportion of African-American children in the participating clinics was based on clinician estimates. Known disparities in age of diagnosis in African-American children31 make this factor important to understand.

    This study was conducted before publication of the revised M-CHAT16; the original 23-item version was used. The revised M-CHAT/F contains slight wording changes and eliminates 3 items. Due to the follow-up algorithm, we could not reanalyze our data as though the revised M-CHAT was used.

    The authors of the M-CHAT assert that the follow-up interview is a requirement for its use in well-child visits.16 This study provides the first evidence that PCPs can conduct this interview feasibly and reliably during such visits. Although an online system is not necessary, it provides important efficiencies. Adherence to formally structured interview protocols is not traditional among PCPs; electronic decision support may provide the structure needed when adherence to an algorithm is appropriate and clinically desirable.


      • Accepted June 17, 2016.
    • Address correspondence to Raymond Sturner, MD, Center for Promotion of Child Development through Primary Care, 6017 Altamont Place, Baltimore, MD 21210. E-mail: rsturner{at}
    • FINANCIAL DISCLOSURE: This study was conducted by the Center for Promotion of Child Development through Primary Care and its for-profit subsidiary, Total Child Health (TCH), Inc. The Child Health and Development Interactive System (the Web tool used in the study) was developed by Dr Sturner and his spouse, Dr Howard. Dr Sturner is Director of the Center and Dr Howard is President of TCH. Both are members of the Board of Directors of both entities and are paid consultants to both entities. Mr Bergmann has consulted for TCH through his company Foresight Logic but received no funding for this study. Dr Morrel is an employee of TCH and a stockholder in the company. The other authors have indicated they have no financial relationships relevant to this article to disclose.

    • FUNDING: All phases of this study were supported by the National Institute of Mental Health grant R44MH085399. Funded by the National Institutes of Health (NIH).

    • POTENTIAL CONFLICT OF INTEREST: This study was conducted by the Center for Promotion of Child Development through Primary Care and its for-profit subsidiary, Total Child Health (TCH), Inc. The Child Health and Development Interactive System (the Web tool used in the study) was developed by Dr Sturner and his spouse, Dr Howard. Dr Sturner is Director of the center and Dr Howard is President of TCH. Both are members of the Board of Directors of both entities and are paid consultants to both entities. Dr Morrel is an employee of TCH and a stockholder in the company. The other authors have indicated they have no potential conflicts of interest to disclose.