Objectives. Autistic spectrum disorders (ASD) have variable developmental outcomes, for reasons that are not entirely clear. The objective of this study was to test the clinical observation that initial developmental parameters (degree of atypicality and level of intelligence) are a major predictor of outcome in children with ASD and to develop a statistical method for modeling outcome on the basis of these parameters.
Methods. A retrospective chart review was conducted of a child development program at a tertiary center for the evaluation of children with developmental disabilities. All children who were seen by J.C. between July 1997 and December 2002, met Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria for autism or pervasive developmental disorder (referred to hereafter as ASD), had undergone at least 1 administration of the Childhood Autism Rating Scale (CARS), and had at least 1 determination of developmental quotient (DQ) or IQ were studied (N = 91). The sample was 92.3% male and 80.2% white.
Methods. The DSM-IV was used to confirm that each patient met criteria for a diagnosis of autism or pervasive developmental disorder. The CARS was used to quantify the severity of expression of ASD. Age at evaluation, CARS score, and DQ or IQ at each visit were extracted from the medical record. The 2 independent sample t test or the Mann-Whitney test was used for comparing CARS and age between 2 groups: first recorded DQ or IQ <0.70 (n = 58) versus first recorded DQ or IQ ≥0.70 (n = 33). Associations among CARS score, IQ or DQ, and age were examined using Pearson or Spearman correlation. A mixed-effect model was used for expressing the multivariate model. Length of follow-up (period) was calculated by subtracting age in months at initial evaluation from age in months at each follow-up evaluation. Therefore, at first evaluation, period = 0. Period was considered as a random effect because collection of repeated information from patients was not uniform. The predictive relationships among CARS, age at first evaluation, period, and DQ or IQ group (<0.70 and ≥0.70) were examined using a mixed-effects model. Variables that were expressed as percentage change between first and last measurements were analyzed using the t test or the Mann-Whitney test. Socioeconomic status was assessed using Hollingshead criteria.
Results. All patients met DSM-IV criteria for ASD. Mean age at initial evaluation was 46.2 months (SD: 23.7; range: 20.0–167.3 months). Mean CARS score at initial evaluation was 36.1 (SD: 6.3; range: 21.5–48). Mean DQ or IQ at initial evaluation was 0.65 (SD: 0.20; range: 0.16–1.10). There was no significant difference in socioeconomic status between DQ/IQ groups. CARS scores among children with an initial DQ or IQ <0.70 showed no significant decrement with time. In contrast, CARS scores among children with an initial DQ or IQ ≥0.70 showed a significant decrement with time, which could be modeled by the formula CARS = 37.93 − [(0.12 × age in months at first visit) + (0.23 × period)]. The predicted CARS scores generated by this model correlated with the observed values (r = 0.71) and explained 50% of the variability in the CARS scores for this group.
Conclusions. These data provide preliminary validation of a statistical model for clinical outcome of ASD on the basis of 3 parameters: age, degree of atypicality, and level of intelligence. This model, if replicated in a prospective, population-based sample that is controlled for treatment modalities, will enhance our ability to offer a prognosis for the child with ASD and will provide a benchmark against which to judge the putative benefits of various treatments for ASD. Our model may also be useful in etiologic and epidemiologic studies of ASD, because different causes of ASD are likely to follow different developmental trajectories along these 3 parameters.
Symptoms of autism and pervasive developmental disorder (PDD; referred to collectively as “autistic spectrum disorders” [ASD]) vary in severity from one child to another. People with ASD can also have widely different levels of intelligence.1–7 It has long been known that atypical features wane over time in some affected individuals. Kanner's 1943 paper that first described autism8 was itself a 5-year follow-up of a cohort of children whom he had been treating since 1938. Kanner noted gradual improvement in language and social skills in this group of children from preschool through middle childhood. These improvements occurred in the absence of any specific developmental intervention for autism. At long-term follow-up nearly 3 decades later, Kanner found that 1 of his original subjects had gone on to earn a college diploma, whereas others in the group remained in highly sheltered living situations.9 Other investigators have reported similar observations.5,10–16
The prognosis for individuals with ASD seems to be governed by the joint impact of the degree of atypicality and the level of overall intelligence, but the precise relationship among these parameters (degree of atypicality, level of intelligence, and symptom expression over time) has not been defined.17,18 The goals of the present investigation were to quantify our clinical impression that atypical features fade more rapidly among children with normal intelligence than among children with comorbid mental retardation and to determine whether there is a relationship among the variables age, degree of atypicality, and level of general intelligence that can be captured by a statistical model.
This investigation was performed at Children's Seashore House of The Children's Hospital of Philadelphia. Children's Seashore House is a regional referral center for children with a wide range of neurodevelopmental disabilities and behavioral disorders.
This was a retrospective chart review. Patient characteristics are summarized in Tables 1 and 2. Patients consisted of all children who were evaluated by one of us (J.C.) between July 1, 1997, and December 31, 2002, for whom age, level of cognitive development (expressed as IQ or DQ), and severity of atypical features as quantified by the Childhood Autism Rating Scale (CARS) existed in the medical record, and who met Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria for ASD (Table 1). No patient was excluded on the basis of age, race, or gender. This study was approved by the Institutional Review Board for the Protection of Human Subjects of The Children's Hospital of Philadelphia. Known or suspected causes for ASD in this sample included fragile X syndrome (1), Down syndrome (1), basilar artery aneurysm (1), 47 XYY (1), and Turner syndrome (1). Patients included 1 pair of fraternal twin boys and 1 pair of singleton brothers. One patient had a sibling (not part of this study) with Asperger syndrome. One potential subject with comorbid deafness was excluded because of the potentially confounding effects of his deafness on developmental assessment. Patients ranged in age from 20 months to 13 years 11 months, although most were preschool or elementary school age (Table 2).
Eligibility was determined by scoring each child on the DSM-IV criteria for autism or PDD. If the child met criteria at any point during his or her clinical course, the child was considered eligible for this investigation. By definition, therefore, 100% of patients in this report met DSM-IV criteria for autism or PDD.
Charts for each eligible patient were reviewed. In addition to gender, ethnicity, and age, we extracted the following data for each clinic visit, when available in the medical record:
socioeconomic data (parents' education and occupation), classified according to Hollingshead (A.B. Hollingshead, PhD, Four-Factor Index of Social Status, unpublished manual, 1975);
score on the CARS;
results of standardized psychometric testing;
level of adaptive (self-care) skills; and
results of neurodevelopmental testing as performed by a board-certified neurodevelopmental pediatrician.
Severity of ASD was expressed as the child's score on the CARS.19 The CARS is a standardized clinical observation tool that rates 15 clinical features (social relatedness, stereotypical language, repetitious behaviors, sensory phenomena, etc) from 1 (normal) through 4 (severely abnormal). Thus, scores on the CARS range from 15 (no signs of ASD) to 60 (severe ASD). Nearly all CARS scores were obtained by us (J.C.). However, we also compared our CARS scores with those of independent examiners when these were available in the medical record.
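The scoring arithmetic described above can be sketched in Python. This is an illustration only, not the instrument's official scoring procedure: the function name `cars_total` is hypothetical, and the allowance for half-point item ratings is an assumption suggested by reported totals such as 21.5 rather than something stated in the text.

```python
def cars_total(item_ratings):
    """Sum 15 CARS item ratings (each from 1 to 4) into a total score."""
    if len(item_ratings) != 15:
        raise ValueError("the CARS rates exactly 15 clinical features")
    for rating in item_ratings:
        # Assumption: ratings move in half-point steps between 1 and 4.
        if not 1 <= rating <= 4 or (rating * 2) != int(rating * 2):
            raise ValueError(f"invalid item rating: {rating}")
    return sum(item_ratings)


lowest = cars_total([1] * 15)   # every item rated normal
highest = cars_total([4] * 15)  # every item rated severely abnormal
print(lowest, highest)
```

The two calls reproduce the bounds given in the text: 15 when every item is rated normal and 60 when every item is rated severely abnormal.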
Level of general intelligence was expressed as IQ or DQ (DQ = [age equivalent of observed abilities/chronological age]).20 We relied on the results of standardized psychometric measures as recorded in the medical record, such as the Stanford Binet Intelligence Scale-Fourth edition,21 the Wechsler Preschool and Primary Scale of Intelligence-Revised,22 and the Wechsler Intelligence Scale for Children-III,23 whenever possible. However, children with ASD are frequently not testable by conventional methods. Under such circumstances, we based DQ estimates on the results of achievement testing with instruments such as the Hawaii Early Learning Profile,24 the level of the child's adaptive (self-care) skills as quantified by the Vineland Scales of Adaptive Behavior25 or the Scales of Independent Behavior-Revised,26 plus neurodevelopmental testing at the time of clinic evaluation. We disregarded gross motor skills when determining DQ because these do not correlate closely with cognitive ability. We disregarded speech delay when estimating DQ, because children with ASD have a selective impairment of speech and language, and our goal was to characterize children's cognitive abilities in domains that are not specifically impaired by ASD. Likewise, we disregarded splinter skills such as hyperlexia (ability to decode by rote printed matter well above functional use of language), because such skills may overestimate the child's true functional capability. We divided patients into 2 mutually exclusive groups (those whose first recorded DQ or IQ was <0.70 [N = 58] and those whose first recorded DQ or IQ was ≥0.70 [N = 33]) to test our clinical impression that children with DQ or IQ in the normal range show greater decrement in CARS scores over time than children with DQ or IQ in the mental retardation range.
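As a minimal sketch of the DQ definition and the grouping rule described above, the following Python functions compute a DQ from an age equivalent and assign a child to 1 of the 2 groups. The function names and the example child are hypothetical. Scores are assumed to be expressed as proportions (eg, 0.65), as they are reported throughout this article, so an IQ on the conventional 100-point scale would first need to be divided by 100.

```python
def developmental_quotient(age_equivalent_months, chronological_age_months):
    """DQ = age equivalent of observed abilities / chronological age."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return age_equivalent_months / chronological_age_months


def intelligence_group(first_dq_or_iq, cutoff=0.70):
    """Return 'low' for a first recorded DQ or IQ below the cutoff, else 'high'."""
    return "low" if first_dq_or_iq < cutoff else "high"


# Hypothetical child: abilities at a 24-month level at 48 months of age.
dq = developmental_quotient(24, 48)
print(dq, intelligence_group(dq))
```

A child whose first recorded value is exactly 0.70 falls in the higher group, matching the ≥0.70 boundary used in the text.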
The primary outcomes for this study were collected repeatedly over time. Summary statistics of continuous variables were examined and described by mean, median, SD, minimum, and maximum. Categorical variables (gender, race, and intelligence group) were presented by frequency distribution. The 2 independent sample t test or the Mann-Whitney test was used for the comparison of continuous variables such as CARS and age between 2 groups (eg, intelligence groups: IQ or DQ <0.70 vs IQ or DQ ≥0.70). Univariate and partial correlation (Pearson or Spearman) were used to examine the association among age, intelligence group, and score on the CARS. The effect of age and intelligence group on CARS levels across visits was analyzed using a linear mixed-effects modeling approach (SAS Institute, Inc, Cary, NC). This approach to data analysis is similar to multiple regression analysis, but it takes into consideration the repeated measurements of data from the same patient that produce correlated observations. In addition, by adopting this method, we were able to include in the analysis all observed patients, even when some patients lacked a complete set of measurements (eg, not all patients had a CARS and DQ or IQ determined at every visit). Models that predict CARS were obtained using PROC MIXED (SAS Institute, Inc). Patients and time of measurements were assumed to be random effects, and the variance-covariance matrix was assumed to be unstructured and was estimated from the data. Period was measured as age at current visit in months minus age at first visit in months. In this analysis, the type I error (α) was set to equal .05.
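The definition of period can be illustrated with a short Python sketch. It covers only the derivation of the follow-up variable, not the mixed-effects fit itself, which was performed in SAS. The function name and the example ages are hypothetical.

```python
def periods_from_ages(ages_in_months):
    """Return follow-up length (period) for each visit, in months."""
    if not ages_in_months:
        return []
    # The first visit is taken to be the visit at the youngest age.
    first_visit_age = min(ages_in_months)
    return [age - first_visit_age for age in ages_in_months]


# Hypothetical visit ages for one child, in months.
print(periods_from_ages([30, 42, 60]))
```

For a child seen at 30, 42, and 60 months, the periods are 0, 12, and 30 months, so the first evaluation always has period = 0.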
Occupational data were available for 69 (76%) of 91 families, and educational data were available for 52 (57%) of 91. There was no significant difference in parental education or occupation between DQ/IQ groups. The sample as a whole was generally well educated and prosperous: 80% of families with available data had at least 1 parent who had completed college or graduate school; 56% included at least 1 parent who was an executive or higher professional (physician, attorney, architect, etc).
The average DQ or IQ for the entire cohort was 0.65 (SD: 0.20; range: 0.16–1.10). Mean DQ or IQ for the low DQ/IQ group was 0.53 (SD: 0.12) and for the high DQ/IQ group was 0.86 (SD: 0.13). These DQ/IQ scores differed significantly (Table 2). There were no significant differences in racial or gender distribution, mean age at first evaluation, duration of follow-up, or number of examinations between patients with DQ or IQ <0.70 at first visit and patients with DQ or IQ ≥0.70 at first visit. Patients with DQ or IQ <0.70 had CARS scores that were significantly higher than those of patients with DQ or IQ ≥0.70. There were no significant relationships among DQ or IQ at first visit, age at first visit, and length of follow-up.
There were 9 instances in which we (J.C.) administered a CARS within 90 days of a CARS by another examiner (typically, the school psychologist). CARS scores by us correlated (r = 0.822; P = .006) with CARS scores that were obtained independently by other examiners.
The relationship between CARS and age was examined using SAS PROC MIXED. CARS scores were treated as the dependent variable; age at first visit (months), period (months), DQ or IQ group (DQ or IQ <0.70 = group 0; DQ or IQ ≥0.70 = group 1), and the interaction term period × DQ group were the independent variables. The fitted linear model for all patients (Table 3) was CARS = 36.65 − [(0.08 × age in months at first visit) + (0.24 × period)] + (4.11 × group) + (0.22 × period × group).
We then ran a similar analysis for each DQ or IQ group. For each group, CARS scores were treated as the dependent variable; age at first visit (months) and period (months) were the independent variables. The fitted linear model for patients with initial DQ or IQ < 0.70 was CARS = 40.23 − [(0.06 × age at first visit) + (0.02 × period)].
Only the intercept was statistically significant (P < .001). Neither age at first visit nor duration of follow-up (period) was significant (P = .27 and P = .56, respectively). That is, there was no significant relationship between CARS score and age at first visit and no significant change in CARS related to the length of follow-up (period).
The fitted linear model for patients with an initial DQ or IQ ≥0.70 was CARS = 37.93 − [(0.12 × age at first visit) + (0.23 × period)]. In contrast to the low DQ/IQ group, the intercept, age at first visit, and period coefficients all were statistically significant (P < .001, P = .02, and P < .001, respectively). That is, CARS scores for children with DQ or IQ ≥0.70 were inversely related to age at first visit and continued to decline with increasing length of follow-up. The predicted values generated by this model correlated (r = 0.71) with the observed CARS scores, indicating that the model explained 50% of the variability in CARS (Fig 1).
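The arithmetic of the 2 group-specific models can be sketched as follows. The coefficients are those reported in the text; the function name and the example child are hypothetical. The sketch illustrates how the fitted equations are evaluated and is not intended for predicting outcome in an individual child.

```python
def predicted_cars(age_at_first_visit, period, high_group):
    """Evaluate the group-specific fitted models reported in the text.

    Both arguments are in months; high_group is True for an initial
    DQ or IQ >= 0.70 and False for an initial DQ or IQ < 0.70.
    """
    if high_group:
        return 37.93 - (0.12 * age_at_first_visit + 0.23 * period)
    # Low group: the age and period coefficients were not significant.
    return 40.23 - (0.06 * age_at_first_visit + 0.02 * period)


# Hypothetical child first seen at 36 months and followed for 24 months.
print(round(predicted_cars(36, 24, high_group=True), 2))
print(round(predicted_cars(36, 24, high_group=False), 2))

# Share of variance explained, from the reported correlation r = 0.71.
print(round(0.71 ** 2, 2))
```

For this hypothetical child, the model for the higher group gives 37.93 − (4.32 + 5.52) = 28.09, whereas the model for the lower group gives 40.23 − (2.16 + 0.48) = 37.59. Squaring the reported correlation gives 0.71² ≈ 0.50, which is the source of the 50% figure.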
Atypical features in children with ASD diminish with the passage of time. However, this benefit is not enjoyed equally by all children with ASD. In our clinic sample, only children whose global cognitive ability (DQ or IQ) was ≥0.70 at entry showed significant abatement of their atypical features (as measured by the CARS) with time. There was no observed decrement in CARS scores for children whose initial DQ or IQ was <0.70. These results are consistent with the observations of previous investigators.2,27,28 Furthermore, we were able to capture the relationship among IQ or DQ group, degree of atypicality, and age by a mixed-effects model.
The results of the current investigation provide empirical validation for our previously proposed 3-dimensional model of ASD16,17 (Fig 2). This model averts the diagnostic dilemma of ambiguous boundaries among entities such as “high functioning autism,” “mild PDD with normal intelligence,” and Asperger syndrome29–31 by mapping each child onto a 3-dimensional diagnostic “space,” irrespective of such labels. Our model also takes into account predictable changes in expression of atypicality over time and thus represents a step toward the creation of a developmental definition of ASD, rather than present yes/no classification schemes such as the DSM-IV. Our model may also be useful in epidemiologic and etiologic investigations of ASD, as a way of sorting by clinical course into subgroups children who may turn out to share common underlying biological properties. For example, children with fragile X syndrome are more likely to fall into the low DQ/IQ group, whereas children with other causes may fall into the high DQ/IQ group.
One limitation of these data is that the patients consisted of a clinic sample, rather than a population-based sample. Our sample was predominantly upper middle class, which may limit the generalizability of our observations. However, high socioeconomic status has generally been a strong predictor for good developmental outcome. This makes the limited progress of children in the low DQ/IQ group all the more striking, ie, they failed to show reduction in atypical symptoms despite the advantage of high socioeconomic status home environments.
A second limitation is our inability to adjust for possible treatment effects. We cannot determine how much improvement in the normal DQ/IQ group was attributable to treatment effects versus the natural history of ASD. Similarly, our data cannot establish whether the absence of a decline in CARS over time in the low DQ/IQ group reflects inadequate treatment or represents the biologically determined outcome for children with ASD plus mental retardation. Even after intensive chart review, it was not possible to make any meaningful comparisons on the basis of treatment. The medical record with respect to educational intervention frequently contained gaps. Conversely, many patients received >1 type of treatment over the course of the study, receiving these treatments either simultaneously or sequentially, making it impossible to sort children into mutually exclusive treatment groups.
A third limitation is the absence of a universally agreed-on “gold standard” for quantifying the intensity of expression of atypicality. It is worth noting in passing that the creators of the CARS themselves observed decline in CARS scores with time16; whether this represented treatment effect or natural history remains unknown.
Finally, the determination of “intelligence” and the boundary between IQ and atypicality are problematic. Our decision to exclude hyperlexia and our decision to disregard selective deficits arising from ASD both were taken on practical grounds. Similarly, when scoring the CARS, we disregarded level of intelligence as well as we could, because we wished to tease apart these 2 dimensions of the child's development: when scoring item XIV, which asks the rater to score “intellectual level and consistency of response,” we focused on consistency of response (ie, presence or absence of splinter skills) rather than intellectual level. Similarly, on item XV, which asks for “general impression,” we limited ourselves to a general impression of the child's degree of atypicality, without regard to our impression of his or her level of intelligence.
Several of Kanner's original subjects showed dramatic improvement, without any specific therapy for ASD, suggesting that a portion of the decline in CARS scores among our patients represents the natural history of ASD for children with normal intelligence. Likewise, our mixed-effects model reveals age-related decline in CARS scores among patients with normal DQ/IQ even before their first clinic visit.
Our results account for 50% of the variability in CARS scores over time. Therefore, despite all of the limitations described above, our model seems to be tapping into something real. The remaining 50% in variability may be attributable to inaccuracies in initial DQ/IQ group assignment, limitations in the CARS as a measure of atypicality, unidentified biological factors, and/or treatment effects. Ultimately, the question of treatment effects (or lack thereof) versus natural history of ASD in the low and high DQ/IQ groups must await population-based, longitudinal studies, using standardized therapies, random assignment to treatment groups and objective, blinded assessment of response to treatment. If our model can be replicated within that context, then it will provide a useful tool for offering prognoses to parents of children with a new diagnosis of ASD. By providing a predicted outcome for children with ASD, our model could serve as a tool for gauging the putative efficacy of various intervention programs: it would no longer be sufficient to show that children who receive therapy X improve with time. Rather, it would become necessary to show that they improved more than would have been anticipated on the basis of our statistical model of the natural history of ASD alone. This is vitally important, because many currently popular therapies may be capitalizing on the natural history of ASD and claiming such improvement on their own behalf. Finally, our model may have utility in etiologic and epidemiologic studies of ASD, because children with different causes may map to different regions of this 3-dimensional model.
To our knowledge, these data represent the first attempt to construct a unified schema of ASD, taking into account IQ, degree of atypicality, and time. Given the limitations of our data set, this report should be regarded as a feasibility study rather than the final word on the subject. For researchers, our data suggest the utility of our 3-dimensional paradigm for ASD. We submit that future longitudinal studies of outcome in ASD incorporate this model and obtain serial measures of both IQ and expression of atypicality over time. For clinicians, the message in this article is that the future for the child with ASD contains hope: parents can be counseled that a certain degree of improvement in many children with ASD is inevitable and part of the natural history of the condition. Parents have found our 3-dimensional diagram to be immensely helpful as a way of coming to an understanding of the universe of individuals with ASD. Our model should not be used to predict outcome in any 1 child. However, our model can be used as a “map” on which to plot the progress of a child over time.
Dividing children with ASD into 2 mutually exclusive subgroups on the basis of intellectual function at the time of initial presentation (DQ or IQ <0.70 and DQ or IQ ≥0.70) reveals 2 distinct clinical patterns. Children with initial DQ or IQ in the normal range show a statistically significant and highly predictable decrement in atypical features over time, as measured by their scores on the CARS. Conversely, children whose initial DQ or IQ is in the mental retardation range show no significant decrement in CARS scores over time. The present investigation does not permit us to differentiate treatment effects (or lack thereof) from biologically based differences between these 2 subgroups. This question can best be addressed through population-based, prospective studies of children with ASD. Our model, if validated within such a setting, has potential value as a clinical tool for prognostication and as an adjunct to etiologic and epidemiologic studies of ASD.
- Accepted October 25, 2004.
- Reprint requests to (J.C.) Neurodevelopmental Pediatrics of the Main Line, PC, Rosemont Business Campus, Building One, Suite 100, 919 Conestoga Rd, Rosemont, PA 19010. E-mail:
This work was presented in abstract form at the Centers for Disease Control and Prevention 2nd National Center on Birth Defects and Developmental Disabilities conference; July 25–26, 2004; Washington, DC.
No conflict of interest declared.
- Rutter M. Developmental issues and prognosis. In: Rutter M, Schopler E, eds. Autism: A Reappraisal of Concepts and Treatment. New York, NY: Plenum; 1978: 497–505
- Kanner L. Autistic disturbances of affective contact. Nerv Child. 1943;2:217–250
- Lotter V. Follow-up studies. In: Rutter M, Schopler E, eds. Autism: A Reappraisal of Concepts and Treatment. New York, NY: Plenum; 1978: 475–495
- Coplan J. Counseling parents regarding prognosis in autistic spectrum disorder. Pediatrics. 2000;105(5). Available at: www.pediatrics.org/cgi/content/abstract/105/5/e65
- Schopler E, Reichler RJ, Renner BR. The Childhood Autism Rating Scale. Los Angeles, CA: Western Psychological Services; 1988
- Gesell A, Amatruda CS. Gesell and Amatruda's Developmental Diagnosis: The Evaluation and Management of Normal and Abnormal Neuropsychologic Development in Infancy and Early Childhood. 3rd ed. Hagerstown, MD: Harper & Row; 1975:xxv, 538
- Delaney EA, Hopkins TF. The Stanford-Binet Intelligence Scale, Fourth Edition: Examiner's Handbook. Chicago, IL: Riverside Publishing Co; 1987:vi, 162
- Wechsler D. WPPSI-R, Manual: Wechsler Preschool and Primary Scale of Intelligence, Revised. San Antonio, TX: Psychological Corporation; 1989:viii, 230
- Wechsler D. WISC-III: Wechsler Intelligence Scale for Children: Manual. 3rd ed. San Antonio, TX: Psychological Corporation; 1991:xv, 294
- Furuno S, Enrichment Project for Handicapped Infants. Hawaii Early Learning Profile (HELP): Activity Guide. Rev ed. Palo Alto, CA: VORT Corp; 1985:xiv, 190
- Sparrow SS, Cicchetti DV, Balla DA. Vineland Adaptive Behavior Scales: Interview Edition, Expanded Form Manual. Circle Pines, MN: American Guidance Service; 1984:xiv, 321
- Bruininks RH. SIB-R: Scales of Independent Behavior–Revised. Chicago, IL: Riverside Publishing Co; 1996
- Copyright © 2005 by the American Academy of Pediatrics