January 2005, Volume 115, Issue 1

Children With Congenital Hypothyroidism and Their Siblings: Do They Really Differ?

Joanne F. Rovet, PhD

From the Departments of Pediatrics and Psychology, Brain and Behavior Program, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada


Objective. Although favorable outcome is typically described in follow-up studies of children with congenital hypothyroidism (CH) identified by newborn screening, IQ reductions and persistent cognitive deficits are still reported. These findings are accounted for by disease and treatment variables as well as methodologic factors including choice of comparison group. Although siblings are ideal because they control for genetic and environmental influences, by definition they have different ages when tested, which can also introduce bias. Because we followed children with CH and their siblings over an extended period of time, there were a number of occasions when both groups were tested at the same age. The purpose of this study was to compare the results of children with CH and their unaffected siblings at the same age and with the same test.

Methods. The sample consisted of 42 children with CH detected between 1975 and 1985 and their 42 siblings, all of whom were tested with the McCarthy or Wechsler Intelligence Scale for Children—Revised (WISC-R) intelligence tests. Nineteen pairs of children were evaluated at 6 years with the McCarthy, and 30 pairs of children were evaluated at 7 or 9 years with the WISC-R. Recorded for children with CH were disease etiology, bone age and thyroxine levels at diagnosis, age at onset of treatment, and starting dosage of levothyroxine.

Results. Paired t tests revealed that the CH group scored lower than siblings by 8.1 IQ points on the McCarthy and 6.2 points on the WISC-R. Factors contributing to the size of the CH-sibling IQ difference were (1) the etiology of hypothyroidism, reflecting the larger differences by those with athyreosis or an ectopic gland than dyshormonogenesis, and (2) the starting dosage of levothyroxine, with those initially treated with ≥8.2 μg/kg per day having smaller CH-sibling differences than those given lower starting doses. There were no effects of bone age, thyroxine levels at diagnosis, or age at treatment onset.

Conclusion. Children with CH treated early in life due to newborn screening may have reduced IQ relative to siblings.

  • congenital hypothyroidism
  • intelligence
  • outcome
  • L-thyroxine
  • management

It is now >2 decades since newborn screening programs for congenital hypothyroidism (CH) were first implemented.1–3 A large number of studies on children so identified4 have revealed that although screening and early treatment are beneficial in preventing the mental retardation associated previously with cretinism,5 some children still experience subtle selective cognitive deficits6–9 that persist into adolescence10 and adulthood.11 Their specific deficits depend on disease and treatment-related factors including etiology of hypothyroidism12 and starting dosage of replacement hormone therapy.13–16 Etiology is relevant because it reflects if and when the disease began in utero.17 Although the fetal thyroid is supplemented by the maternal thyroid, the maternal supply does not fully meet all late-gestational fetal needs.18 Treatment factors are relevant because they determine the period of postnatal hypothyroidism. Although prompt treatment with a high initial dose level of levothyroxine (L-T4) is associated with favorable outcome,19 mild adverse effects may occur if starting doses are too high.16 Indeed, it has not been established what constitutes the best initial dose level, which remains an issue of considerable discussion, if not controversy.20–22

Studies examining intellectual outcome in children with CH identified by newborn screening describe an average lowering of ∼6 to 7 IQ points,23 with values varying according to a number of methodologic factors. Such factors include the particular sample characteristics; treatment and management factors including starting dose and the particular age at which children were treated; test age; and the IQ test that was used and when it was given in relation to when test norms were developed. The latter issue reflects a phenomenon known as the Flynn effect, whereby the population IQ seems to rise by one third of a point annually.24 In addition, variability among follow-up studies may also reflect the particular control group to which the children with CH were compared. Such groups include siblings,10,11 classmates,25 friends,26 unrelated typically developing children,6,9 and population or national norms.27 Although siblings are ideal because they control for genetic influences and environmental factors,28 by definition they differ in age from the child with CH. As such, this difference may necessitate the use of different test instruments or different items within the same test, and as a consequence, this too may skew the results. For example, test items may make more demands on an ability that is particularly vulnerable to a lack of thyroid hormone at one age than another.
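The norm-aging arithmetic behind the Flynn effect can be sketched as follows. This is a minimal illustration, assuming a constant drift of one third of an IQ point per year as quoted above; the function name and the example years are hypothetical and are not taken from the study.

```python
from fractions import Fraction

# Assumed constant drift: one third of an IQ point per year (the rate quoted
# in the text). Real drift varies by test and population.
FLYNN_POINTS_PER_YEAR = Fraction(1, 3)


def expected_norm_inflation(norm_year: int, test_year: int) -> float:
    """Estimate how many IQ points a score is inflated by aging test norms."""
    if test_year < norm_year:
        raise ValueError("test_year must not precede norm_year")
    return float((test_year - norm_year) * FLYNN_POINTS_PER_YEAR)


# Hypothetical example: a test normed in 1972 and administered in 1984.
inflation = expected_norm_inflation(1972, 1984)
print(f"Expected inflation: {inflation:.1f} IQ points")
```

Under this assumption, a score obtained 12 years after standardization would be expected to overstate ability by about 4 points relative to contemporary norms, which is why a same-age sibling tested with the same instrument is a more direct benchmark than the published test mean.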

Starting in the early 1980s, we began what was to become one of the longest running follow-up studies with the most frequent set of evaluations on any single cohort of children with CH identified by newborn screening. Our sample was evaluated annually throughout childhood29 and into adolescence10 by using a wide variety of age-specific tests. Because siblings were followed similarly, although not as frequently, there were a number of instances in which a sibling and a child with CH were tested at the same age with the identical test instruments. In this article, we describe the direct comparisons between a convenience sample of children with CH and their siblings who were tested at the same age. Additionally, we examined how specific disease and treatment factors affect the size of the CH-sibling differences.

Methods

Our initial cohort consisted of 106 children with CH who were born in the Toronto, Canada, area between 1975 and 1985. These children had been positively identified by newborn screening and were diagnosed at the Hospital for Sick Children (HSC), which was designated as the primary diagnostic and treatment center for CH in the catchment area around and including Toronto. This region involved ∼40% of births in the province of Ontario. The first few cases born between 1976 and 1978 were identified through an experimental cord blood program at Mount Sinai Hospital,30 and the remaining children born after November 1978 were identified through the provincial program.31 Except for the earliest few cases, all children were identified on the basis of an elevated thyrotropin value obtained from dried heel-prick blood samples spotted onto filter paper cards. Shortly after notification of the abnormal newborn spot, confirmatory testing was conducted at HSC, where thyroid function tests, a knee radiograph to determine bone age, and technetium scanning to determine etiology were performed.

Families were subsequently invited to participate in the follow-up study either in person during the child's HSC endocrine clinic visit or by letter. At the initial session, all families gave informed consent for children with CH and their siblings to participate and, at subsequent assessments, for new siblings to be involved. The HSC Research Ethics Board formally approved all procedures on an ongoing basis.

Any child with a comorbid illness or other congenital abnormality or who was born prematurely was excluded. All children with CH were assessed around the time of a forthcoming birthday, which is when they were seen in the endocrine clinic. The majority of the children were first seen at 12 months of age, and the others were seen between 2 and 6 years old depending on when they were born in relation to the present project. Siblings were similarly recruited at the initial interview and usually tested on the same day as the child with CH. Initially, most siblings were older than the child with CH; however, any sibling born after the child with CH had entered the project was also recruited into the study. Although 4 families originally each contributed 2 children with CH to the project (2 twin sets), none of these families provided a non-CH sibling control and thus were not included in the present study. Similarly, although several families provided 2 siblings originally, only 1 CH-sibling match per family was considered presently. From the original cohort of 106 children with CH and their 74 siblings, 86 with CH were available for follow-up testing at 6 to 9 years, representing a 19% attrition rate over a period of up to 6 years. Of these 86, 42 had a sibling who was tested at the same age and with the same test. The remaining 32 siblings were tested with another instrument outside the current age range or the family was no longer participating in the study. At age 8, IQ was not assessed. There were no differences in IQ between the children with CH who had a participating sibling versus those who did not.

Tests and Measures

Children at 6 years of age were assessed with the McCarthy Scales of Children's Abilities32 and at 7 and 9 years with the Wechsler Intelligence Scale for Children–Revised (WISC-R).33 The main measure for the McCarthy was the general cognitive index (mean = 100; SD = 15) and for the WISC-R was the full-scale IQ (mean = 100; SD = 15). Only these global indices are reported here; results on specific subtests and on other tests given at these ages are reported in other articles by the author.

For children with CH, the following biomedical information was also collected: etiology of hypothyroidism based on technetium scanning (athyreosis, dyshormonogenesis, ectopia), bone age at diagnosis based on knee radiographs in gestational weeks,34 confirmatory thyroxine (T4), age at start of treatment, and starting dose in micrograms per kilogram. For both groups, gender was recorded.

Data Analysis

Data were analyzed with SPSS-11 for Macintosh (SPSS Inc, Chicago, IL). For both McCarthy and WISC-R data, comparisons between matched CH-sibling pairs were conducted by using paired t tests for continuous data and repeated-measures analysis of variance. Within-group biomedical variables were analyzed using correlations, t tests, and analysis of variance. The latter 2 procedures were conducted among CH subgroups formed on the basis of either a median split (bone age, test age) or tertile levels (starting dosage).
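The matched-pairs comparison described above can be sketched in a few lines. This is not the SPSS procedure used in the study, only a minimal standard-library illustration of a paired t statistic; the function name and the five pairs of scores are hypothetical and are not study data.

```python
import math
from statistics import mean, stdev


def paired_t(ch_scores, sibling_scores):
    """Return (mean difference, t statistic, degrees of freedom) for matched pairs."""
    if len(ch_scores) != len(sibling_scores):
        raise ValueError("each child with CH needs exactly one matched sibling")
    # Work on within-pair differences (CH minus sibling), as a paired test does.
    diffs = [c - s for c, s in zip(ch_scores, sibling_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_diff = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if se == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff, mean_diff / se, n - 1


# Hypothetical IQ scores for five CH-sibling pairs (illustration only).
ch = [98, 105, 92, 110, 101]
sib = [104, 109, 101, 112, 108]
mean_diff, t_stat, df = paired_t(ch, sib)
print(f"mean difference = {mean_diff:.1f}, t({df}) = {t_stat:.2f}")
```

Because each difference is computed within a family, variance that siblings share (genetic background, home environment) drops out of the error term, which is the rationale for the sibling design.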

Results

Of the 42 CH-sibling pairs tested at the same age with either the McCarthy or WISC-R instruments, 10 pairs received both the McCarthy at age 6 and the WISC-R at either 7 or 9 years of age, and 1 pair was tested at all 3 ages. Table 1, which shows the distribution of pairs by gender and etiology of hypothyroidism, indicates that these sample characteristics are consistent with those in the published literature.


Table 1. Sample Distribution by Gender and Etiology of Hypothyroidism

Table 2 shows the McCarthy and WISC-R scores of the CH and sibling groups. The results for the CH-sibling pair that received the WISC-R at both 7 and 9 years of age were averaged across the 2 sessions. A comparison of the 2 test instruments revealed no differences in IQ for the McCarthy versus the WISC-R. Matched-pairs comparisons revealed significant differences on the McCarthy (P < .05) and WISC-R (P < .01), reflecting the lower scores by the children with CH than their siblings. The mean CH-sibling differences were 8.1 points (favoring siblings) for the McCarthy and 6.8 points for the WISC-R.


Table 2. Mean (SD) IQ Scores by Test

Examination of the results according to the CH child's etiology of hypothyroidism revealed that siblings outperformed children with athyreosis or ectopic glands but not children with dyshormonogenesis. The CH-sibling IQ differences were −14.6, −5.0, and −7.4 (McCarthy) and −6.0, 1.4, and −9.1 (WISC-R) for athyreotic, dyshormonogenic, and ectopic groups, respectively. Note that children with dyshormonogenesis scored below siblings on the McCarthy but above them on the WISC-R.

Scores were combined for both sets of test results, and if a child was tested with both tests, the WISC-R result was preferentially used. We felt justified in this approach because results from both tests are highly correlated,33 and the WISC-R was given at an older age when abilities would have become more solidified. These results were analyzed by using a mixed-model repeated-measures analysis of variance, with etiology as the between-groups factor and CH or sibling as the repeated measure. Results revealed a significant effect for group (F[1,39] = 5.02; P = .03) but not etiology or the group × etiology interaction (P = .36 and .27, respectively). Figure 1 presents the mean IQ results for CH and siblings according to the etiologic grouping of the child with CH. Figure 2 shows the results for individual pairs of children according to the etiology of hypothyroidism of the child with CH. Also indicated in Fig 2 by the symbols connecting pairs are the ages at which particular sets of children were tested. Visual examination of these symbols does not reveal any systematic effect of age at testing among matched pairs. Note also in Fig 2 that only 1 CH case (2.4% of the sample) had an IQ score in the mentally defective range.
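The degrees of freedom reported for the group effect can be checked against the design. As a sketch, assuming all N = 42 pairs entered the analysis, with k = 3 etiology groups as the between-pairs factor and 2 levels (CH, sibling) of the repeated measure (N and k are notation introduced here, not the author's):

```latex
\begin{align*}
df_{\text{group}} &= 2 - 1 = 1 \\
df_{\text{error}} &= (N - k)(2 - 1) = (42 - 3)(1) = 39
\end{align*}
```

These values agree with the reported F[1,39].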

Fig 1.

Mean IQ scores of CH (black bars) and siblings (striped bars) by etiology.

Fig 2.

Scores for individual CH children and their sibling according to etiology of hypothyroidism and test age. Diamonds indicate 6-year assessment; filled circles, 7-year assessment; and open circles, 9-year assessment.

Bone-age measurements were available for 33 of the 42 children. A median split procedure was used to stratify children into those with bone ages of ≤36 weeks (52% of the sample) and those with bone ages from 37 to 40 weeks' gestation (48% of the sample). The bone-age subgroups did not differ in IQ (97.5 vs 105.9; P = .12) or the size of the CH-sibling IQ difference (−5.71 vs −2.81; P = .56). Pearson product-moment correlations computed between bone age and the individual IQ score or the CH-sibling difference score were not significant (r[31] = 0.16 and 0.08; P > .05).
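The nonsignificance of these correlations can be checked by converting each r to a t statistic with the standard identity t = r·sqrt(df / (1 − r²)), where df = n − 2. The sketch below is illustrative only; the function name is hypothetical, and the two-tailed critical value of about 2.04 for df = 31 at P = .05 is taken from standard t tables, not from the article.

```python
import math


def r_to_t(r: float, df: int) -> float:
    """Convert a Pearson correlation to a t statistic with df = n - 2."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if df < 1:
        raise ValueError("df must be at least 1")
    return r * math.sqrt(df / (1.0 - r * r))


# Approximate two-tailed critical t for df = 31 at alpha = .05 (from tables).
T_CRITICAL_DF31 = 2.04

# Correlations of bone age with IQ and with the CH-sibling difference.
for r in (0.16, 0.08):
    t = r_to_t(r, 31)
    verdict = "significant" if abs(t) > T_CRITICAL_DF31 else "not significant"
    print(f"r = {r:.2f} -> t(31) = {t:.2f} ({verdict})")
```

Both t values fall well below the critical value, consistent with the reported P > .05.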

Similar analyses for T4 levels at diagnosis revealed no significant correlations with the CH child's IQ (r[40] = −0.07; P > .05) or the CH-sibling IQ difference (r[40] = −0.13; P > .05). A median split of the initial T4 levels showed that those with values <45 nmol/L had similar IQ levels and difference scores with siblings as did those with T4 levels >45 nmol/L (102.2 vs 103.1 [P = .85] and −5.05 vs −8.21 [P = .48], respectively). Likewise, there were no differences between early-treated (≤12 days) and late-treated (≥13 days) children in IQ (100.5 vs 105.1; P = .32) or the size of the CH-sibling IQ difference (−6.0 vs −7.3; P = .77). Correlations between these IQ measures and treatment age also were not significant (r[40] = −0.07 and −0.08; P > .05).

In our sample, the starting dosage ranged from 3.2 to 12.3 μg/kg per day, with 3 of the children being started on pill sizes of 12.5 μg and the remaining at either 25 or 37.5 μg equally. To stratify the dose levels into realistic categories, we used tertiles to form a low-dose group with values <6.0 μg/kg, a medium-dose group with values from 6.2 to 7.8 μg/kg, and a high-dose group with values from 8.2 to 12.3 μg/kg per day. Only 3 children had doses >10.0 μg/kg per day. Although analysis of variance revealed that the dose groups did not differ significantly in mean IQ or the size of the CH-sibling difference, there was a definite tendency for IQ to increase with dose tertile (95.9, 102.5, and 109.3, respectively; P = .13) and for the CH-sibling difference to decrease (−11.8, −6.9, and −1.5, respectively; P = .25). A Tukey test revealed that the difference between the low- and high-dosage groups was significant (P < .05). The correlation between IQ and dose level was not significant (r[40] = 0.239; P > .05).
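The relation between pill size, body weight, and dose tertile can be sketched as below. This is a hedged illustration, not the study's procedure: it assumes one pill per day, the 3.5 kg weight is hypothetical, and the two cut points are assumptions placed in the gaps between the reported tertile ranges, because the article gives only the observed ranges and no explicit thresholds.

```python
# Cut points are assumptions placed in the gaps between the reported tertile
# ranges (<6.0, 6.2-7.8, and 8.2-12.3 ug/kg per day); the study reports only
# the observed ranges, not explicit thresholds.
LOW_MEDIUM_CUT = 6.1
MEDIUM_HIGH_CUT = 8.0


def dose_per_kg(daily_dose_ug: float, weight_kg: float) -> float:
    """Convert a daily L-T4 dose in micrograms to micrograms per kg per day."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return daily_dose_ug / weight_kg


def dose_group(dose_ug_per_kg: float) -> str:
    """Assign a starting dose to the low, medium, or high tertile group."""
    if dose_ug_per_kg < LOW_MEDIUM_CUT:
        return "low"
    if dose_ug_per_kg < MEDIUM_HIGH_CUT:
        return "medium"
    return "high"


# Hypothetical 3.5 kg newborn given each pill size mentioned in the text,
# assuming one pill per day.
for pill_ug in (12.5, 25.0, 37.5):
    dose = dose_per_kg(pill_ug, 3.5)
    print(f"{pill_ug} ug/day -> {dose:.1f} ug/kg per day ({dose_group(dose)})")
```

The example shows why a fixed pill size produces a range of per-kilogram doses: the same tablet yields a higher dose per kilogram in a lighter infant.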

Discussion

The primary objective of the present study was to compare children with CH diagnosed by newborn screening to their unaffected siblings at the same age and with the same test instrument. The utility of sibling-pair matching as a means of controlling for major genetic differences and differences in the home environment was used recently in a study of preschool extremely low birth weight infants to show that effects of preterm status predominate over socioeconomic status.28 The present results indicate that the CH group attained significantly lower IQ levels than their siblings, with the difference between groups ranging from 6 to 8 points depending on the particular test used. There was only 1 child with CH (2.4% of sample) who had an IQ score within the mentally defective range, indicating that screening is indeed successful in eliminating mental retardation. However, IQs were not fully restored to normal in the CH group, if the sibling results are a benchmark or approximation of what would have been the IQ level the child with CH would have attained had he or she not had CH. Although the values of the CH group are above the mean for the test, it should be noted that these results are inflated, given the Flynn effect,24 and the testing took place a number of years after the standardizations for both tests. Furthermore, it should be noted also that the IQ distribution of Canadian children differs from American children with a mean of ∼3 points higher and a more restricted SD range (ie, the proportion of cases at upper and lower ends is less in the Canadian).35

When results were compared according to disease and treatment parameters, differences were accounted for mainly by etiology and dosage, whereas there were no effects of age when treatment was started or hormone levels at confirmatory diagnosis, contrary to a British study that showed that hormone levels at time of diagnosis were the strongest predictor of IQ in children with CH.36 Although the finding of poorer outcome in children with athyreosis than controls is consistent with past research, the present observation that children with ectopic glands were also affected is discrepant with previous research and may reflect this particular sample. Alternatively, it may reflect the increased sensitivity of directly matching the CH child with his or her sibling. Also observed presently was a slight effect of bone age at diagnosis. This index, which serves as a marker of fetal hypothyroidism,25 indicates a modest effect of intrauterine thyroid-hormone insufficiency on overall IQ in childhood.

Although this study is unique in comparing a large sample of children with CH to their siblings at the exact same age, it is limited for several reasons. First, it is based on management procedures used early in the history of newborn CH screening, whereby most children received a starting dose well below currently recommended guidelines.37,38 As well, only 3 children in our sample received starting dosages >10 μg/kg per day, which is now the recommended dosage. Second, the sample size, although substantial for the overall analyses, was small for subgroup comparisons. The lack of difference between some subgroups may have represented a type II statistical error rather than a true lack of effect. Certainly more data are required on the etiologic subgroups to determine if they do differ from siblings, particularly because this information is important for parent expectations. Third, the sample is a convenience sample, representing only those families with both a child with CH and a control tested in the requisite age range. Hence, results may not be fully representative of the cohort as a whole. Fourth, results are based on IQ scores, which are global composites and may not be sensitive enough to detect the specific effects of a loss of thyroid hormone at particular times during development. Furthermore, the IQ does not measure certain abilities (eg, attention) that are especially sensitive to thyroid-hormone effects. Clearly, similar analyses of specific endpoints are warranted. Fifth, the tester, who also took part in recruiting, scheduling, and database management, was not masked to CH versus sibling-group status. Sixth, children with CH were usually tested shortly after their birthday, whereas siblings who accompanied the child with CH were tested any time during the year. Because results were converted to IQ scores by using standardization tables within 6-month age blocks, it is possible that this conferred biases in the groups' scores relative to the normative sample. In addition, the role of nondisease factors such as socioeconomic status and parent IQ, although these data were obtained, was not examined in the present analyses. Similarly, other variables related to disease, such as time to normalization, adequacy of subsequent treatment, and later hormone levels, were not examined in this brief report.

Nevertheless, the results of the present study have important practical clinical implications for understanding outcome in particular children with CH. The present results show that the children will approach (but not fully reach) their destined IQ levels if they have a dysgenetic (eg, athyreotic, ectopic) gland. However, if they have dyshormonogenesis and are treated early and adequately, then they will, in fact, be unaffected. A treatment level of between 8 and 12 μg/kg per day will serve to minimize the effects in the other 2 groups. It is not clear, however, whether higher doses will fully restore all abilities to normal in that some children experience intrauterine hypothyroidism, and there is a time delay until treatment is given and normalization occurs. Because considerable thyroid-hormone–dependent brain development is occurring during these periods, selective cognitive deficits may still occur.

Conclusion

When children with CH treated early after newborn screening are directly compared with their own siblings, they do exhibit a mild IQ loss. However, this loss can be minimized to a certain degree if a higher starting dosage is provided. Although a high dosage is definitely beneficial for children with severe hypothyroidism at birth, it is still not known what harm may be associated with overtreating children with milder forms of CH (eg, dyshormonogenesis)39; additional investigation is required.22


This work was originally supported by grants from the Ontario Ministry of Health, Ontario Ministry of Community and Social Services, and the Ontario Mental Health Foundation.

I am extremely grateful to Robert Ehrlich for long-standing involvement in the follow-up project; Donna Sorbara for exceptional efforts in recruiting and maintaining the sample; and the children and their families for outstanding commitment to this research.


  • Accepted August 20, 2004.
  • Reprint requests to (J.F.R.) Department of Psychology, Hospital for Sick Children, 555 University Ave, Toronto, Ontario, Canada M5G 1X8. E-mail: joanne.rovet{at}
  • No conflict of interest declared.

Abbreviations: CH, congenital hypothyroidism; T4, thyroxine; L-T4, levothyroxine; HSC, Hospital for Sick Children; WISC-R, Wechsler Intelligence Scale for Children–Revised