## Abstract

OBJECTIVE. We sought to develop a simple, robust method for assessing the risk for sudden infant death syndrome (SIDS) on the basis of obstetric characteristics.

METHODS. We conducted a population-based retrospective cohort study using linked data from the Scottish Morbidity Record, the Stillbirth and Infant Death Enquiry, and the General Registrar's Office database of births and deaths, encompassing births in Scotland between 1992 and 2001. We studied all women who had a singleton live birth between 24 and 43 weeks' gestation and for whom data were available (*n* = 505011); the cohort was divided into model development and validation samples. The main outcome measure was death of the infant in the first year of life as a result of SIDS.

RESULTS. The risk for SIDS was modeled in the development sample using logistic regression with the following predictors: maternal age, parity, marital status, smoking, and the birth weight and the gender of the infant. When the model was evaluated in the validation sample, the area under the receiver operating characteristic curve was 0.84 and the incidence of SIDS was 0.7 per 10000 (95% confidence interval: 0.3–1.4) among 126253 women in the lower 50% of predicted risk and 29.7 per 10000 (95% confidence interval: 23.4–37.2) among the 25250 women in the top 10% of predicted risk. A logistic-regression model then was developed for the whole population, and the output was converted into adjusted likelihood ratios. These are tabulated and provide a simple method for assessing the risk for SIDS associated with any combination of obstetric characteristics.

CONCLUSIONS. A model that uses maternal characteristics and outcome at birth is predictive of the risk for SIDS. This model is presented in a simple form that allows calculation of the individual risk for SIDS.

The factors that determine the risk for sudden infant death syndrome (SIDS) have been the focus of studies for many years.^{1} Identification of modifiable environmental exposures led to the “Back to Sleep” campaign and a dramatic fall in the incidence of SIDS. Despite this, SIDS remains the most common cause of death in infancy.^{2,3} After an apparent SIDS death, there should be an analysis of all of the factors that may have contributed to the event. The procedures for this have been reviewed recently^{4,5} and include detailed investigation of the scene of death and a thorough autopsy. The prior risk for SIDS is also taken into account in this process, including an assessment of whether there were any obstetric risk factors for SIDS. Many studies have addressed both prenatal and postnatal predictors of the risk for SIDS.^{1} However, these analyses are presented in a manner that does not allow easy and accurate assessment of the absolute risk associated with a given combination of characteristics. Our aim was to (1) develop a valid model that relates the risk for SIDS accurately to obstetric characteristics and (2) present it in a format that is simple to understand and use.

## METHODS

The study was a retrospective cohort study of all singleton live births in Scotland that occurred between 24 and 43 weeks' gestation (inclusive) and were documented in the Scottish Morbidity Record (SMR2) between 1992 and 2001. The outcome was death as a result of SIDS in the first year of life, ascertained through death certificate data from the General Registrar's Office (GRO).

### Data Sources and Patient Selection

The SMR2 collects information on clinical and demographic characteristics and outcomes for all women who are discharged from Scottish maternity hospitals. The register is subjected to regular quality assurance checks and has been >99% complete since the late 1970s.^{6} The Scottish Stillbirth and Infant Death Enquiry is a national register that routinely classifies all perinatal deaths in Scotland.^{3} The GRO maintains computerized birth and death registration records. We used a probability-based matching approach^{7} with maternal identifiers to link the SMR2, the Scottish Stillbirth and Infant Death Enquiry, and the GRO database of birth certificates. The birth certificate contained offspring identifiers, which then were used to link the pregnancy and perinatal death data to the death certificate register to identify deaths in infancy.

### Definitions

SIDS was defined as death of an infant in whom the principal cause on the GRO death certificate was coded as 798.0 using the *International Classification of Diseases, Ninth Revision* or R95 using *International Classification of Diseases, 10th Revision*. During the period studied, a diagnosis of SIDS could be written on a death certificate in Scotland only after thorough investigation of the circumstances of the death. The minimum requirements are described by the Crown Office,^{8} and an autopsy was mandatory. In practice, the investigation of these deaths was frequently much more involved.^{9} A previous detailed study of deaths attributed to SIDS on Scottish death certificates between 1992 and 1995 found that standard diagnostic criteria were fulfilled in all cases.^{10} Maternal age was defined as the age of the mother at the time of delivery. Smoking and marital status were defined as the status of the woman at the time of first attendance for prenatal care. Parity was defined as the total number of previous births, excluding abortions. Gestational age at birth was defined as completed weeks of gestation at the time of delivery. Gestational age has been confirmed by ultrasound scan in the first half of pregnancy in >95% of pregnancies in the United Kingdom from the early 1990s.^{11}

### Statistical Analysis

Univariate comparisons were performed using the Mann-Whitney *U* test, the χ^{2} test, and the χ^{2} test for trend, as appropriate. The *P* values for all hypothesis tests were 2-sided. Crude and adjusted odds ratios (ORs) were obtained by using logistic-regression analyses.^{12} Parity, maternal age, and the infant's birth weight all were treated as continuous variables in the logistic-regression models, which avoids the loss of information that results from categorization. We excluded cases with extremes of birth weight (<500 or >5000 g) to avoid overly influential effects of outliers. This improves the reliability of modeling of birth weight for the vast majority of the population. Because very small numbers of cases had values outside this range, estimates of probability for these extreme cases would be potentially unstable. Nonlinearity in the log odds scale was tested and modeled using fractional polynomials. Regression techniques used robust standard errors, clustered on a maternal identifier, to allow for dependence between different births to the same mother. The statistical significance of interaction terms was assessed using the Wald test, and significance was assumed at *P* < .01. Observations with missing values were excluded. The population was randomly assigned to a model development and a model validation sample. Goodness of fit was assessed using the Hosmer-Lemeshow test based on deciles of probability. The discrimination of the model was assessed by the area under the receiver operating characteristic (ROC) curve. The final logistic-regression model fitted to the entire cohort was converted into adjusted likelihood ratios using a modification of our recently described method^{13} (see Appendix for details). All statistical analyses were performed using the Stata 8.2 software package (Stata Corp, College Station, TX).
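The discrimination measure described above has a simple rank interpretation: the area under the ROC curve is the probability that a randomly chosen case has a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The following is a minimal sketch of that calculation, not the Stata routine used in the study; the function name `roc_auc` and the toy predicted risks are illustrative assumptions.

```python
def roc_auc(case_scores, noncase_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Returns the fraction of (case, non-case) pairs in which the case has the
    higher predicted risk; tied pairs contribute one half.
    """
    if not case_scores or not noncase_scores:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0
            elif case == noncase:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Toy predicted risks (hypothetical values, not study data).
cases = [0.004, 0.002, 0.0009]
noncases = [0.0003, 0.0009, 0.0001, 0.0005]
print(roc_auc(cases, noncases))
```

The pairwise loop is quadratic in sample size, so it suits small illustrations only; a rank-based implementation would be needed for a cohort of this size.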

## RESULTS

There were a total of 563719 linked records of singleton births. Among these, there were 2955 (0.5%) stillbirths, 1043 (0.2%) births outside the range of 24 to 43 weeks' gestation, 270 (0.05%) births for which the weight was <500 g, and 1103 (0.2%) for which the weight was >5000 g. A total of 5099 (0.9%) had 1 or more of these characteristics, leaving 558620 live-born infants who weighed between 500 and 5000 g and were delivered between 24 and 43 weeks' gestation. Among this group, there were 53372 (9.6%) cases with missing data on smoking status, 263 (0.05%) with missing data on parity, 19 (<0.01%) with missing data on gender, and 15 (<0.01%) with missing data on maternal age. A total of 53609 (9.6%) records had 1 or more missing variables, leaving a study group of 505011. The characteristics of the study group are tabulated in relation to whether the infant ultimately died from SIDS (Table 1). There were 317 SIDS deaths, giving an incidence of 6.3 (95% confidence interval [CI]: 5.6–7.0) per 10000.

Univariate and multivariate logistic-regression analyses are tabulated for the model development group (*n* = 252506; Table 2). There were significant associations between all of the factors studied and the risk for SIDS with the exception of being an ex-smoker. The relationships between the risk for SIDS and parity, maternal age, and birth weight were linear (in the log odds scale) in both univariate analysis and multivariate analysis. There were no statistically significant interactions between any of the variables.

The model then was used to assess the risk for SIDS in relation to the same characteristics in the validation group (*n* = 252505, ie, out of sample). The area under the ROC curve was 0.84 when tested out of sample. The observed and predicted numbers of SIDS cases are plotted against deciles of predicted probability (Fig 1). There were 9 SIDS deaths among the 126253 women in the lower half of predicted risk, an incidence of 0.7 cases per 10000 (95% CI: 0.3–1.4). There were 75 SIDS events among the 25245 women in the top decile of predicted risk, an incidence of 29.7 cases per 10000 (95% CI: 23.4–37.2). The model then was fitted for the whole population of 505011. The area under the ROC curve for the model was 0.81 and the goodness of fit was adequate (*P* = .49). The fitted model then was converted into adjusted likelihood ratios (Table 3). The calculation of the risk for SIDS associated with any combination of characteristics is illustrated in Fig 2. Overall, 12387 (2.4%) cases had a summary likelihood ratio >5. There were 55 SIDS cases among this group, giving an incidence of 44.4 per 10000 (95% CI: 33.5–57.8).
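The text does not state how the confidence intervals for these incidences were computed. As a rough check, the sketch below computes an exact (Clopper-Pearson) binomial interval using only the Python standard library; the choice of this interval, and the function names `binom_cdf` and `exact_binomial_ci`, are assumptions made for illustration. The event counts and denominators are those reported above.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, terms built in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _root_of_decreasing(f, target, iterations=60):
    """Bisection on (0, 1) for the p at which a decreasing function f hits target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(events, n, alpha=0.05):
    """Clopper-Pearson interval for the proportion events / n."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("require n > 0 and 0 <= events <= n")
    # Lower limit: P(X >= events) = alpha / 2, i.e. CDF(events - 1) = 1 - alpha / 2.
    lower = 0.0 if events == 0 else _root_of_decreasing(
        lambda p: binom_cdf(events - 1, n, p), 1.0 - alpha / 2.0)
    # Upper limit: P(X <= events) = alpha / 2.
    upper = 1.0 if events == n else _root_of_decreasing(
        lambda p: binom_cdf(events, n, p), alpha / 2.0)
    return lower, upper


for events, n in [(9, 126253), (317, 505011)]:
    low, high = exact_binomial_ci(events, n)
    print(f"{events}/{n}: {1e4 * events / n:.1f} per 10000 "
          f"(95% CI {1e4 * low:.1f} to {1e4 * high:.1f})")
```

The printed intervals can be compared with the published ones; small differences would be expected if a different interval method was used in the original analysis.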

Because >99% of missing data were on smoking status, we fitted a model for the 53372 women with missing data on this characteristic. The constant and coefficients were very similar to those for the other women (data not shown).

## DISCUSSION

The investigation of a sudden infant death requires a detailed analysis of all of the factors that may have contributed to the event, and the procedures for this have been reviewed recently.^{4,5} This includes detailed investigation of the scene of death and thorough postmortem investigations. It is recognized, however, that a number of characteristics of the pregnancy contribute to the risk for SIDS.^{1} It is self-evident that a thorough examination of the likely cause of death involves an assessment of the prior risk relating to the outcome of the pregnancy. Many previous studies have developed statistical models to characterize obstetric predictors of SIDS. However, none of these is presented in a way that allows the simple and intuitive assessment of the absolute risk for this event associated with a given combination of birth characteristics. The aim of the present study was to develop a method that was valid, had discriminative power, and was simple to use.

We developed a logistic-regression model and related the risk for SIDS to marital and smoking status, maternal age and parity, and the birth weight and the gender of the infant. Birth weight is a reflection of both fetal growth and gestational age at birth. We previously demonstrated log linear relationships between the risk for SIDS and both week of gestation of birth and birth weight percentile.^{14} In the present study, we used birth weight. Performance of the model was virtually identical to that of a model using week of gestation at birth and birth weight percentile (data not shown). Birth weight has the advantage of being less dependent on definition than gestational age and is more likely to be known than the exact birth weight percentile. We assessed the calibration and discrimination of the model in the validation sample. This demonstrated that the model fit the out-of-sample data well and had good discriminative power. A previous study had performed out-of-sample validation of 4 risk scoring systems for SIDS and found sensitivities of 41%, 53%, 62%, and 71% when the top 20% of predicted risk were classified as high risk; the best performing model included 17 predictors.^{15} In our own study using a model that had just 6 predictors, 72% of cases in the validation sample were in the top 20% of predicted risk.

A number of studies have shown that women with a previous SIDS event have an approximately fivefold risk for recurrence compared with the general population.^{16–19} In the United Kingdom, these women are offered a structured scheme for the care of the next infant, which involves symptom diaries, apnea monitors, scales, and weekly home visits by the family health visitor.^{20} Logically, the 2.4% of women with a summary likelihood ratio of ≥5 on the basis of the model might be offered a similar intervention, although this would require additional evaluation of efficacy and economic justification. However, application of this model assumes that the relationships between the variables studied and the risk for SIDS are similar in other populations.

The absolute risk for an outcome associated with a given combination of characteristics can be estimated from a logistic-regression equation using the constant and the coefficients. The constant reflects the baseline risk, and the sum of the coefficients reflects the modification of the baseline risk associated with the given combination of characteristics. However, typically, medical publications do not report the constant; therefore, this calculation cannot be performed. Moreover, even if provided with the constant and the coefficients, only a tiny minority of doctors would have the knowledge to perform this calculation. We sought to simplify estimation of the absolute risk from a logistic-regression model by expressing the output as adjusted likelihood ratios rather than as ORs. In fact, a likelihood ratio is merely a special type of OR. Taking the example of expressing the risk for a given outcome among male individuals, the OR associated with being male is the odds of the disease in male individuals divided by the odds of the disease among female individuals. The likelihood ratio associated with being male is the odds of the disease in male individuals divided by the odds of the disease in the whole population. Therefore, the OR expresses the risk relative to another category of the given characteristic (eg, male relative to female), whereas the likelihood ratio expresses the risk relative to the whole population. Using the example of gender, 2 likelihood ratios are generated: 1 expresses the risk for male individuals, and 1 expresses the risk for female individuals.
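The distinction between the two ratios can be made concrete with a 2 × 2 table. The sketch below computes crude (unadjusted) values for the gender example; the counts are hypothetical and the function name `gender_ratios` is an assumption, whereas the ratios reported in this article are adjusted for the other predictors in the model.

```python
def odds(events, non_events):
    """Odds of the outcome: events divided by non-events."""
    return events / non_events


def gender_ratios(male_cases, male_noncases, female_cases, female_noncases):
    """Crude OR (male vs female) and crude likelihood ratios for each gender."""
    odds_male = odds(male_cases, male_noncases)
    odds_female = odds(female_cases, female_noncases)
    # Odds in the whole population, pooling both genders.
    odds_all = odds(male_cases + female_cases, male_noncases + female_noncases)
    return {
        "or_male_vs_female": odds_male / odds_female,  # relative to the other category
        "lr_male": odds_male / odds_all,               # relative to the whole population
        "lr_female": odds_female / odds_all,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
ratios = gender_ratios(male_cases=60, male_noncases=49940,
                       female_cases=40, female_noncases=49960)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

In this crude setting the likelihood ratio defined as odds in male individuals divided by odds in the whole population equals the familiar form P(male | case) / P(male | non-case), and the OR equals the ratio of the two likelihood ratios.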

Estimating the absolute risk of a given event associated with any combination of characteristics is relatively simple using adjusted likelihood ratios (see Fig 2). The prior risk of disease is the odds in the population. The risk associated with any combination of variables is calculated by multiplying the prior risk by the appropriate likelihood ratios (Table 3). Therefore, estimating the absolute risk requires relatively little statistical expertise. Because the output of the model is in the form of an individual estimated probability, our approach avoids the loss of information involved in dichotomizing infants as “high” or “low” risk on the basis of an arbitrary threshold on an abstract numerical scale. Informing parents that their infant is at high risk of SIDS may cause unjustified anxiety, because the risk may be small in absolute terms. The likelihood ratio based approach has the key advantage that the output of the model is expressed in terms of the absolute risk associated with the given individual's combination of characteristics.
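As a worked illustration of the multiplication described above, take the cohort incidence of 6.3 per 10000 reported in the Results as the prior risk and suppose, purely as an assumption for this example, that the likelihood ratios for an individual multiply to a summary value of 5. The symbols $p_0$, $\mathrm{LR}_i$, and $p$ are notation introduced here, not taken from Fig 2.

```latex
\begin{align*}
\text{prior odds} &= \frac{p_0}{1 - p_0} = \frac{0.00063}{0.99937} \approx 0.000630,\\
\text{posterior odds} &= \text{prior odds} \times \prod_i \mathrm{LR}_i
  \approx 0.000630 \times 5 \approx 0.00315,\\
p &= \frac{\text{posterior odds}}{1 + \text{posterior odds}}
  \approx \frac{0.00315}{1.00315} \approx 0.00314
  \quad (\text{about 31 per 10000}).
\end{align*}
```

Because the outcome is rare, the odds and the probability are almost equal at each step, so the final conversion changes the estimate very little.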

Expressing logistic-regression models in the form of adjusted likelihood ratios has several other advantages. First, if a predictor variable is unknown, then it may simply be ignored: omitting a variable in a likelihood ratio–based model makes the plausible assumption that the individual has the background risk for the population in relation to the given characteristic. Second, the use of adjusted likelihood ratios removes the need to select a reference category. In contrast, in logistic-regression analysis, a category of risk has to be regarded as referent. By choosing an extreme category as referent, ORs for all of the other categories will tend to be farther from unity. Therefore, the OR for a given characteristic may reflect the deviation in risk from the rest of the population in the referent category as well as the category in question. In contrast, by expressing the output of logistic-regression models as likelihood ratios, the odds of disease associated with any given feature are expressed relative to the odds in the whole population. Finally, because the model uses the prior odds as the starting point, there is the potential for using the adjusted likelihood ratios in other populations in which the incidence of the disease is higher or lower and accounting for this by using the local incidence to estimate the prior odds. This should be done carefully, however, as it assumes that variation in the incidence between populations does not depend on variation in the prevalence of the risk factors included in the model.
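Two of these properties, ignoring an unknown predictor and substituting a local incidence for the prior, can be sketched in a few lines. The function name `absolute_risk`, the predictor names, the likelihood ratio values, and the alternative incidence of 3.0 per 10000 are all hypothetical; the actual likelihood ratios are those tabulated in Table 3.

```python
def absolute_risk(prior_risk, likelihood_ratios):
    """Combine a prior risk with adjusted likelihood ratios into an absolute risk.

    likelihood_ratios maps a predictor name to its likelihood ratio, or to None
    when the predictor is unknown. Unknown predictors are skipped, which is
    equivalent to assigning them a likelihood ratio of 1.
    """
    if not 0.0 < prior_risk < 1.0:
        raise ValueError("prior_risk must lie strictly between 0 and 1")
    posterior_odds = prior_risk / (1.0 - prior_risk)  # prior odds
    for ratio in likelihood_ratios.values():
        if ratio is not None:
            posterior_odds *= ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratios; gender is treated as unknown.
example = {"smoking": 2.0, "marital_status": 1.5, "gender": None}

study_risk = absolute_risk(6.3 / 10000, example)  # prior from this cohort
local_risk = absolute_risk(3.0 / 10000, example)  # hypothetical local incidence
print(f"{1e4 * study_risk:.1f} vs {1e4 * local_risk:.1f} per 10000")
```

Changing the prior rescales every individual estimate without altering the likelihood ratios, which is only appropriate under the assumption stated above about the prevalence of the risk factors.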

Other multivariate methods can be used to generate adjusted likelihood ratios, such as distribution modeling, which is used widely in Down syndrome screening.^{21} However, these methods do not directly incorporate binary variables, such as gender. Moreover, logistic-regression modeling is much more widely used in the analysis of risk, and many model-building tools have been developed for this method. A previous attempt was made to express logistic regression in the form of likelihood ratios.^{22} However, the previous method of calculation does not agree with the multivariate logistic-regression output if the model contains categorical variables with >2 levels or if the transformation of a continuous variable changed between the univariate and the multivariate model.

## CONCLUSIONS

We present a novel method for estimating the risk for SIDS in relation to a given combination of maternal and obstetric characteristics. This is simple to use and gives arithmetically identical results to much more complex statistical models.

## APPENDIX: ESTIMATING LIKELIHOOD RATIOS

The logistic-regression model is $\log(\text{odds}) = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$. The adjusted likelihood ratios are calculated as multiples of $\exp(b_1 x_1)$, $\exp(b_2 x_2)$, etc, in 2 stages.

In the first stage, we fit the above model with the term $b_1 x_1$ replaced with an unknown constant $d_1$ and with all other terms (including the constant) fixed at their previous estimated values. The estimated value of $d_1$ captures the risk before $x_1$ is known, so the likelihood ratio is $\exp(b_1 x_1 - d_1)$. This is repeated for each term $b_2 x_2, \dots, b_n x_n$, and the constant is replaced by $a' = (a + d_1 + d_2 + \dots + d_n)$.

Because of the nonlinearity of the log odds function, $a'$ may not exactly equal the overall log odds of the outcome, $a_0$, if the $x$ variables are correlated. In the second stage, we therefore compute a correction factor $c_j(a_0 - a')$, where $c_1 + \dots + c_n = 1$, and we report likelihood ratios $\exp[b_j x_j - d_j + c_j(a_0 - a')]$. In this article, $c_j$ is calculated as $m_j/(m_1 + \dots + m_n)$, where $m_j$ is the sample minimum or maximum (depending on whether $a_0 - a'$ is positive or negative) of $b_j x_j - d_j$: this procedure ensures that the range of adjusted likelihood ratios spans 1.

## Acknowledgments

This study was funded by a project grant from the Foundation for the Study of Infant Deaths (United Kingdom).

## Footnotes

- Accepted March 15, 2005.
- Address correspondence to Gordon C.S. Smith, MD, PhD, Department of Obstetrics and Gynaecology, Cambridge University, Rosie Maternity Hospital, Robinson Way, Cambridge CB2 2QQ, United Kingdom. E-mail: gcss2{at}cam.ac.uk
- The authors have indicated they have no financial relationships relevant to this article to disclose.

## REFERENCES

- Copyright © 2006 by the American Academy of Pediatrics